Summary:
- Signed 16-bit should have priority over unsigned.
- For la, unsigned 16-bit must use ori+addu rather than directly use ori.
- Correct tests on 32-bit immediates with 64-bit predicates by
sign-extending the immediate beforehand. For example, isInt<16>(0xffff8000)
should be true and use addiu.
Also split li/la testing into separate files due to their size.
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10967
llvm-svn: 242139
This patch allows VSX swap optimization to succeed more frequently.
Specifically, it is concerned with common code sequences that occur
when copying a scalar floating-point value to a vector register. This
patch currently handles cases where the floating-point value is
already in a register, but does not yet handle loads (such as via an
LXSDX scalar floating-point VSX load). That will be dealt with later.
A typical case is when a scalar value comes in as a floating-point
parameter. The value is copied into a virtual VSFRC register, and
then a sequence of SUBREG_TO_REG and/or COPY operations will convert
it to a full vector register of the class required by the context. If
this vector register is then used as part of a lane-permuted
computation, the original scalar value will be in the wrong lane. We
can fix this by adding a swap operation following any widening
SUBREG_TO_REG operation. Additional COPY operations may be needed
around the swap operation in order to keep register assignment happy,
but these are pro forma operations that will be removed by coalescing.
If a scalar value is otherwise directly referenced in a computation
(such as by one of the many XS* vector-scalar operations), we
currently disable swap optimization. These operations are
lane-sensitive by definition. A MentionsPartialVR flag is added for
use in each swap table entry that mentions a scalar floating-point
register without having special handling defined.
A common idiom for PPC64LE is to convert a double-precision scalar to
a vector by performing a splat operation. This ensures that the value
can be referenced as V[0], as it would be for big endian, whereas just
converting the scalar to a vector with a SUBREG_TO_REG operation
leaves this value only in V[1]. A doubleword splat operation is one
form of an XXPERMDI instruction, which takes one doubleword from a
first operand and another doubleword from a second operand, with a
two-bit selector operand indicating which doublewords are chosen. In
the general case, an XXPERMDI can be permitted in a lane-swapped
region provided that it is properly transformed to select the
corresponding swapped values. This transformation is to reverse the
order of the two input operands, and to reverse and complement the
bits of the selector operand (derivation left as an exercise to the
reader ;).
A new test case that exercises the scalar-to-vector and generalized
XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll.
The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to
use CHECK-DAG instead of CHECK for two independent instructions that
now appear in reverse order.
There are two small unrelated changes that are added with this patch.
First, the XXSLDWI instruction was incorrectly omitted from the list
of lane-sensitive instructions; this is now fixed. Second, I observed
that the same webs were being rejected over and over again for
different reasons. Since it's sufficient to reject a web only once, I
added a check for this to speed up the compilation time slightly.
llvm-svn: 242081
Enable partial and runtime loop unrolling for NVPTX backend via
TTI::UnrollingPreferences with a small threshold. This partially unrolls
small loops which are often unrolled by the PTX to SASS compiler
and unrolling earlier can be beneficial.
llvm-svn: 242049
The two-address instruction pass will convert these back to v_mad_f32
if necessary.
Differential Revision: http://reviews.llvm.org/D11060
llvm-svn: 242038
The 64/128-bit vector types are legal if NEON instructions are
available. However, there was no matching patterns for @llvm.cttz.*()
intrinsics and result in fatal error.
This commit fixes the problem by lowering cttz to:
a. ctpop((x & -x) - 1)
b. width - ctlz(x & -x) - 1
llvm-svn: 242037
In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch.
I temporary removed the old intrinsics test (just to split this patch).
Half types are not covered here.
Differential Revision: http://reviews.llvm.org/D11134
llvm-svn: 242023
Register r12 ('ip') is used by GCC for this purpose
and hence is used here. As discussed on the GCC mailing
list, the register choice is an ABI issue and so
choosing the same register as GCC means
__builtin_call_with_static_chain is compatible.
A similar patch has just gone in the AArch64 backend,
so this is just the ARM counterpart, following the same
discussion.
Patch by Stephen Cross.
llvm-svn: 241996
While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized.
This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together.
Differential Revision: http://reviews.llvm.org/D11063
llvm-svn: 241989
r238842 added the TargetRecip system for controlling use of reciprocal
estimates for sqrt and division using a set of parameters that can be set by
the frontend. Clang now supports a sophisticated -mrecip option, and this will
allow that option to effectively control the relevant code-generation
functionality of the PPC backend.
llvm-svn: 241985
This adds support for the 'nest' attribute, which allows the static chain
register to be set for functions calls under non-Darwin PPC/PPC64 targets. r11
is the chain register (which the PPC64 ELF ABI calls the "environment
pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded
from the function descriptor, but providing an explicit 'nest' parameter will
override that process and use the value provided.
This allows __builtin_call_with_static_chain to work as expected on PowerPC.
llvm-svn: 241984
Disallow all mutation of `MCSubtargetInfo` expect the feature bits.
Besides deleting the assignment operators -- which were dead "code" --
this restricts `InitMCProcessorInfo()` to subclass initialization
sequences, and exposes a new more limited function called
`setDefaultFeatures()` for use by the ARMAsmParser `.cpu` directive.
There's a small functional change here: ARMAsmParser used to adjust
`MCSubtargetInfo::CPUSchedModel` as a side effect of calling
`InitMCProcessorInfo()`, but I've removed that suspicious behaviour.
Since the AsmParser shouldn't be doing any scheduling, there shouldn't
be any observable change...
llvm-svn: 241961
Most loads and stores are derived from pointers derived from
a kernel argument load inserted during argument lowering.
This was just using the EntryToken chain for the argument loads,
and any users of these loads were also on the EntryToken chain.
Return the chain of the lowered argument load so that dependent loads
end up on the correct chain.
No test since I'm not aware of any case where this actually
broke.
llvm-svn: 241960
Force all creators of `MCSubtargetInfo` to immediately initialize it,
merging the default constructor and the initializer into an initializing
constructor. Besides cleaning up the code a little, this makes it clear
that the initializer is never called again later.
Out-of-tree backends need a trivial change: instead of calling:
auto *X = new MCSubtargetInfo();
InitXYZMCSubtargetInfo(X, ...);
return X;
they should call:
return createXYZMCSubtargetInfoImpl(...);
There's no real functionality change here.
llvm-svn: 241957
Remove all calls to `MCSubtargetInfo::InitCPUSched()` and merge its body
into the only relevant caller, `MCSubtargetInfo::InitMCProcessorInfo()`.
We were only calling the former after explicitly calling the latter with
the same CPU; it's confusing to have both methods exposed.
Besides a minor (surely unmeasurable) speedup in ARM and X86 from
avoiding running the logic twice, no functionality change.
llvm-svn: 241956
Fixes PR23804: assertion failure in emitPrologue in the case of a
function with an empty frame and a dynamic alloca that needs stack
realignment. This is a typical case for AddressSanitizer.
llvm-svn: 241943
Summary:
Following the discussion on r241884, it's more reasonable to assume that a
target has no vector registers by default instead of letting every such
target overrides getNumberOfRegisters.
Therefore, this patch modifies BasicTTIImpl::getNumberOfRegisters to
return 0 when Vector is true, and partially reverts r241884 which
modifies NVPTXTTIImpl::getNumberOfRegisters.
It also fixes a performance bug in LoopVectorizer. Even if a target has
no vector registers, vectorization may still help ILP. So, we need both
checks to be false before disabling loop vectorization all together.
Reviewers: hfinkel
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11108
llvm-svn: 241942
This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.
Differential Revision: http://reviews.llvm.org/D10676
llvm-svn: 241928
Summary:
This code is based on AArch64 for modern backend good practice, and NVPTX for
virtual ISA concerns.
Reviewers: sunfish
Subscribers: aemerson, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11070
llvm-svn: 241923
Summary:
The target frame lowering's concrete type is always known in RegisterInfo, yet it's only sometimes devirtualized through a static_cast. This change adds an auto-generated static function <Target>GenRegisterInfo::getFrameLowering(const MachineFunction &MF) which does this devirtualization, and uses this function in all targets which can.
This change was suggested by sunfish in D11070 for WebAssembly, I figure that I may as well improve the other targets while I'm here.
Subscribers: sunfish, ted, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11093
llvm-svn: 241921
This improves the logic in several ways and is a preparation for
followup patches:
- First perform an analysis and create a list of merge candidates, then
transform. This simplifies the code in that you have don't have to
care to much anymore that you may be holding iterators to
MachineInstrs that get removed.
- Analyze/Transform basic blocks in reverse order. This allows to use
LivePhysRegs to find free registers instead of the RegisterScavenger.
The RegisterScavenger will become less precise in the future as it
relies on the deprecated kill-flags.
- Return the newly created node in MergeOps so there's no need to look
around in the schedule to find it.
- Rename some MBBI iterators to InsertBefore to make their role clear.
- General code cleanup.
Differential Revision: http://reviews.llvm.org/D10140
llvm-svn: 241920
Summary:
Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll)
produces code with complex control flow which hurts later optimizations. Since
NVPTX doesn't have vector registers in LLVM's sense
(NVPTXTTI::getRegisterBitWidth(true) == 32), we for now declare no vector
registers to effectively disable loop vectorization.
Reviewers: jholewinski
Subscribers: jingyue, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11089
llvm-svn: 241884
Apparently this is important, otherwise _except_handler3 assumes that
the registration node is corrupted and ignores it.
Also fix a bug in WinEHPrepare where we would insert code after a
terminator instruction.
llvm-svn: 241877
The runtime does not restore CSRs when transferring control back to the
function handling the exception. According to the experts on IRC, LLVM's
register allocator has no way to model register clobbers that only
happen on one edge of the CFG. For now, don't worry about trying to use
the meager three CSRs available on 32-bit X86 and just say that such
invokes preserve nothing.
llvm-svn: 241865
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.
Differential Revision: http://reviews.llvm.org/D10977
llvm-svn: 241827
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.
Reviewers: nadav, majnemer, sanjoy, RKSimon
Subscribers: RKSimon, llvm-commits
Differential Revision: http://reviews.llvm.org/D10767
llvm-svn: 241806
The nest attribute is currently supported on the x86 (32-bit) and x86-64
backends, but not on ARM (32-bit) or AArch64. This patch adds support for
nest to the AArch64 backend.
Register x18 is used by GCC for this purpose and hence is used here.
As discussed on the GCC mailing list the register choice is an ABI issue
and so choosing the same register as GCC means __builtin_call_with_static_chain
is compatible.
Patch by Stephen Cross.
llvm-svn: 241794
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.
Reviewers: nadav, majnemer, sanjoy, RKSimon
Subscribers: RKSimon, llvm-commits
Differential Revision: http://reviews.llvm.org/D10767
llvm-svn: 241790
Summary:
Remove empty subclass in the process.
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted
Differential Revision: http://reviews.llvm.org/D11045
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241780
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11042
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241779
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11040
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11038
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241777
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11037
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D11028
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
DataLayout is no longer optional. It was initialized with or without
a DataLayout, and the DataLayout when supplied could have been the
one from the TargetMachine.
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11021
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241774
Summary:
Avoid using the TargetMachine owned DataLayout and use the Module owned
one instead. This requires passing the DataLayout up the stack to
ComputeValueVTs().
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, yaron.keren, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D11019
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241773
All the usual X86 target-specific conventions are collapsed to the
normal Win64 convention, but the custom conventions like GHC and webkit
should not be.
Previously we would assume that the caller allocated 32 bytes of shadow
space for us, which is not how webkit_jscc or other custom conventions
are supposed to work.
Based on a patch by peavo@outlook.com.
Fixes PR24051.
llvm-svn: 241725
The 32-bit lowering assumed that WinEHPrepare had this invariant.
WinEHPrepare did it for C++, but not SEH. The result was that we would
insert calls to llvm.x86.seh.restoreframe in normal basic blocks, which
corrupted the frame pointer.
llvm-svn: 241699
- Implement copying ASR to/from GPR regs.
- Mark ASRs as non-allocatable, so it won't try to arbitrarily use
them inappropriately.
- Instead of inserting explicit WRASR/RDASR nodes in the MUL/DIV
routines, just do normal register copies.
- Also...mark div as using Y, not just writing it.
Added a test case with some code which previously died with an
assertion failure (with -O0), or produced wrong code (otherwise).
(Third time's the charm?)
Differential Revision: http://reviews.llvm.org/D10401
llvm-svn: 241686
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11017
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241655
The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.
llvm-svn: 241648
Tim Northover has told me that they can occur when the compiler cleverly
constructs constants - as demonstrated in the test case.
rdar://21703486
llvm-svn: 241641
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
llvm-svn: 241633
This type of prologue isn't supported yet. Implementing it should be a
matter of copying the adjusted incoming EBP into ESI (the base pointer)
instead of EBP. The original EBP can be saved and restored from other
memory afterwards.
llvm-svn: 241597
This includes code that is intended to be target-independent as well
as the Hexagon-specific details. This is just the framework without
any users.
llvm-svn: 241595
be emitted.
This is needed to enable ARM long calls for LTO and enable and disable it on a
per-function basis.
Out-of-tree projects currently using EnableARMLongCalls to emit long calls
should start passing "+long-calls" to the feature string (see the changes made
to clang in r241565).
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D9364
llvm-svn: 241566
The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments.
As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well.
Differential Revision: http://reviews.llvm.org/D10593
llvm-svn: 241516
Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460)
to SSE scalar FP SP adds in addition to AVX scalar FP SP adds.
With the 'switch' in place, we can trivially add other opcodes and test cases in
future patches.
Differential Revision: http://reviews.llvm.org/D10975
llvm-svn: 241515
This patch adds vectorization support for uniform constant i64 arithmetic shift right operators.
Differential Revision: http://reviews.llvm.org/D9645
llvm-svn: 241514
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.
As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.
From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.
Differential Revision: http://reviews.llvm.org/D10146
llvm-svn: 241508
With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations.
This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code.
Differential Revision: http://reviews.llvm.org/D10947
llvm-svn: 241506
This commit adds a 'run-pass' option to llc, which instructs the compiler to run
one specific code generation pass only.
Llc already has the 'start-after' and the 'stop-after' options, and this new
option complements the other two by making it easier to write tests that want
to invoke a single pass only.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10776
llvm-svn: 241476
Running this after the scheduler enables scheduling
waits later so other ALU instructions can run while
this would be waiting.
When combined with enabling the post-RA scheduler, this
gives about a ~20% improvement on sgemm.
llvm-svn: 241473
Summary:
This concludes the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
At this point, the StringRef-form of GNU Triples should only be used in the
public API (including IR serialization) and a couple objects that directly
interact with the API (most notably the Module class). The next step is to
replace these Triple objects with the TargetTuple object that will represent
our authoratative/unambiguous internal equivalent to GNU Triples.
Reviewers: rengolin
Subscribers: llvm-commits, jholewinski, ted, rengolin
Differential Revision: http://reviews.llvm.org/D10962
llvm-svn: 241472
We don't have a good way to detect most situations where
DS offsets are usable on SI, so add an option to force using
them even if unsafe for debugging performance problems.
llvm-svn: 241462
These are mostly from the chart in the SparcV8 spec, section "A.3
Synthetic Instructions".
Differential Revision: http://reviews.llvm.org/D9834
llvm-svn: 241461
The code in AArch64A57FPLoadBalancing::scavengeRegister() to handle dead defs
was not correctly handling aliased registers. E.g. if the dead def was of D2,
then S2 was not being marked as unavailable, so it could potentially be used
across a live-range in which it would be clobbered.
Patch by Geoff Berry <gberry@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D10900
llvm-svn: 241449
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
llvm-svn: 241413
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.
llvm-svn: 241411
Correctly support assembling "pushw $imm8" on x86-64 targets.
Also some cleanup of the PUSH instructions (PUSH64i16 and PUSHi16 actually
represent the same instruction)
This fixes PR23996
Patch by: david.l.kreitzer@intel.com
Differential Revision: http://reviews.llvm.org/D10878
llvm-svn: 241404
Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp).
llvm-svn: 241394
Add support for v2i8/v2i16 to v2f64 by using a sign extension to v2i32 before conversion to v2f64.
Differential Revision: http://reviews.llvm.org/D10589
llvm-svn: 241325
This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector.
Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64.
llvm-svn: 241323
This function can really fail since the string table offset can be out of
bounds.
Using ErrorOr makes sure the error is checked.
Hopefully a lot of the boilerplate code in tools/* can go away once we have
a diagnostic manager in Object.
llvm-svn: 241297
In r241285, I removed the SUBREG_TO_REG restriction from VSX swap
removal, determining that this was overly conservative. We have
another form of the same restriction in that we check for the presence
of implicit subregs in vector operations. As with SUBREG_TO_REG for
partial register conversions, an implicit subreg is safe in and of
itself, provided no other operation makes a lane-sensitive assumption
about the result. This patch removes that restriction, by removing
the HasImplicitSubreg flag and all code that relies on it.
I've added a test case that fails to optimize before this patch is
applied, and optimizes properly with the patch. Test based on a
report from Anton Blanchard.
llvm-svn: 241290
With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register. We can implement this using xxspltd (a special form of
xxpermdi). This patch teaches the swap optimization pass about this
idiom.
As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG. Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register. However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result. In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.
The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element). To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations. That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region. This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI. Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.
A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.
llvm-svn: 241285
This checks subtarget feature compatibility for inlining by verifying
that the callee is a strict subset of the caller's features. This includes
the cpu as part of the subtarget we can get via the incoming functions as
the backend takes CPUs as feature sets.
This allows us to inline things like:
int foo() { return baz(); }
int __attribute__((target("sse4.2"))) bar() {
return foo();
}
so that generic code can be inlined into specialized functions.
llvm-svn: 241221
Summary:
* Add 64-bit address space feature.
* Rename SIMD feature to SIMD128.
* Handle single-thread model with an IR pass (same way ARM does).
* Rename generic processor to MVP, to follow design's lead.
* Add bleeding-edge processors, with all features included.
* Fix a few DEBUG_TYPE to match other backends.
Test Plan: ninja check
Reviewers: sunfish
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D10880
llvm-svn: 241211
Summary:
According to PTX ISA:
For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.
So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.
As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]
This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.
Patched by Gang Hu.
Test Plan: Test case attached.
Reviewers: jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D10876
llvm-svn: 241191
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.
Patched by Xuetian Weng.
Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10853
llvm-svn: 241185
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction. This patch moves the
offending logic so lxvdsx is once again generated.
This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant. Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe). This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.
There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization. However, vsx.ll was only
being tested for POWER7 with big-endian code generation. I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.
llvm-svn: 241183
The EH code might have been deleted as unreachable and the personality
pruned while the filter is still present. Currently I'm hitting this at
-O0 due to the clang bug PR24009.
llvm-svn: 241170
This patch teaches the AsmParser to accept add/adds/sub/subs/cmp/cmn
with a negative immediate operand and convert them as shown:
add Rd, Rn, -imm -> sub Rd, Rn, imm
sub Rd, Rn, -imm -> add Rd, Rn, imm
adds Rd, Rn, -imm -> subs Rd, Rn, imm
subs Rd, Rn, -imm -> adds Rd, Rn, imm
cmp Rn, -imm -> cmn Rn, imm
cmn Rn, -imm -> cmp Rn, imm
Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers (gas). They are documented in the "ARMv8 Instruction Set
Overview", in the "Arithmetic (immediate)" section. This makes llvm-mc
a programmer-friendly assembler !
This also fixes PR20978: "Assembly handling of adding negative numbers
not as smart as gas".
llvm-svn: 241166
Move some instructions into order of sections in the spec, as the rest
already were.
Differential Revision: http://reviews.llvm.org/D9102
llvm-svn: 241163
Only consider an instruction a candidate for relaxation if the last operand of the
instruction is an expression. We previously checked whether any operand is an expression,
which is useless, since for all instructions concerned, the only operand that may be
affected by relaxation is the last one.
In addition, this removes the check for having RIP as an argument, since it was
plain wrong - even when one of the arguments is RIP, relaxation may still be needed.
This fixes PR9807.
Patch by: david.l.kreitzer@intel.com
Differential Revision: http://reviews.llvm.org/D10766
llvm-svn: 241152
The incoming EBP value established by the runtime is actually a pointer
to the end of the EH registration object, and not the true parent
function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo
anymore because we know that the exception info pointer is at a fixed
offset from this incoming EBP.
The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the
EH runtime and returns a pointer that is usable with llvm.framerecover.
The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit
specific preparation pass in blocks targetted by the EH runtime. It
re-establishes any physical registers used by the parent function to
address the stack, such as the frame, base, and stack pointers.
Neither of these intrinsics correctly handle stack realignment prologues
yet, but it's possible to add that later.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D10848
llvm-svn: 241125
Summary:
Really check if %SP is not used in other places, instead of checking only exact
one non-dbg use.
Patched by Xuetian Weng.
Test Plan:
@foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that
SP will appear twice.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, sfantao, jholewinski
Differential Revision: http://reviews.llvm.org/D10844
llvm-svn: 241099
Duplicating an FP register "as itself" is a bad idea, since it violates the
invariant that every FP register is mapped to at most one FPU stack slot.
Use the scratch FP register instead.
This fixes PR23957.
llvm-svn: 241069
These directives are used to set the default value of the SoftFloat feature.
They have the same effect as setting -m{soft, hard}-float from the command line.
Differential Revision: http://reviews.llvm.org/D9073
llvm-svn: 241066
represented by uint64_t, this patch replaces these
usages with the FeatureBitset (std::bitset) type.
Differential Revision: http://reviews.llvm.org/D10542
llvm-svn: 241058
Realistically, this will be returning ErrorOr for some time as refactoring the
user code to check once per section will take some time.
Given that, use it for checking if a relocation has addend or not.
While at it, add ELFRelocationRef to simplify the users.
llvm-svn: 241028
This change unifies how LTOModule and the backend obtain linker flags
for globals: via a new TargetLoweringObjectFile member function named
emitLinkerFlagsForGlobal. A new function LTOModule::getLinkerOpts() returns
the list of linker flags as a single concatenated string.
This change affects the C libLTO API: the function lto_module_get_*deplibs now
exposes an empty list, and lto_module_get_*linkeropts exposes a single element
which combines the contents of all observed flags. libLTO should never have
tried to parse the linker flags; it is the linker's job to do so. Because
linkers will need to be able to parse flags in regular object files, it
makes little sense for libLTO to have a redundant mechanism for doing so.
The new API is compatible with the old one. It is valid for a user to specify
multiple linker flags in a single pragma directive like this:
#pragma comment(linker, "/defaultlib:foo /defaultlib:bar")
The previous implementation would not have exposed
either flag via lto_module_get_*deplibs (as the test in
TargetLoweringObjectFileCOFF::getDepLibFromLinkerOpt was case sensitive)
and would have exposed "/defaultlib:foo /defaultlib:bar" as a single flag via
lto_module_get_*linkeropts. This may have been a bug in the implementation,
but it does give us a chance to fix the interface.
Differential Revision: http://reviews.llvm.org/D10548
llvm-svn: 241010
When the store sequence being combined actually stores the base register, we
should not mark it as killed until the end.
rdar://21504262
llvm-svn: 241003
This is a new version of http://reviews.llvm.org/D10260.
It turned out that when you specify an integer register in inline asm on
x86 you get the register of the required type size back. That means that
X86TargetLowering::getRegForInlineAsmConstraint() has to accept any of
the integer registers and adapt its size to the given target size which
may be any 8/16/32/64 bit sized type. Surprisingly that means given a
constraint of "{ax}" and a type of MVT::F32 we need to return X86::EAX.
This change makes this face explicit, the previous code seemed like
working by accident because there it never returned an error once a
register was found. On the other hand this rewrite allows to actually
return errors for invalid situations like requesting an integer register
for an i128 type.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10813
llvm-svn: 241002
Some of the the permissible ARM -mfpu options, which are supported in GCC,
are currently not present in llvm/clang.This patch adds the options:
'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16.
These are related to half-precision floating-point and single precision.
Reviewers: rengolin, ranjeet.singh
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10645
llvm-svn: 240930
Summary:
Previously it (incorrectly) used GPR's.
Patch by Simon Dardis. A couple small corrections by myself.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10567
llvm-svn: 240883
Summary:
Some front ends make kernel pointers global already. In that case,
handlePointerParams does nothing.
Test Plan: more tests in lower-kernel-ptr-arg.ll
Reviewers: grosser
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10779
llvm-svn: 240849
Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime.
Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D10777
llvm-svn: 240841
Summary:
This way the function symbol points to the start of amd_kernel_code_t
rather than the start of the function.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10705
llvm-svn: 240829
If pseudoToMCOpcode failed, we would return the original opcode, so operands
would be swapped, but the instruction would remain the same.
It resulted in LSHLREV a, b ---> LSHLREV b, a.
This fixes Glamor text rendering and
piglit/arb_sample_shading-builtin-gl-sample-mask on VI.
This is a candidate for stable branches.
v2: the test was simplified by Tom Stellard
llvm-svn: 240824
This patch corresponds to review:
http://reviews.llvm.org/D10638
This is the back end portion of patch
http://reviews.llvm.org/D10637
It just adds the code gen and intrinsic functions necessary to support that patch to the back end.
llvm-svn: 240820
SDNode already had ops() which would iterate over the operands and return
SDUse*. This version instead gets the SDValue's out of the SDUse's so that
we can use foreach in more places.
Reviewed by David Blaikie.
llvm-svn: 240805
This patch fixes the error in ARM.td which stated that Cortex-R5
floating point unit can do only single precision, when it can do double as well.
Reviewers: rengolin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10769
llvm-svn: 240799
Summary:
This only adds support for ULW of an immediate address with/without a source register.
It does not include support for ULW of the address of a symbol.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9663
llvm-svn: 240782
Cortex-R4F TRM states that fpu supports both single and double precision.
This patch corrects the information in ARM.td file and corresponding test.
Reviewers: rengolin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10763
llvm-svn: 240776
This patch adds support for the vector merge even word and vector merge odd word
instructions introduced in POWER8.
Phabricator review: http://reviews.llvm.org/D10704
llvm-svn: 240650
Summary:
Simplify emitDirectiveModuleFP() by having it just print the current information
from MipsABIFlagsSection and doing an updateABIInfo() before such calls.
This prevents us from forgetting to update the STI.FeatureBits,
because updateABIInfo() uses those to update the MipsABIFlagsSection object,
and also makes sure we use the update mechanism from MipsABIFlagsSection.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, mpf
Differential Revision: http://reviews.llvm.org/D10642
llvm-svn: 240637
As pointed out by Justin Bogner (see r240520), SystemZDAGToDAGISel::Select
currently attempts to convert boolean operations into RxSBG even on some
non-integer types (in particular, vector types). This would not work in
any case, and it happened to trigger undefined behaviour in allOnes.
This patch verifies that we have a (<= 64-bit) integer type before
attempting to perform this optimization.
llvm-svn: 240634
Summary:
We can simplify emitDirectiveModuleOddSPReg() by having it print the current OddSPReg information
from MipsABIFlagsSection and doing an updateABIInfo() before such calls.
This prevents us from forgetting to update the STI.FeatureBits, because updateABIInfo() uses those to update the MipsABIFlagsSection object,
and also makes sure we use the update mechanism from MipsABIFlagsSection.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, mpf
Differential Revision: http://reviews.llvm.org/D10641
llvm-svn: 240630
Summary:
In an expression such as "(((a+b)+c)+d)", parseParenExpression() would only parse the "a+b)+c", which would result in an error later on in the parser.
This means that we can only parse one level of inner parentheses.
In order to fix this, I added a new function called parseParenExprOfDepth(), which parses a specified number of trailing parenthesis expressions
(except for the outermost parenthesis), and changed MipsAsmParser to use it in parseMemOffset instead of parseParenExpression().
Reviewers: dsanders, rafael
Reviewed By: dsanders, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9742
llvm-svn: 240625
We don't always have FMA, for example when using 'clang -mavx512f'
without an explicit CPU.
Also check for an explicit +avx512f instead of CPUs in a couple
related tests.
llvm-svn: 240616
Summary
This change turns on the emission of
__LLVM_Stackmaps section when generating COFF binaries.
Test Plan
Added a scenario to the test case:
test\CodeGen\X86\statepoint-stackmap-format.ll.
Code Review:
http://reviews.llvm.org/D10680
llvm-svn: 240613
- Deciding that insn->sibIndex is SIB_INDEX_NONE does not require another
check beyond the fully decoded bits being equal to 0x4.
The expression insn->sibIndex == SIB_INDEX_sib could not have been true unless
index were 0x4, because SIB_INDEX_sib is merely the range base (SIB_INDEX_EAX)
plus 4. Respectively SIB_INDEX_sib64.
- Don't use a switch statement to perform left-shift.
Differential Revision: http://reviews.llvm.org/D9762
llvm-svn: 240598
Summary:
This patch first change the register that holds local address for stack
frame to %SPL. Then the new NVPTXPeephole pass will try to scan the
following pattern
%vreg0<def> = LEA_ADDRi64 <fi#0>, 4
%vreg1<def> = cvta_to_local %vreg0
and transform it into
%vreg1<def> = LEA_ADDRi64 %VRFrameLocal, 4
Patched by Xuetian Weng
Test Plan: test/CodeGen/NVPTX/local-stack-frame.ll
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: eliben, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10549
llvm-svn: 240587
COFF and MachO only define symbol sizes for common symbols. Reflect that
in the class hierarchy by having a method for common symbols only in the base
and a general one in ELF.
This avoids the need of using a magic value for the size, which had a few
problems
* Most callers didn't check for it.
* The ones that did could not tell the magic value from a file actually having
that value.
llvm-svn: 240529
This stops shifting a 32-bit value by such absurd amounts as 96 and
120. We do this by dropping a call to the function that was doing this
entirely, which rather surprisingly doesn't break *any* tests.
I've also added an assert in the misbehaving function to prove that
it's no longer being called with completely invalid arguments.
This change looks pretty bogus and we should probably be reverting
r238692 instead, but this is hard to do with the number of follow ups
that have happened since. It can't be any worse than the undefined
behaviour that was happening before though.
llvm-svn: 240526
This allOnes function hits undefined behaviour if Count is greater
than 64, but we can avoid that and simplify the calculation by just
saturating if such a value is passed in.
This comes up under ubsan becauseRxSBGOperands is sometimes created
with values that are 128 bits wide. Somebody more familiar with this
code should probably look into whether that's expected, as a 64 bit
mask may or may not be appropriate for such types.
llvm-svn: 240520
We used to erroneously match:
(v4i64 shuffle (v2i64 load), <0,0,0,0>)
Whereas vbroadcasti128 is more like:
(v4i64 shuffle (v2i64 load), <0,1,0,1>)
This problem doesn't exist for vbroadcastf128, which kept matching
the intrinsic after r231182. We should perhaps re-introduce the
intrinsic here as well, but that's a separate issue still being
discussed.
While there, add some proper vbroadcastf128 tests. We don't currently
match those, like for loading vbroadcastsd/ss on AVX (the reg-reg
broadcasts where added in AVX2).
Fixes PR23886.
llvm-svn: 240488
When UpdateBaseRegUses sees an instruction that defines the base
register it must stop, as the base register value it is updating is no
longer live. Ideally we would already have seen the register be killed
(which is already checked for), but the kill flags may be inaccurate
and we have to account for this.
Differential Revision: http://reviews.llvm.org/D10566
llvm-svn: 240424
Summary:
This only adds support for ULHU of an immediate address with/without a source register.
It does not include support for ULHU of the address of a symbol.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9671
llvm-svn: 240410
Summary: This isn't used right now, but it will be in some upcoming changes.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10568
llvm-svn: 240407
So far, LLVM has not emitted correct addend for N64 and N32 ABI. This patch
fixes that. It also removes fixup from MCJIT for R_MIPS_PC16 relocation.
Patch by Vladimir Radosavljevic.
Differential Revision: http://reviews.llvm.org/D10565
llvm-svn: 240404
Summary: For the sake of consistency and to make some upcoming changes a little less noisy.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10639
llvm-svn: 240398
The summary is that it moves the mangling earlier and replaces a few
calls to .addExternalSymbol with addSym.
I originally wanted to replace all the uses of addExternalSymbol with
addSym, but noticed it was a lot of work and doesn't need to be done
all at once.
llvm-svn: 240395
Currently ( D10321, http://reviews.llvm.org/rL239486 ), we can use the machine combiner pass
to reassociate the following sequence to reduce the critical path:
A = ? op ?
B = A op X
C = B op Y
-->
A = ? op ?
B = X op Y
C = A op B
'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could
be any associative math/logic op (see TODO in code comment).
This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of
a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth
reassociating them.
This generalization has a compile-time cost because we can now match more instruction sequences
and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't
improve the critical path.
For example, in the new test case:
A = M div N
B = A add X
C = B add Y
We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:
A = M div N
B = A add Y
C = B add X
We need the combiner to reject that pattern but select this:
A = M div N
B = X add Y
C = B add A
Differential Revision: http://reviews.llvm.org/D10460
llvm-svn: 240361
The _Int instructions are special, in that they operate on the full
VR128 instead of FR32. The load folding then looks at MOVSS, at the
user, and bails out when it sees a size mismatch.
What we really know is that the rm_Int instructions don't load the
higher lanes, so folding is fine.
This happens for the straightforward intrinsic code, e.g.:
_mm_add_ss(a, _mm_load_ss(p));
Fixes PR23349.
Differential Revision: http://reviews.llvm.org/D10554
llvm-svn: 240326
According to the documentation, .thumb_set is 'the equivalent of a .set directive'.
We didn't have equivalent behaviour in terms of all the errors we could throw, for
example, when a symbol is redefined.
This change refactors parseAssignment so that it can be used by .set and .thumb_set
and implements tests for .thumb_set for all the errors thrown by that method.
Reviewed by Rafael Espíndola.
llvm-svn: 240318
D8982 ( checked in at http://reviews.llvm.org/rL239001 ) added command-line
options to allow reciprocal estimate instructions to be used in place of
divisions and square roots.
This patch changes the default settings for x86 targets to allow that recip
codegen (except for scalar division because that breaks too much code) when
using -ffast-math or its equivalent.
This matches GCC behavior for this kind of codegen.
Differential Revision: http://reviews.llvm.org/D10396
llvm-svn: 240310
Before this we were producing a TargetExternalSymbol from a MCSymbol.
That meant extracting the symbol name and fetching the symbol again
down the pipeline.
This patch adds a DAG.getMCSymbol that lets the MCSymbol pass unchanged on the
DAG.
Doing so removes the need for MO_NOPREFIX and fixes the root cause of pr23900,
allowing r240130 to be committed again.
llvm-svn: 240300
Summary: In this case, we're supposed to load the immediate in AT and then ADDu it with the source register and put it in the destination register.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9367
llvm-svn: 240278
Summary:
In this case, we're supposed to load the address of the symbol in AT and then ADDu it with the source register and
put it in the destination register.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9366
llvm-svn: 240273
This allows more call sequences to use pushes instead of movs when optimizing for size.
In particular, calling conventions that pass some parameters in registers (e.g. thiscall) are now supported.
Differential Revision: http://reviews.llvm.org/D10500
llvm-svn: 240257
This patch changes getRelocationAddend to use ErrorOr and considers it an error
to try to get the addend of a REL section.
If, for example, a x86_64 file has a REL section, that file is corrupted and
we should reject it.
Using ErrorOr is not ideal since we check the section type once per relocation
instead of once per section.
Checking once per section would involve getRelocationAddend just asserting and
callers checking the section before iterating over the relocations.
In any case, this is an improvement and includes a test.
llvm-svn: 240176
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
Currently, we canonicalize shuffles that produce a result larger than
their operands with:
shuffle(concat(v1, undef), concat(v2, undef))
->
shuffle(concat(v1, v2), undef)
because we can access quad vectors (see PerformVECTOR_SHUFFLECombine).
This is useful in the general case, but there are special cases where
native shuffles produce larger results: the two-result ops.
We can look through the concat when lowering them:
shuffle(concat(v1, v2), undef)
->
concat(VZIP(v1, v2):0, :1)
This lets us generate the native shuffles instead of scalarizing to
dozens of VMOVs.
Differential Revision: http://reviews.llvm.org/D10424
llvm-svn: 240118
Deduplicates some code and lets us use LEA on atom when adjusting the
stack around callee-cleanup calls. This is the only intended
functionality change.
llvm-svn: 240044
They had been getting emitted as a section + offset reference, which
is bogus since the value needs to be the offset within the GOT, not
the actual address of the symbol's object.
Differential Revision: http://reviews.llvm.org/D10441
llvm-svn: 240020
Added explicit sign extension for v4i16/v8i16 to v4i32/v8i32 before conversion to floats. Matches existing support for v4i8/v8i8.
Follow up to D10433
llvm-svn: 239966
Summary:
This is done by first adding two additional instructions to convert the
alloca returned address to local and convert it back to generic. Then
replace all uses of alloca instruction with the converted generic
address. Then we can rely NVPTXFavorNonGenericAddrSpace pass to combine
the generic addresscast and the corresponding Load, Store, Bitcast, GEP
Instruction together.
Patched by Xuetian Weng (xweng@google.com).
Test Plan: test/CodeGen/NVPTX/lower-alloca.ll
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: meheff, broune, eliben, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10483
llvm-svn: 239964
There is a one-to-one relationship between X86Subtarget and
X86FrameLowering, but every frame lowering method would previously pull
the subtarget off the MachineFunction and query some subtarget
properties.
Over time, these locals began to grow in complexity and it became
important to keep their names and meaning in sync across all of the
frame lowering methods, leading to duplication. We can eliminate that
duplication by computing them once in the constructor.
llvm-svn: 239948
The personality routine currently lives in the LandingPadInst.
This isn't desirable because:
- All LandingPadInsts in the same function must have the same
personality routine. This means that each LandingPadInst beyond the
first has an operand which produces no additional information.
- There is ongoing work to introduce EH IR constructs other than
LandingPadInst. Moving the personality routine off of any one
particular Instruction and onto the parent function seems a lot better
than have N different places a personality function can sneak onto an
exceptional function.
Differential Revision: http://reviews.llvm.org/D10429
llvm-svn: 239940
Summary:
This does not include support for the immediate variants of these pseudo-instructions.
Fixes llvm.org/PR20968.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D8537
llvm-svn: 239905
Summary:
Call MCSymbolRefExpr::create() with a MCSymbol* argument, not with a StringRef
of the Symbol's name, in order to avoid creating invalid temporary symbols for
relative labels (e.g. {$,.L}tmp00, {$,.L}tmp10 etc.).
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10498
llvm-svn: 239901
Summary:
Previously, MCSymbolRefExpr::create() was called with a StringRef of the symbol
name, which it would then search for in the Symbols StringMap (from MCContext).
However, relative labels (which are temporary symbols) are apparently not stored
in the Symbols StringMap, so we end up creating a new {$,.L}tmp symbol
({$,.L}tmp00, {$,.L}tmp10 etc.) each time we create an MCSymbolRefExpr by
passing in the symbol name as a StringRef.
Fortunately, there is a version of MCSymbolRefExpr::create() which takes an
MCSymbol* and we already have an MCSymbol* at that point, so we can just pass
that in instead of the StringRef.
I also removed the local StringRef calls to MCSymbolRefExpr::create() from
expandMemInst(), as those cases can be handled by evaluateRelocExpr() anyway.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9938
llvm-svn: 239897
Change builtin function name and signature ( add third parameter - rounding mode ).
Added tests for intrinsics.
Differential Revision: http://reviews.llvm.org/D10473
llvm-svn: 239888
that it is its own entity in the form of MemoryLocation, and update all
the callers.
This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.
llvm-svn: 239885
The patch triggers a miscompile on SPEC 2006 403.gcc with the (ref)
200.i and scilab.i inputs. I opened PR23866 to track analysis of this.
This reverts commit r238793.
llvm-svn: 239880
This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions.
Differential Revision: http://reviews.llvm.org/D10433
llvm-svn: 239855
Old names, new names, and what they really mean:
- IsWin64 -> IsWin64CC: This is true on non-Windows x86_64 platforms
when the ms_abi calling convention is used.
- IsWinEH -> IsWin64Prologue: True when the target is Win64, regardless
of calling convention. Changes the prologue to obey the constraints of
the Win64 unwinder.
- NeedsWinEH -> NeedsWinCFI: We're using the win64 prologue *and* the we
want .xdata unwind tables. Analogous to NeedsDwarfCFI.
NFC
llvm-svn: 239836
The mftb instruction was incorrectly marked as deprecated in the PPC
Backend. Instead, it should not be treated as deprecated, but rather be
implemented using the mfspr instruction. A similar patch was put into GCC last
year. Details can be found at:
https://sourceware.org/ml/binutils/2014-11/msg00383.html.
This change will replace instances of the mftb instruction with the mfspr
instruction for all CPUs except 601 and pwr3. This will also be the default
behaviour.
Additional details can be found in:
https://llvm.org/bugs/show_bug.cgi?id=23680
Phabricator review: http://reviews.llvm.org/D10419
llvm-svn: 239827
Summary:
Relocs that can be converted from absolute to PC-relative now do so if IsPCRel
is true. Relocs that require PC-relative now call llvm_unreachable() if IsPCRel
is false and similarly those that require absolute assert that IsPCRel is false.
Note that while it looks like some relocs (e.g. R_MIPS_26) can be converted into
the MIPS32r6/MIPS64r6 relocs (R_MIPS_PC*_S2), it isn't actually valid to do so.
Placeholders have been left in the testcase for unsupported relocs and relocs
that cannot be generated at the moment.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits, rafael
Differential Revision: http://reviews.llvm.org/D10184
llvm-svn: 239817
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10381
llvm-svn: 239815
Summary:
This affects other tools so the previous C++ API has been retained as a
deprecated function for the moment. Clang has been updated with a trivial
patch (not covered by the pre-commit review) to avoid breaking -Werror builds.
Other in-tree tools will be fixed with similar patches.
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
The first time this was committed it accidentally fixed an inconsistency in
triples in llvm-mc and this caused a failure. This inconsistency was fixed in
r239808.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10366
llvm-svn: 239812
When we multiply two 64-bit vectors, we extract lower and upper part and use the PMULUDQ instruction.
When one of the operands is a constant, the upper part may be zero, we know this at compile time.
Example: %a = mul <4 x i64> %b, <4 x i64> < i64 5, i64 5, i64 5, i64 5>.
I'm checking the value of the upper part and prevent redundant "multiply", "shift" and "add" operations.
llvm-svn: 239802
These are really immediate DUPs, and suffer from the same problem
with long instructions with a high/2 variant (e.g. smull).
By extending a MOVI (or DUP, before this patch), we can avoid an ext
on the other operand of the long instruction, e.g. turning:
ext.16b v0, v0, v0, #8
movi.4h v1, #0x53
smull.4s v0, v0, v1
into:
movi.8h v1, #0x53
smull2.4s v0, v0, v1
While there, add a now-necessary combine to fold (VT NVCAST (VT x)).
llvm-svn: 239799
This patch adds the safe stack instrumentation pass to LLVM, which separates
the program stack into a safe stack, which stores return addresses, register
spills, and local variables that are statically verified to be accessed
in a safe way, and the unsafe stack, which stores everything else. Such
separation makes it much harder for an attacker to corrupt objects on the
safe stack, including function pointers stored in spilled registers and
return addresses. You can find more information about the safe stack, as
well as other parts of or control-flow hijack protection technique in our
OSDI paper on code-pointer integrity (http://dslab.epfl.ch/pubs/cpi.pdf)
and our project website (http://levee.epfl.ch).
The overhead of our implementation of the safe stack is very close to zero
(0.01% on the Phoronix benchmarks). This is lower than the overhead of
stack cookies, which are supported by LLVM and are commonly used today,
yet the security guarantees of the safe stack are strictly stronger than
stack cookies. In some cases, the safe stack improves performance due to
better cache locality.
Our current implementation of the safe stack is stable and robust, we
used it to recompile multiple projects on Linux including Chromium, and
we also recompiled the entire FreeBSD user-space system and more than 100
packages. We ran unit tests on the FreeBSD system and many of the packages
and observed no errors caused by the safe stack. The safe stack is also fully
binary compatible with non-instrumented code and can be applied to parts of
a program selectively.
This patch is our implementation of the safe stack on top of LLVM. The
patches make the following changes:
- Add the safestack function attribute, similar to the ssp, sspstrong and
sspreq attributes.
- Add the SafeStack instrumentation pass that applies the safe stack to all
functions that have the safestack attribute. This pass moves all unsafe local
variables to the unsafe stack with a separate stack pointer, whereas all
safe variables remain on the regular stack that is managed by LLVM as usual.
- Invoke the pass as the last stage before code generation (at the same time
the existing cookie-based stack protector pass is invoked).
- Add unit tests for the safe stack.
Original patch by Volodymyr Kuznetsov and others at the Dependable Systems
Lab at EPFL; updates and upstreaming by myself.
Differential Revision: http://reviews.llvm.org/D6094
llvm-svn: 239761
This commit connects the machine function analysis pass (which creates machine
functions) to the MIR parser, which will initialize the machine functions
with the state from the MIR file and reconstruct the machine IR.
This commit introduces a new interface called 'MachineFunctionInitializer',
which can be used to provide custom initialization for the machine functions.
This commit also introduces a new diagnostic class called
'DiagnosticInfoMIRParser' which is used for MIR parsing errors.
This commit modifies the default diagnostic handling in LLVMContext - now the
the diagnostics are printed directly into llvm::errs() so that the MIR parsing
errors can be printed with colours.
Reviewers: Justin Bogner
Differential Revision: http://reviews.llvm.org/D9928
llvm-svn: 239753
Summary:
NFC: no one uses AnalyzeBranchPredicate yet.
Add TargetInstrInfo::AnalyzeBranchPredicate and implement for x86. A
later change adding support for page-fault based implicit null checks
depends on this.
Reviewers: reames, ab, atrick
Reviewed By: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10200
llvm-svn: 239742
Summary:
TargetInstrInfo::getLdStBaseRegImmOfs to
TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86. The
implementation only handles a few easy cases now and will be made more
sophisticated in the future.
This is NFCI: the only user of `getLdStBaseRegImmOfs` (now
`getmemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion`
is disabled for x86.
Reviewers: reames, ab, MatzeB, atrick
Reviewed By: MatzeB, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10199
llvm-svn: 239741
Summary:
This instruction encodes a loading operation that may fault, and a label
to branch to if the load page-faults. The locations of potentially
faulting loads and their "handler" destinations are recorded in a
FaultMap section, meant to be consumed by LLVM's clients.
Nothing generates FAULTING_LOAD_OP instructions yet, but they will be
used in a future change.
The documentation (FaultMaps.rst) needs improvement and I will update
this diff with a more expanded version shortly.
Depends on D10196
Reviewers: rnk, reames, AndyAyers, ab, atrick, pgavlin
Reviewed By: atrick, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10197
llvm-svn: 239740
LLVM targeting aarch64 doesn't correctly produce aligned accesses for non-aligned
data at -O0/fast-isel (-mno-unaligned-access).
The root cause seems to be in fast-isel not producing unaligned access correctly
for -mno-unaligned-access.
The patch just aborts fast-isel for loads and stores when -mno-unaligned-access is
present.
The regression test is updated to check this new test case (-mno-unaligned-access
together with fast-isel).
Differential Revision: http://reviews.llvm.org/D10360
llvm-svn: 239732
Summary:
This affects other tools so the previous C++ API has been retained as a
deprecated function for the moment. Clang has been updated with a trivial
patch (not covered by the pre-commit review) to avoid breaking -Werror builds.
Other in-tree tools will be fixed with similar trivial patches.
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10366
llvm-svn: 239721
Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure on OS X.
This patch was firstly committed in r239514, then reverted in r239544 because of a syntax incompatible failure on OS X.
llvm-svn: 239711
Now the library names in the Makefiles match the library names in
LLVMBuild.txt.
This should hopefully fix the remaining bot failures.
llvm-svn: 239661
r213101 changed the behaviour of this method to not only affect the
PostMachineScheduler scheduler but also the PostRAScheduler scheduler,
renaming should make this fact clear. Also document that the preferred
way is to specify this in the scheduling model instead of overriding
this method.
Differential Revision: http://reviews.llvm.org/D10427
llvm-svn: 239659
This will use Itinieraries if available, but will also work if just a
MCSchedModel is available.
Differential Revision: http://reviews.llvm.org/D10428
llvm-svn: 239658
- Add glc, slc, and tfe operands to flat instructions
- Add missing flat instructions
- Fix the encoding of flat_load_dwordx3 and flat_store_dwordx3.
llvm-svn: 239637
The alignment is not required, so we can just remove it for now.
The old code is a hack as it depends on the buffer management to find
the current column.
If the alignment is really desirable, the proper way to do it is
to pass in a formatted_raw_stream that knows the current column.
llvm-svn: 239603
We were putting them in the filter field, which is correct for 64-bit
but wrong for 32-bit.
Also switch the order of scope table entry emission so outermost entries
are emitted first, and fix an obvious state assignment bug.
llvm-svn: 239574
Remove the EFLAGS from the stackmap live-out mask. The EFLAGS register is not
supposed to be part of that set, because the X86 calling conventions mark the
register as NOT preserved.
Also remove the IP registers, since spilling and restoring those doesn't really
make any sense.
Related to rdar://problem/21019635.
llvm-svn: 239568
This intrinsic is like framerecover plus a load. It recovers the EH
registration stack allocation from the parent frame and loads the
exception information field out of it, giving back a pointer to an
EXCEPTION_POINTERS struct. It's designed for clang to use in SEH filter
expressions instead of accessing the EXCEPTION_POINTERS parameter that
is available on x64.
This required a minor change to MC to allow defining a label variable to
another absolute framerecover label variable.
llvm-svn: 239567
Summary:
For the moment, TargetMachine::getTargetTriple() still returns a StringRef.
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: ted, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10362
llvm-svn: 239554
Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
Revert "Fixing MSVC 2013 build error."
The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.
llvm-svn: 239544
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, jfb, rengolin
Differential Revision: http://reviews.llvm.org/D10361
llvm-svn: 239538
This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements:
1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node.
2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper).
3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial.
The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however.
Tested on SSE2, SSE41 and AVX machines.
Differential Revision: http://reviews.llvm.org/D9474
llvm-svn: 239509
This patch corresponds to review:
http://reviews.llvm.org/D10096
This is the back end portion of the patch related to D10095.
The patch adds the instructions and back end intrinsics for:
vbpermq
vgbbd
llvm-svn: 239505
This reverts commit r239437.
This broke clang-cl self-hosts. We'd end up calling the __imp_ symbol
directly instead of using it to do an indirect function call.
llvm-svn: 239502
It hasn't been used since r130964.
This also removes MachineModuleInfo::isUsedFunction and
MachineModuleInfo::AnalyzeModule, both of which were only
there to support UsedFunctions.
llvm-svn: 239501
This is a reimplementation of D9780 at the machine instruction level rather than the DAG.
Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a
starting point; see the TODO comments) to increase ILP when it's safe to do so.
The code is closely based on the existing MachineCombiner optimization that is implemented
for AArch64.
This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.
Differential Revision: http://reviews.llvm.org/D10321
llvm-svn: 239486
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10311
llvm-svn: 239467
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10307
llvm-svn: 239465
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: echristo, rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10243
llvm-svn: 239464
Use a "safeseh" string attribute to do this. You would think we chould
just accumulate the set of personalities like we do on dwarf, but this
fails to account for the LSDA-loading thunks we use for
__CxxFrameHandler3. Each of those needs to make it into .sxdata as well.
The string attribute seemed like the most straightforward approach.
llvm-svn: 239448
Summary:
We used to assume V->RAUW only modifies the operand list of V's user.
However, if V and V's user are Constants, RAUW may replace and invalidate V's
user entirely.
This patch fixes the above issue by letting the caller replace the
operand instead of calling RAUW on Constants.
Test Plan: @nested_const_expr and @rauw in access-non-generic.ll
Reviewers: broune, jholewinski
Reviewed By: broune, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10345
llvm-svn: 239435
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.
The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.
llvm-svn: 239433
Store instructions do not modify register values and therefore it's safe
to form a store pair even if the source register has been read in between
the two store instructions.
Previously, the read of w1 (see below) prevented the formation of a stp.
str w0, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
str w1, [x2, #4]
ret
We now generate the following code.
stp w0, w1, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
ret
All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
Performance results for SPEC2K were within noise.
llvm-svn: 239432
that was resetting it.
Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis.
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.
Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.
rdar://problem/13752163
Differential Revision: http://reviews.llvm.org/D10099
llvm-svn: 239427
array of bytes. The generation of this byte arrays was expecting
the host to be little endian, which prevents big endian hosts to be
used in the generation of the PTX code. This patch fixes the
problem by changing the way the bytes are extracted so that it
works for either little and big endian.
llvm-svn: 239412
Summary:
For some branches, GAS accepts an immediate instead of the 2nd register operand.
We only implement this for BNE and BEQ for now. Other branch instructions can be added later, if needed.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D9666
llvm-svn: 239396
Summary:
This cleans up most allocas NVPTXLowerKernelArgs emits for byval
parameters.
Test Plan: makes bug21465.ll more stronger to verify no redundant local load/store.
Reviewers: eliben, jholewinski
Reviewed By: eliben, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10322
llvm-svn: 239368
Summary:
This was a longstanding FIXME and is a necessary precursor to cases
where foldOperandImpl may have to create more than one instruction
(e.g. to constrain a register class). This is the split out NFC changes from
D6262.
Reviewers: pete, ributzka, uweigand, mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, ted, llvm-commits
Differential Revision: http://reviews.llvm.org/D10174
llvm-svn: 239336
on a per-function basis.
Previously some of the passes were conditionally added to ARM's pass pipeline
based on the target machine's subtarget. This patch makes changes to add those
passes unconditionally and execute them conditonally based on the predicate
functor passed to the pass constructors. This enables running different sets of
passes for different functions in the module.
rdar://problem/20542263
Differential Revision: http://reviews.llvm.org/D8717
llvm-svn: 239325
While we have some code to transform specification like {ax} into
{eax}/{rax} if the operand type isn't 16bit, we should reject cases
where there is no sane way to do this, like the i128 type in the
example.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10260
llvm-svn: 239309
This patch adds support for system register MMFR4_EL1 (memory model feature register) in the assembler.
This register provides information about the implemented memory model and memory management support.
llvm-svn: 239302
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.
Differential Revision: http://reviews.llvm.org/D10310
llvm-svn: 239300
These are added mainly for the benefit of clang, but this also means that they
are now allowed in .fpu directives and we emit the correct .fpu directive when
single-precision-only is used.
Differential Revision: http://reviews.llvm.org/D10238
llvm-svn: 239151
Add getFPUFeatures to TargetParser, which gets the list of subtarget features
that are enabled/disabled for each FPU, and use it when handling the .fpu
directive.
No functional change in this commit, though clang will start behaving
differently once it starts using this.
Differential Revision: http://reviews.llvm.org/D10237
llvm-svn: 239150
Summary:
Only restoring AvailableFeatures is not enough and will lead to buggy behaviour.
For example, if we have a feature enabled and we ".set pop", the next time we try
to ".set" that feature nothing will happen because the "!(STI.getFeatureBits()[Feature])"
check will be false, because we didn't restore STI.FeatureBits.
In order to fix this, we need to make MipsAssemblerOptions remember the STI.FeatureBits
instead of the AvailableFeatures and then regenerate AvailableFeatures each time we ".set pop".
This is because, AFAIK, there is no way to convert from AvailableFeatures back to STI.FeatureBits,
but the reverse is possible by using ComputeAvailableFeatures(STI.FeatureBits).
I also moved the updating of AssemblerOptions inside the "if" statement in
setFeatureBits() and clearFeatureBits(), as there is no reason to update if
nothing changes.
Reviewers: dsanders, mkuper
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9156
llvm-svn: 239144
Summary:
A small bit that I missed when I updated the X86 backend to account for
the Win64 calling convention on non-Windows. Now we don't use dead
non-volatile registers when emitting a Win64 indirect tail call on
non-Windows.
Should fix PR23710.
Test Plan: Added test for the correct behavior based on the case I posted to PR23710.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10258
llvm-svn: 239111
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.
Differential Revision: http://reviews.llvm.org/D10054
llvm-svn: 239087
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.
rdar://21201804
llvm-svn: 239084
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.
Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll
Test Plan: lower-kernel-ptr-arg.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: wengxt, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10154
llvm-svn: 239082
Summary:
-march=bpf -> host endian
-march=bpf_le -> little endian
-match=bpf_be -> big endian
Test Plan:
v1 was tested by IBM s390 guys and appears to be working there.
It bit rots too fast here.
Reviewers: chandlerc, tstellarAMD
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10177
llvm-svn: 239071
Now that we sometimes know the address space, this can
theoretically do a better job.
This needs better test coverage, but this mostly depends on
first updating the loop optimizatiosn to provide the address
space.
llvm-svn: 239053
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: ted, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10236
llvm-svn: 239036
The fix is just that getOther had not been updated for packing the st_other
values in fewer bits and could return spurious values:
- unsigned Other = (getFlags() & (0x3f << ELF_STO_Shift)) >> ELF_STO_Shift;
+ unsigned Other = (getFlags() & (0x7 << ELF_STO_Shift)) >> ELF_STO_Shift;
Original message:
Pack the MCSymbolELF bit fields into MCSymbol's Flags.
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
llvm-svn: 239012
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
llvm-svn: 239006
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.
The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 239001
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.
Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
Recommiting after the revert in r238821, the buildbot still failed with
the patch removed so there seems to be another reason for the breakage.
llvm-svn: 238935
AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.
CR:
http://reviews.llvm.org/D9991
llvm-svn: 238923
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, /U[qytnms]/ do not have backend tests but are accepted by clang.
No functional change intended.
Reviewers: t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D8203
llvm-svn: 238921
Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake.
It is a part of KNL and SKX sets. It is also a part of Skylake client.
I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers.
llvm-svn: 238916
This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....).
Since the refactoring of the shuffle lowering code this no longer has any use.
Differential Revision: http://reviews.llvm.org/D10169
llvm-svn: 238906
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238842
Summary:
With this change we are able to realign the stack dynamically, whenever it
contains objects with alignment requirements that are larger than the
alignment specified from the given ABI.
We have to use the $fp register as the frame pointer when we perform
dynamic stack realignment. In complex stack frames, with variably-sized
objects, we reserve additionally the callee-saved register $s7 as the
base pointer in order to reference locals.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8633
llvm-svn: 238829
This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot.
Since self-hosting issues with Clang are hard to investigate, I'm taking the
liberty to revert now, so we can investigate it offline.
llvm-svn: 238821
Summary: These directives are used to set the current value of the SoftFloat feature.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, mpf
Differential Revision: http://reviews.llvm.org/D9074
llvm-svn: 238813
This create a MCSymbolELF class and moves SymbolSize since only ELF
needs a size expression.
This reduces the size of MCSymbol from 56 to 48 bytes.
llvm-svn: 238801
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.
Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
llvm-svn: 238795
Previously CCMP/FCCMP instructions were only used by the
AArch64ConditionalCompares pass for control flow. This patch uses them
for SELECT like instructions as well by matching patterns in ISelLowering.
PR20927, rdar://18326194
Differential Revision: http://reviews.llvm.org/D8232
llvm-svn: 238793
Summary: Implement bswap intrinsic for MIPS FastISel. It's very different for misp32 r1/r2 .
Based on a patch by Reed Kotler.
Test Plan:
bswap1.ll
test-suite
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7219
llvm-svn: 238760
Summary:
Implement the intrinsics memset, memcopy and memmove in MIPS FastISel.
Make some needed infrastructure fixes so that this can work.
Based on a patch by Reed Kotler.
Test Plan:
memtest1.ll
The patch passes test-suite for mips32 r1/r2 and at O0/O2
Reviewers: rkotler, dsanders
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7158
llvm-svn: 238759
Summary: Implement the LLVM assembly urem/srem and sdiv/udiv instructions in MIPS FastISel.
Based on a patch by Reed Kotler.
Test Plan:
srem1.ll
div1.ll
test-suite at O0/O2 for mips32 r1/r2
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7028
llvm-svn: 238757
Summary: Implement the LLVM IR select statement for MIPS FastISelsel.
Based on a patch by Reed Kotler.
Test Plan:
"Make check" test included now.
Passes test-suite at O2/O0 mips32 r1/r2.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D6774
llvm-svn: 238756
Summary:
The contents of the HI/LO registers are unpredictable after the execution of
the MUL instruction. In addition to implicitly defining these registers in the
MUL instruction definition, we have to mark those registers as dead too.
Without this the fast register allocator is running out of registers when the
MUL instruction is followed by another one that tries to allocate the AC0
register.
Based on a patch by Reed Kotler.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D9825
llvm-svn: 238755
This is important because of different addressing modes
depending on the address space for GPU targets.
This only adds the argument, and does not update
any of the uses to provide the correct address space.
llvm-svn: 238723
The original version didn't properly account for the base register
being modified before the final jump, so caused miscompilations in
Chromium and LLVM. I've fixed this and tested with an LLVM self-host
(I don't have the means to build & test Chromium).
The general idea remains the same: in pathological cases jump tables
can be too far away from the instructions referencing them (like other
constants) so they need to be movable.
Should fix PR23627.
llvm-svn: 238680
best approach of each.
For vNi16, we use SHL + ADD + SRL pattern that seem easily the best.
For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases
there is a huge improvement with this in IACA's estimated throughput --
over 2x higher throughput!!!! -- but the measurements are too good to be
true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks
slightly faster, but I'm not sure I believe any of the measurements at
this point. Both are the exact same uops though. Hard to be confident of
anything past that.
If anyone wants to collect very detailed (Agner-level) timings with the
result of this patch, or with the i32 case replaced with SHL + ADD + SHl
+ ADD + SRL, I'd be very interested. Note that you'll need to test it on
both Ivybridge and Haswell, with both SSE3, SSSE3, and AVX selected as
I saw unique behavior in each of these buckets with IACA all of which
should be checked against measured performance.
But this patch is still a useful improvement by dropping duplicate work
and getting the much nicer PSADBW lowering for v2i64.
I'd still like to rephrase this in terms of generic horizontal sum. It's
a bit lame to have a special case of that just for popcount.
llvm-svn: 238652
The plan was to move the whole table into the already existing ArchExtNames
but some fields depend on a table-generated file, and we don't yet have this
feature in the generic lib/Support side.
Once the minimum target-specific table-generated files are available in a
generic fashion to these libraries, we'll have to keep it in the ASM parser.
llvm-svn: 238651
shorter one. NFC.
In addition to being much shorter to type and requiring fewer arguments,
this change saves over 30 lines from this one file, all wasted on total
boilerplate...
llvm-svn: 238640
shifting vectors of bytes as x86 doesn't have direct support for that.
This removes a bunch of redundant masking in the generated code for SSE2
and SSE3.
In order to avoid the really significant code size growth this would
have triggered, I also factored the completely repeatative logic for
shifting and masking into two lambdas which in turn makes all of this
much easier to read IMO.
llvm-svn: 238637
in-register LUT technique.
Summary:
A description of this technique can be found here:
http://wm.ite.pl/articles/sse-popcount.html
The core of the idea is to use an in-register lookup table and the
PSHUFB instruction to compute the population count for the low and high
nibbles of each byte, and then to use horizontal sums to aggregate these
into vector population counts with wider element types.
On x86 there is an instruction that will directly compute the horizontal
sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily.
Various tricks are used to get vNi32 and vNi16 from the vNi8 that the
LUT computes.
The base implemantion of this, and most of the work, was done by Bruno
in a follow up to D6531. See Bruno's detailed post there for lots of
timing information about these changes.
I have extended Bruno's patch in the following ways:
0) I committed the new tests with baseline sequences so this shows
a diff, and regenerated the tests using the update scripts.
1) Bruno had noticed and mentioned in IRC a redundant mask that
I removed.
2) I introduced a particular optimization for the i32 vector cases where
we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD
+ PSADBW to compute doubled high i32 popcounts. This takes advantage
of the fact that to line up the high i32 popcounts we have to shift
them anyways, and we can shift them by one fewer bit to effectively
divide the count by two. While the PSHUFD based horizontal add is no
faster, it doesn't require registers or load traffic the way a mask
would, and provides more ILP as it happens on different ports with
high throughput.
3) I did some code cleanups throughout to simplify the implementation
logic.
4) I refactored it to continue to use the parallel bitmath lowering when
SSSE3 is not available to preserve the performance of that version on
SSE2 targets where it is still much better than scalarizing as we'll
still do a bitmath implementation of popcount even in scalar code
there.
With #1 and #2 above, I analyzed the result in IACA for sandybridge,
ivybridge, and haswell. In every case I measured, the throughput is the
same or better using the LUT lowering, even v2i64 and v4i64, and even
compared with using the native popcnt instruction! The latency of the
LUT lowering is often higher than the latency of the scalarized popcnt
instruction sequence, but I think those latency measurements are deeply
misleading. Keeping the operation fully in the vector unit and having
many chances for increased throughput seems much more likely to win.
With this, we can lower every integer vector popcount implementation
using the LUT strategy if we have SSSE3 or better (and thus have
PSHUFB). I've updated the operation lowering to reflect this. This also
fixes an issue where we were scalarizing horribly some AVX lowerings.
Finally, there are some remaining cleanups. There is duplication between
the two techniques in how they perform the horizontal sum once the byte
population count is computed. I'm going to factor and merge those two in
a separate follow-up commit.
Differential Revision: http://reviews.llvm.org/D10084
llvm-svn: 238636
a separate routine, generalize it to work for all the integer vector
sizes, and do general code cleanups.
This dramatically improves lowerings of byte and short element vector
popcount, but more importantly it will make the introduction of the
LUT-approach much cleaner.
The biggest cleanup I've done is to just force the legalizer to do the
bitcasting we need. We run these iteratively now and it makes the code
much simpler IMO. Other changes were minor, and mostly naming and
splitting things up in a way that makes it more clear what is going on.
The other significant change is to use a different final horizontal sum
approach. This is the same number of instructions as the old method, but
shifts left instead of right so that we can clear everything but the
final sum with a single shift right. This seems likely better than
a mask which will usually have to read the mask from memory. It is
certaily fewer u-ops. Also, this will be temporary. This and the LUT
approach share the need of horizontal adds to finish the computation,
and we have more clever approaches than this one that I'll switch over
to.
llvm-svn: 238635
It turns out that _except_handler3 and _except_handler4 really use the
same stack allocation layout, at least today. They just make different
choices about encoding the LSDA.
This is in preparation for lowering the llvm.eh.exceptioninfo().
llvm-svn: 238627
This patch corresponds to review:
http://reviews.llvm.org/D9941
It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.
llvm-svn: 238578
Small (really small!) C++ exception handling examples work on 32-bit x86
now.
This change disables the use of .seh_* directives in WinException when
CFI is not in use. It also uses absolute symbol references in the tables
instead of imagerel32 relocations.
Also fixes a cache invalidation bug in MMI personality classification.
llvm-svn: 238575
MIOperands/ConstMIOperands are classes iterating over the MachineOperand
of a MachineInstr, however MachineInstr::mop_iterator does the same
thing.
I assume these two iterators exist to have a uniform interface to
iterate over the operands of a machine instruction bundle and a single
machine instruction. However in practice I find it more confusing to have 2
different iterator classes, so this patch transforms (nearly all) the
code to use mop_iterators.
The only exception being MIOperands::anlayzePhysReg() and
MIOperands::analyzeVirtReg() still needing an equivalent, I leave that
as an exercise for the next patch.
Differential Revision: http://reviews.llvm.org/D9932
This version is slightly modified from the proposed revision in that it
introduces MachineInstr::getOperandNo to avoid the extra counting
variable in the few loops that previously used MIOperands::getOperandNo.
llvm-svn: 238539
This moves all the state numbering code for C++ EH to WinEHPrepare so
that we can call it from the X86 state numbering IR pass that runs
before isel.
Now we just call the same state numbering machinery and insert a bunch
of stores. It also populates MachineModuleInfo with information about
the current function.
llvm-svn: 238514
For x86 targets, do not do sibling call optimization when materializing
the callee's address would require a GOT relocation. We can still do
tail calls to internal functions, hidden functions, and protected
functions, because they do not require this kind of relocation. It is
still possible to get GOT relocations when the user explicitly asks for
it with musttail or -tailcallopt, both of which are supposed to
guarantee TCO.
Based on a patch by Chih-hung Hsieh.
Reviewers: srhines, timmurray, danalbert, enh, void, nadav, rnk
Subscribers: joerg, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D9799
llvm-svn: 238487
We were previously codegen'ing these as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach
the register allocator to allocate the registers in ascending order. This
never got implemented, and up to now we've been stuck with very poor codegen.
A much simpler approach for achiveing better codegen is to create LDM/STM
instructions with identical sets of virtual registers, let the register
allocator pick arbitrary registers and order register lists when printing an
MCInst. This approach also avoids the need to repeatedly calculate offsets
which ultimately ought to be eliminated pre-RA in order to decrease register
pressure.
This is implemented by lowering the memcpy intrinsic to a series of SD-only
MCOPY pseudo-instructions which performs a memory copy using a given number
of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a
little unusual, but it avoids the need to encode register lists in the SD,
and we can take advantage of SD use lists to decide whether to use the _UPD
variant of the instructions.
Fixes PR9199.
Differential Revision: http://reviews.llvm.org/D9508
llvm-svn: 238473
Octeon CPUs use dmtc2 rt,imm16 and dmfcp2 rt,imm16 for the crypto coprocessor.
E.g. dmtc2 rt,0x4057 starts calculation of sha-1.
I had to introduce a new deconding namespace to avoid a decoding conflict.
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D10083
llvm-svn: 238439
Add support for resolving MIPS64r2 and MIPS64r6 relocations in MCJIT.
Patch by Vladimir Radosavljevic.
Differential Revision: http://reviews.llvm.org/D9667
llvm-svn: 238424
Summary:
This patch made two improvements to NaryReassociate and the NVPTX pipeline
1. Run EarlyCSE/GVN after NaryReassociate to get rid of redundant common
expressions.
2. When adding an instruction to SeenExprs, maps both the SCEV before and after
reassociation to that instruction.
Test Plan: updated @reassociate_gep_nsw in nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: dberlin, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9947
llvm-svn: 238396
Now that most of the methods in Clang and LLVM that were parsing arch/cpu/fpu
strings are using ARMTargetParser, it's time to make it a bit more conforming
with what the ABI says.
This commit adds some clarification on what build attributes are accepted and
which are "non-standard". It also makes clear that the "defaultCPU" and
"defaultArch" methods were really just build attribute getters.
It also diverges from GCC's behaviour to say that armv2/armv3 are really an
ARMv4 in the build attributes, when the ABI has a clear state for that: Pre-v4.
llvm-svn: 238344
This broke the llvm-mips-linux builder and several of our out-of-tree builders.
Initial investigations show that the commit probably isn't the problem but
reverting anyway while I investigate.
llvm-svn: 238302
With this patch the x86 backend is now shrink-wrapping capable
and this functionality can be tested by using the
-enable-shrink-wrap switch.
The next step is to make more test and enable shrink-wrapping by
default for x86.
Related to <rdar://problem/20821487>
llvm-svn: 238293
This gets gas and llc -filetype=obj to agree on the order of prefixes.
For llvm-mc we need to fix the asm parser to know that it makes a difference
on which line the "lock" is in.
Part of pr23594.
llvm-svn: 238232
v2: Use C++ comments and end with periods
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 238228
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
The first several times this was committed (e.g. r229831, r233055), it caused several buildbot failures.
Apparently the reason for most failures was both clang and gcc's inability to deal with large numbers (> 10K) of bitset constructor calls in tablegen-generated initializers of instruction info tables.
This should now be fixed.
llvm-svn: 238192
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.
This commit uses DW_EH_PE_sdata8 for N64 as far as is possible at the moment.
However, it is possible to end up with DW_EH_PE_sdata4 when a TargetMachine is
not available. There's no risk of issues with inconsistency here since the
tables are self describing but it does mean there is a small chance of the
PC-relative offset being out of range for particularly large programs.
Reviewers: petarj
Reviewed By: petarj
Subscribers: srhines, joerg, tberghammer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9669
llvm-svn: 238190
Part of D9474, this patch extends AVX2 v16i16 types to 2 x 8i32 vectors and uses i32 shift variable shifts before packing back to i16.
Adds AVX2 tests for v8i16 and v16i16
llvm-svn: 238149
This lets us drop a parameter the opName parameter to the VINTRP
multiclass and makes it possible to create multiple VINTRP defs
with the same asm mnemonic.
llvm-svn: 238146
in POWER8:
vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).
http://reviews.llvm.org/D9081
llvm-svn: 238144
The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source.
The existing pattern switches src1 and src2 around, to match the "213" order, which ends up tying the original src2 to the dest. Since the actual scalar fma3 instructions copy the high elements from the dest register, the wrong values are copied.
This modifies the pattern to leave src1 and src2 in their original order.
Differential Revision: http://reviews.llvm.org/D9908
llvm-svn: 238131
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.
Most of the new testcases aren't optimally merged, and are for
later improvements.
llvm-svn: 238108
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).
Fixes PR23640.
llvm-svn: 238097
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.
There is no change in functionality except that cl:opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim".
llvm-svn: 238080
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238051
The 'off' field of 'struct bpf_insn' is in cpu-endianness,
since the rest is emitted as little endian, make sure
that 'off' field is little endian as well.
llvm-svn: 238038
The problem was that I slipped a change required for shrink-wrapping, namely I
used getFirstTerminator instead of the getLastNonDebugInstr that was here before
the refactoring, whereas the surrounding code is not yet patched for that.
Original message:
[X86] Refactor the prologue emission to prepare for shrink-wrapping.
- Add a late pass to expand pseudo instructions (tail call and EH returns).
Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.
NFC.
Related to <rdar://problem/20821487>
llvm-svn: 238035
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching. These will
not be used by typical applications, so built-in support is not
required. They will only be available via inline assembly.
Assembly/disassembly tests are included in the patch.
llvm-svn: 238032
The list of subtarget features for the 7em triple contains 't2xtpk',
which actually disables that subtarget feature. Correct that to
'+t2xtpk' and test that the instructions enabled by that feature do
actually work.
Differential Revision: http://reviews.llvm.org/D9936
llvm-svn: 238022
Revert "[X86] Refactor the prologue emission to prepare for shrink-wrapping."
This reverts commit 6b3b93fc8b68a2c806aa992ee4bd3d7f61898d4b.
This reverts commit ab0b15dff8539826283a59c2dd700a18a9680e0f.
llvm-svn: 238011
- Add a late pass to expand pseudo instructions (tail call and EH returns).
Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.
NFC.
Related to <rdar://problem/20821487>
llvm-svn: 237977
Unfortunately, I can't reduce a small test case for this (although compiling
mpfr-3.1.2 with -O2 -mcpu=a2 would fairly reliably trigger a crash), but the
problem is fairly clear (at least once you know you're looking for one). If the
TLS instruction being replaced was at the end of the block, we'd increment the
iterator past it (so it would then point to MBB.end()), and then we'd increment
it again as part of the for statement, thus overrunning the end of the list.
Don't do that.
llvm-svn: 237974
The raw non-instruction/constant form of this is still relying on being
able to access the pointee type from a pointer type - those will be
cleaned up later. For now, just focus on the cases where the pointee
type is easily accessible.
llvm-svn: 237958
My recent patch to add support for ISA 2.07 vector pack/unpack
instructions didn't properly check for availability of the vpkudum
instruction when recognizing it as a special vector shuffle case.
This causes us to leave the vector shuffle in place (rather than
converting it to a vector permute) so that it can be recognized later
as a vpkudum, but that pattern is invalid for processors prior to
POWER8. Thus LLVM crashes with an "unable to select" message. We
observed this since one of our buildbots is configured to generate
code for a POWER7.
This patch fixes the problem by checking for availability of the
vpkudum instruction during custom lowering of vector shuffles.
I've added a test case variant for the vpkudum pattern when the
instruction isn't available.
llvm-svn: 237952
On X86 (and similar OOO cores) unrolling is very limited, and even if the
runtime unrolling is otherwise profitable, the expense of a division to compute
the trip count could greatly outweigh the benefits. On the A2, we unroll a lot,
and the benefits of unrolling are more significant (seeing a 5x or 6x speedup
is not uncommon), so we're more able to tolerate the expense, on average, of a
division to compute the trip count.
llvm-svn: 237947
http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp
llvm-svn: 237937
This starts merging MCSection and MCSectionData.
There are a few issues with the current split between MCSection and
MCSectionData.
* It optimizes the the not as important case. We want the production
of .o files to be really fast, but the split puts the information used
for .o emission in a separate data structure.
* The ELF/COFF/MachO hierarchy is not represented in MCSectionData,
leading to some ad-hoc ways to represent the various flags.
* It makes it harder to remember where each item is.
The attached patch starts merging the two by moving the alignment from
MCSectionData to MCSection.
Most of the patch is actually just dropping 'const', since
MCSectionData is mutable, but MCSection was not.
llvm-svn: 237936
Predicate UseAVX depricates pattern selection on AVX-512.
This predicate is necessary for DAG selection to select EVEX form.
But mapping SSE intrinsics to AVX-512 instructions is not ready yet.
So I replaced UseAVX with HasAVX for intrinsics patterns.
llvm-svn: 237903
This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization).
It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner.
Differential Revision: http://reviews.llvm.org/D9848
llvm-svn: 237885
Ideally this is going to be and LLVM IR pass (shared, among others
with AArch64), but for the time being just enable it if consumers
ask us for optimization and not unconditionally.
Discussed with Tim Northover on IRC.
llvm-svn: 237837
Now that Intrinsic::ID is a typed enum, we can forward declare it and so return it from this method.
This updates all users which were either using an unsigned to store it, or had a now unnecessary cast.
llvm-svn: 237810
fixed extract-insert i1 element,
load i1, zextload i1 should be with "and $1, %reg" to prevent loading garbage.
added a bunch of new tests.
llvm-svn: 237793
Summary:
For N32/N64, private labels begin with '.L' but for O32 they begin with '$'.
MCAsmInfo now has an initializer function which can be used to provide information from the TargetMachine to control the assembly syntax.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: jfb, sandeep, llvm-commits, rafael
Differential Revision: http://reviews.llvm.org/D9821
llvm-svn: 237789
Summary:
The documentation writes vectors highest-index first whereas LLVM-IR writes
them lowest-index first. As a result, instructions defined in terms of
left_half() and right_half() had the halves reversed.
In addition to correcting them, they have been improved to allow shuffles
that use the same operand twice or in reverse order. For example, ilvev
used to accept masks of the form:
<0, n, 2, n+2, 4, n+4, ...>
but now accepts:
<0, 0, 2, 2, 4, 4, ...>
<n, n, n+2, n+2, n+4, n+4, ...>
<0, n, 2, n+2, 4, n+4, ...>
<n, 0, n+2, 2, n+4, 4, ...>
One further improvement is that splati.[bhwd] is now the preferred instruction
for splat-like operations. The other special shuffles are no longer used
for splats. This lead to the discovery that <0, 0, ...> would not cause
splati.[hwd] to be selected and this has also been fixed.
This fixes the enc-3des test from the test-suite on Mips64r6 with MSA.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9660
llvm-svn: 237689
This changes the ABI used on 32-bit x86 for passing vector arguments.
Historically, clang passes the first 4 vector arguments in-register, and additional vector arguments on the stack, regardless of platform. That is different from the behavior of gcc, icc, and msvc, all of which pass only the first 3 arguments in-register.
The 3-register convention is documented, unofficially, in Agner's calling convention guide, and, officially, in the recently released version 1.0 of the i386 psABI.
Darwin is kept as is because the OS X ABI Function Call Guide explicitly documents the current (4-register) behavior.
This fixes PR21510
Differential revision: http://reviews.llvm.org/D9644
llvm-svn: 237682
This reverts commit r237210.
Also fix X86/complex-fca.ll to match the code that we used to generate
on win32 and now generate everwhere to conform to SysV.
llvm-svn: 237639
ld64 currently mishandles internal pointer relocations (i.e.
ARM64_RELOC_UNSIGNED referred to by section & offset rather than symbol). The
existing __cfstring clause was an early discovery and workaround for this, but
the problem is wider and we should avoid such relocations wherever possible for
now.
This code should be reverted to allowing internal relocations as soon as
possible.
PR23437.
llvm-svn: 237621
This was previously returning int. However there are no negative opcode
numbers and more importantly this was needlessly different from
MCInstrDesc::getOpcode() (which even is the value returned here) and
SDValue::getOpcode()/SDNode::getOpcode().
llvm-svn: 237611
Previously, they were forced to immediately follow the actual branch
instruction. This was usually OK (the LEAs actually accessing them got emitted
nearby, and weren't usually separated much afterwards). Unfortunately, a
sufficiently nasty phi elimination dumps many instructions right before the
basic block terminator, and this can increase the range too much.
This patch frees them up to be placed as usual by the constant islands pass,
and consequently has to slightly modify the form of TBB/TBH tables to refer to
a PC-relative label at the final jump. The other jump table formats were
already position-independent.
rdar://20813304
llvm-svn: 237590
This pseudo-instruction expands into 'sethi' and 'or' instructions,
or, just one of them, if the other isn't necessary for a given value.
Differential Revision: http://reviews.llvm.org/D9089
llvm-svn: 237585
- Adds support for the asm syntax, which has an immediate integer
"ASI" (address space identifier) appearing after an address, before
a comma.
- Adds the various-width load, store, and swap in alternate address
space instructions. (ldsba, ldsha, lduba, lduha, lda, stba, stha,
sta, swapa)
This does not attempt to hook these instructions up to pointer address
spaces in LLVM, although that would probably be a reasonable thing to
do in the future.
Differential Revision: http://reviews.llvm.org/D8904
llvm-svn: 237581
(Note that register "Y" is essentially just ASR0).
Also added some test cases for divide and multiply, which had none before.
Differential Revision: http://reviews.llvm.org/D8670
llvm-svn: 237580
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.
This patch is intended to lower the read/write_register instrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific
instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.
Patch by Luke Cheeseman.
Differential Revision: http://reviews.llvm.org/D9699
llvm-svn: 237579
instructions. These intrinsics are comming with rounding mode.
Added intrinsics for MAXSS/D, MINSS/D - with and without sae.
By Asaf Badouh (asaf.badouh@intel.com)
llvm-svn: 237560
If some commits are happy, and some commits are sad, this is a sad commit. It
is sad because it restricts instruction scheduling to work around a binutils
linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was
updated not to produce code triggering this bug, and now we'll do the same...
When resolving an address using the ELF ABI TOC pointer, two relocations are
generally required: one for the high part and one for the low part. Only
the high part generally explicitly depends on r2 (the TOC pointer). And, so,
we might produce code like this:
.Ltmp526:
addis 3, 2, .LC12@toc@ha
.Ltmp1628:
std 2, 40(1)
ld 5, 0(27)
ld 2, 8(27)
ld 11, 16(27)
ld 3, .LC12@toc@l(3)
rldicl 4, 4, 0, 32
mtctr 5
bctrl
ld 2, 40(1)
And there is nothing wrong with this code, as such, but there is a linker bug
in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will
misoptimize this code sequence to this:
nop
std r2,40(r1)
ld r5,0(r27)
ld r2,8(r27)
ld r11,16(r27)
ld r3,-32472(r2)
clrldi r4,r4,32
mtctr r5
bctrl
ld r2,40(r1)
because the linker does not know (and does not check) that the value in r2
changed in between the instruction using the .LC12@toc@ha (TOC-relative)
relocation and the instruction using the .LC12@toc@l(3) relocation.
Because it finds these instructions using the relocations (and not by
scanning the instructions), it has been asserted that there is no good way
to detect the change of r2 in between. As a result, this bug may never be
fixed (i.e. it may become part of the definition of the ABI). GCC was
updated to add extra dependencies on r2 to instructions using the @toc@l
relocations to avoid this problem, and we'll do the same here.
This is done as a separate pass because:
1. These extra r2 dependencies are not really properties of the
instructions, but rather due to a linker bug, and maybe one day we'll be
able to get rid of them when targeting linkers without this bug (and,
thus, keeping the logic centralized here will make that
straightforward).
2. There are ISel-level peephole optimizations that propagate the @toc@l
relocations to some user instructions, and so the exta dependencies do
not apply only to a fixed set of instructions (without undesirable
definition replication).
The test case was reduced with the help of bugpoint, with minimal cleaning. I'm
looking forward to our upcoming MI serialization support, and with that, much
better tests can be created.
llvm-svn: 237556
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, 'o' and 'v' are not tested but were already implemented.
I'm not sure why 'i' is required for X86 since it's supposed to be an
immediate constraint rather than a memory constraint. A test asserts
without it so I've included it for now.
No functional change intended.
Reviewers: nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8254
llvm-svn: 237517
This patch adds support for the following new instructions in the
Power ISA 2.07:
vpksdss
vpksdus
vpkudus
vpkudum
vupkhsw
vupklsw
These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces. These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.
The first three instructions perform saturating pack operations. The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated. The other
instructions are only generated via built-in support for now.
Appropriate tests have been added.
There is a companion patch to clang for the rest of this support.
llvm-svn: 237499
Other pieces of CodeGen want to negate frame object offsets to account
for architectures where the stack grows down. Our object is a pseudo
object so it's offset doesn't matter. However, we shouldn't choose an
offset which results in undefined behavior if you negate it.
llvm-svn: 237474
Summary:
To maintain compatibility with GAS, we need to stop treating negative 32-bit immediates as 64-bit values when expanding LI/DLI.
This currently happens because of sign extension.
To do this we need to choose the 32-bit value expansion for values which use their upper 33 bits only for sign extension (i.e. no 0's, only 1's).
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8662
llvm-svn: 237428
Instead of doing that, create a temporary copy of MCTargetOptions and reset its
SanitizeAddress field based on the function's attribute every time an InlineAsm
instruction is emitted in AsmPrinter::EmitInlineAsm.
This is part of the work to remove TargetMachine::resetTargetOptions (the FIXME
added to TargetMachine.cpp in r236009 explains why this function has to be
removed).
Differential Revision: http://reviews.llvm.org/D9570
llvm-svn: 237412
The induction variable in the vectorized loop wasn't
recognized properly, so a hardware loop wasn't generated.
Differential Revision: http://reviews.llvm.org/D9722
llvm-svn: 237388
After converting a loop to a hardware loop, the pass should remove
any unnecessary instructions from the old compare-and-branch
code. This patch removes a dead constant assignment that was
used in the compare instruction.
Differential Revision: http://reviews.llvm.org/D9720
llvm-svn: 237373
If the loop trip count may underflow or wrap, the compiler should
not generate a hardware loop since the trip count will be
incorrect.
llvm-svn: 237365
Summary:
When we are trying to fill the delay slot of a call instruction, we must avoid
filler instructions that use the $ra register. This fixes the test
MultiSource/Applications/JM/lencod when we enable the forward delay slot filler.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9670
llvm-svn: 237362
Summary:
If we only pass the necessary operands, we don't have to determine the position of the symbol operand when entering expandLoadAddressSym().
This simplifies the expandLoadAddressSym() code.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9291
llvm-svn: 237355
i1 type is a legal type on AVX-512 and can be passed as parameter or return value.
i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here).
The result code is similar to the previous X86 targets, where i1 is allways promoted to i8.
llvm-svn: 237350
There's no need to manually pass modifier strings around to tell an operand how
to print now, that information is encoded in the operand itself since the MC
layer came along.
llvm-svn: 237295
We were creating and propagating two separate indices for each jump table (from
back in the mists of time). However, the generic index used by other backends
is sufficient to emit a unique symbol so this was unneeded.
llvm-svn: 237294
The previous logic mixed 2 separate questions:
+ Can we form a TBB/TBH instruction?
+ Can we remove the jump-table calculation before it?
It then performed a bunch of random tests on the instructions earlier in the
basic block, which were probably sufficient to answer 2 but only because of the
very limited ways in which a t2BR_JT can actually be created.
For example there's no reason to expect the LeaInst to define the same base
register as the following indexing calulation. In practice this means we might
have missed opportunities to form TBB/TBH, in theory you could end up
misidentifying a sequence and removing the wrong LEA:
%R1 = t2LEApcrelJT ...
%R2 = t2LEApcrelJT ...
<... using and killing %R2 ...>
%R2 = t2ADDr %R1, $Ridx
Before we would have looked for an LEA defining %R2 and found the wrong one. We
just got lucky that jump table setup was (almost?) always confined to a single
basic block and there was only one jump table per block.
llvm-svn: 237293
Some compilers warn about using the ternary operator with an unsigned variable
and enum.
I haven't seen this trigger in the llvm.org buildbots yet, but it probably will
at some point.
Reported by Daniel Sanders.
llvm-svn: 237262
The hardware loop pass should try to generate a hardware loop
instruction when the original loop has a critical edge.
Differential Revision: http://reviews.llvm.org/D9678
llvm-svn: 237258
Summary: A side-effect of this is that LA gains proper handling of unsigned and positive signed 16-bit immediates and more accurate error messages.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9290
llvm-svn: 237255
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
The first two times this was committed (r229831, r233055), it caused several buildbot failures.
At least some of the ARM and MIPS ones were due to gcc/binutils issues, and should now be fixed.
llvm-svn: 237234
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`. `id` gets propagated to the ID field
in the generated StackMap section. If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.
This change brings statepoints one step closer to patchpoints. With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.
PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`. This can be made more sophisticated
later.
Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9546
llvm-svn: 237214
They do more harm than good when used in the MachineScheduler as they
tend to take preference to register pressure minimsation which is more
important for swift.
Differential Revision: http://reviews.llvm.org/D9718
llvm-svn: 237179
Summary:
This rule was always in the old SysV i386 ABI docs and the new ones that
H.J. Lu has put together, but we never noticed:
EAX scratch register; also used to return integer and pointer values
from functions; also stores the address of a returned struct or union
Fixes PR23491.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9715
llvm-svn: 237175
AMDGPU::SI_SPILL_V96_RESTORE was missing from a switch statement, which
caused the srsrc and soffset register to not be set correctly.
This commit replaces the switch statement with a SITargetInfo query
to make sure all spill instructions are covered.
Differential Revision: http://reviews.llvm.org/D9582
llvm-svn: 237164
On Mips, frame pointer points to the same side of the frame as the stack
pointer. This function is used to decide where to put register scavenging
spill slot. So far, it was put on the wrong side of the frame, and thus it
was too far away from $fp when frame was larger than 2^15 bytes.
Patch by Vladimir Radosavljevic.
http://reviews.llvm.org/D8895
llvm-svn: 237153
Spilling can insert instructions almost anywhere, and this can mess
up control flow lowering in a multitude of ways, due to instruction
reordering. Let's sort this out the easy way: never spill registers
involved with control flow, i.e. saved EXEC masks.
Unfortunately, this does not work at all with optimizations disabled,
as the register allocator ignores spill weights. This should be
addressed in a future commit.
The test was reduced from the "stacks" shader of [1]. Some issues
trigger the machine verifier while another one is checked manually.
[1] http://madebyevan.com/webgl-path-tracing/
v2: only insert pass with optimizations enabled, merge test runs.
Patch by: Grigori Goronzy
llvm-svn: 237152
We had code to do this in SIRegisterInfo::eliminateFrameIndex(), but
it is easier to just change the definition of SI_SPILL_S32_RESTORE to
only allow numbered sgprs.
llvm-svn: 237143
Instead add m0 as an implicit operand. This allows us to avoid using
the M0Reg register class and eliminates a number of unnecessary spills
when using s_sendmsg instructions. This impacts one shader in the
shader-db:
SGPRS: 48 -> 40 (-16.67 %)
VGPRS: 112 -> 108 (-3.57 %)
Code Size: 40132 -> 38796 (-3.33 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 2048 -> 0 (-100.00 %) bytes per wave
llvm-svn: 237133
The other changes in the LowerShift() are not functional,
just to make the code more convenient.
So, the functional changes for SKX only.
llvm-svn: 237129
AEABI defines aligned variants of memcpy etc. that can be faster than
the default version due to not having to do alignment checks. When
emitting target code for these functions make use of these aligned
variants if possible. Also convert memset to memclr if possible.
Differential Revision: http://reviews.llvm.org/D8060
llvm-svn: 237127
Before revision 171146, function 'PerformTruncateCombine' used to perform
a premature lowering of TRUNCATE dag nodes.
Revision 171146 then moved all the logic implemented by PerformTruncateCombine
to a custom lowering hook. However, that revision forgot to delete
function PerformTruncateCombine from the code.
This patch removes function 'PerformTruncateCombine' since it has no effect
on the SelectionDAG. No functional change intended.
llvm-svn: 237122
Summary: Allow calls with non legal integer types based on i8 and i16 to be processed by mips fast-isel.
Based on a patch by Reed Kotler.
Test Plan:
"Make check" test forthcoming.
Test-suite passes at O0/O2 and with mips32 r1/r2
Reviewers: rkotler, dsanders
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D6770
llvm-svn: 237121
Summary:
Try to compute addresses when the offset from a memory location is a constant
expression.
Based on a patch by Reed Kotler.
Test Plan:
Passes test-suite for -O0/O2 and mips 32 r1/r2
Reviewers: rkotler, dsanders
Subscribers: llvm-commits, aemerson, rfuhler
Differential Revision: http://reviews.llvm.org/D6767
llvm-svn: 237117
sys/time.h on Solaris (and possibly other systems) defines "SEC" as "1"
using a cpp macro. The result is that this fails to compile.
Fixes https://llvm.org/PR23482
llvm-svn: 237112
The X86-specific DAGCombine for stores should not assume vector types are always simple.
This fixes PR23476.
Differential Revision: http://reviews.llvm.org/D9659
llvm-svn: 237097
to use the information in the module rather than TargetOptions.
We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.
For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.
Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.
llvm-svn: 237079
The TargetRegistry is just a namespace-like class, instantiated in one
place to use a range-based for loop. Instead, expose access to the
registry via a range-based 'targets()' function instead. This makes most
uses a bit awkward/more verbose - but eventually we should just add a
range-based find_if function which will streamline these functions. I'm
happy to mkae them a bit awkward in the interim as encouragement to
improve the algorithms in time.
llvm-svn: 237059
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.
This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set loadextaction for f16 vectors to expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.
Tests included.
Reviewers: ab, srhines, delena
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9092
llvm-svn: 237004
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
llvm-svn: 236916
This new class in a global context contain arch-specific knowledge in order
to provide LLVM libraries, tools and projects with the ability to understand
the architectures. For now, only FPU, ARCH and ARCH extensions on ARM are
supported.
Current behaviour it to parse from free-text to enum values and back, so that
all users can share the same parser and codes. This simplifies a lot both the
ASM/Obj streamers in the back-end (where this came from), and the front-end
parsers for command line arguments (where this is going to be used next).
The previous implementation, using .def/.h includes is deprecated due to its
inflexibility to be built without the backend support and for being too
cumbersome. As more architectures join this scheme, and as more features of
such architectures are added (such as hardware features, type sizes, etc) into
a full blown TargetDescription class, having a set of classes is the most
sane implementation.
The ultimate goal of this refactor both LLVM's and Clang's target description
classes into one unique interface, so that we can de-duplicate and standardise
the descriptions, as well as make it available for other front-ends, tools,
etc.
The FPU parsing for command line options in Clang has been converted to use
this new library and a number of aliases were added for compatibility:
* A bogus neon-vfpv3 alias (neon defaults to vfp3)
* armv5/v6
* {fp4/fp5}-{sp/dp}-d16
Next steps:
* Port Clang's ARCH/EXT parsing to use this library.
* Create a TableGen back-end to generate this information.
* Run this TableGen process regardless of which back-ends are built.
* Expose more information and rename it to TargetDescription.
* Continue re-factoring Clang to use as much of it as possible.
llvm-svn: 236900
Refactored parts of the hardware loop pass to generate
more. Also, added more tests.
Differential Revision: http://reviews.llvm.org/D9568
llvm-svn: 236896
A trunc from i32 to i1 on x86_64 generates an instruction such as
%vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9
However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one.
Otherwise, we are killing the source of the truncate which could be used later in the program.
llvm-svn: 236890
This changes the shape of the statepoint intrinsic from:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)
to:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)
This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.
In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.
Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.
Differential Revision: http://reviews.llvm.org/D9501
llvm-svn: 236888
Improved the AnalyzeBranch, InsertBranch, and RemoveBranch
functions in order to handle more of our branch instructions.
This requires changes to analyzeCompare and PredicateInstructions.
Specifically, we've added support for new value compare jumps,
improved handling of endloop, added more compare instructions,
and improved support for predicate instructions.
Differential Revision: http://reviews.llvm.org/D9559
llvm-svn: 236876
The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes
where the mask node is a load from constant pool, and the constant pool node
is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching
it how to also look through X86ISD::WrapperRIP.
This helps function combineX86ShufflesRecusively to combine more shuffle
sequences containing PSHUFB nodes if we are in RIPRel PIC mode.
Before this change, llc (with -relocation-model=pic -march=x86-64) was unable
to decode a pshufb where the mask was loaded from a constant pool. For example,
the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its
operand, so instead of generating a single 'movaps' the backend always
generated a sub-optimal 'movdqa + pshufb' sequence.
Added test x86-fold-pshufb.ll.
llvm-svn: 236863
Summary:
In microMIPS, labels need to know whether they are on code or data. This is
indicated with STO_MIPS_MICROMIPS and can be inferred by being followed
by instructions. For empty basic blocks, we can ensure this by emitting the
.insn directive after the label.
Also, this fixes some failures in our out-of-tree microMIPS buildbots, for the
exception handling regression tests under: SingleSource/Regression/C++/EH
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9530
llvm-svn: 236815
We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there.
Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use.
llvm-svn: 236764
This patch corresponds to review:
http://reviews.llvm.org/D9440
It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.
llvm-svn: 236755
This is a follow-on to r236740 where I took Andrea's advice
in D9504 to remove a redundant pattern...except that I removed
the wrong pattern!
AFAICT, there is no change in the final code produced because
subsequent passes would clean up the extra instructions created
by the more complicated pattern.
llvm-svn: 236743
Finish the job that was abandoned in D6958 following the refactoring in
http://reviews.llvm.org/rL230221:
1. Uncomment the intrinsic def for the AVX r_Int instruction.
2. Add missing r_Int entries to the load folding tables; there are already
tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I
haven't added any more in this patch.
3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ).
So instead of this:
movaps %xmm0, %xmm1
rcpss %xmm1, %xmm1
movss %xmm1, %xmm0
We should now get:
rcpss %xmm0, %xmm0
And instead of this:
vsqrtss %xmm0, %xmm0, %xmm1
vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3]
We should now get:
vsqrtss %xmm0, %xmm0, %xmm0
Differential Revision: http://reviews.llvm.org/D9504
llvm-svn: 236740
http://reviews.llvm.org/D9517
The separate header file allows to reuse the MIPS ABI flags structure
constants in other LLVM tools like the llvm-readobj.
No functional changes.
llvm-svn: 236732
Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec.
By Igor Breger (igor.breger@intel.com)
llvm-svn: 236714
Summary: This will enable the IAS to reject floating point instructions if soft-float is enabled.
Reviewers: dsanders, echristo
Reviewed By: dsanders
Subscribers: jfb, llvm-commits, mpf
Differential Revision: http://reviews.llvm.org/D9053
llvm-svn: 236713
When folding a load in to another instruction, we need to fix the class of the index register
Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier.
llvm-svn: 236644
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.
Differential Revision: http://reviews.llvm.org/D9515
llvm-svn: 236613
With neon enabled, we reach SelectBinaryFPOp and are able to get registers for a <2 x double> add.
However, we shouldn't actually attempt arithmetic on it as ARMIselLowering says "v2f64 is legal so that QR subregs can be extracted as f64 elements, but neither Neon nor VFP support any arithmetic operations on it."
This commit disables SelectBinaryFPOp for any vector types. There's already a FIXME to try handle neon. Doing so would require fixing this conditional which isn't safe for vectors 'VT == MVT::f64 || VT == MVT::i64'
llvm-svn: 236609
The initial code drop for VSX swap optimization permitted the
optimization only when all operations in a web of related computation
are lane-insensitive. For some lane-sensitive operations, we can
still permit the optimization provided that we make adjustments to
those operations. This patch adds special handling for vector splats
so that their presence doesn't kill the optimization.
Vector splats are lane-sensitive since they identify by number a
vector element to be used as the source of a splat. When swap
optimizations take place, the desired vector element will move to the
opposite doubleword of the quadword vector. We thus replace the index
I by (I + N/2) % N, where N is the number of elements in the vector.
A new test case is added to test that swap optimization succeeds when
vector splats are present, and that the proper input element is used
as the source of the splat.
An ancillary change removes SH_BUILDVEC as one of the kinds of special
handling that may be required by VSX swap optimization. From
experience with GCC, I had expected to need some modifications for
vector build operations, but I did not find that to be the case.
llvm-svn: 236606
Since r234249, i1 are sext instead of zext; because of that, doing
"CMP rN, #0; IT EQ/NE" isn't correct anymore.
"TST #1" is the conservatively correct alternative - the tradeoff being
that it doesn't have a 16-bit encoding -, so use that instead.
llvm-svn: 236569
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
There are 2 structural changes here:
1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.
2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes.
Differential Revision: http://reviews.llvm.org/D8900
llvm-svn: 236546
The register set for LDMIA begins at offset 3, not 4. We were previously
missing the short encoding of this instruction in the case where the base
register was the first register in the register set.
Also clean up some dead code:
- The isARMLowRegister check is redundant with what VerifyLowRegs does;
replace with an assert.
- Remove handling of LDMDB instruction, which has no short encoding (and
does not appear in ReduceTable).
Differential Revision: http://reviews.llvm.org/D9485
llvm-svn: 236535
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.
For each instructions whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.
For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result. Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.
Based on a patch by Richard Sandiford.
llvm-svn: 236527
The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
passed in vector registers. We do not yet support those types, and
some infrastructure is missing before we can do so.
In order to prevent accidentally generating code violating the ABI,
this patch adds checks to detect those types and error out if user
code attempts to use them.
llvm-svn: 236526
The ABI allows sub-128 vectors to be passed and returned in registers,
with the vector occupying the upper part of a register. We therefore
want to legalize those types by widening the vector rather than promoting
the elements.
The patch includes some simple tests for sub-128 vectors and also tests
that we can recognize various pack sequences, some of which use sub-128
vectors as temporary results. One of these forms is based on the pack
sequences generated by llvmpipe when no intrinsics are used.
Signed unpacks are recognized as BUILD_VECTORs whose elements are
individually sign-extended. Unsigned unpacks can have the equivalent
form with zero extension, but they also occur as shuffles in which some
elements are zero.
Based on a patch by Richard Sandiford.
llvm-svn: 236525
The z13 vector facility includes some instructions that operate only on the
high f64 in a v2f64, effectively extending the FP register set from 16
to 32 registers. It's still better to use the old instructions if the
operands happen to fit though, since the older instructions have a shorter
encoding.
Based on a patch by Richard Sandiford.
llvm-svn: 236524
The architecture doesn't really have any native v4f32 operations except
v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
elements being used. Even so, using vector registers for <4 x float>
and scalarising individual operations is much better than generating
completely scalar code, since there's much less register pressure.
It's also more efficient to do v4f32 comparisons by extending to 2
v2f64s, comparing those, then packing the result.
This particularly helps with llvmpipe.
Based on a patch by Richard Sandiford.
llvm-svn: 236523
This adds ABI and CodeGen support for the v2f64 type, which is natively
supported by z13 instructions.
Based on a patch by Richard Sandiford.
llvm-svn: 236522
This the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility. This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).
When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
(except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.
The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.
However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.
These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level. This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.
Based on a patch by Richard Sandiford.
llvm-svn: 236521
This patch adds support for the z13 processor type and its vector facility,
and adds MC support for all new instructions provided by that facilily.
Apart from defining the new instructions, the main changes are:
- Adding VR128, VR64 and VR32 register classes.
- Making FP64 a subclass of VR64 and FP32 a subclass of VR32.
- Adding a D(V,B) addressing mode for scatter/gather operations
- Adding 1-, 2-, and 3-bit immediate operands for some 4-bit fields.
Until now all immediate operands have been the same width as the
underlying field (hence the assert->return change in decode[SU]ImmOperand).
In addition, sys::getHostCPUName is extended to detect running natively
on a z13 machine.
Based on a patch by Richard Sandiford.
llvm-svn: 236520
This reverts commit r236360.
This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.
llvm-svn: 236508
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.
This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.
Phabricator review: http://reviews.llvm.org/D9475
llvm-svn: 236503
Summary:
When using the N64 ABI, element-indices use the i64 type instead of i32.
In many cases, we can use iPTR to account for this but additional patterns
and pseudo's are also required.
This fixes most (but not quite all) failures in the test-suite when using
N64 and MSA together.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9342
llvm-svn: 236494
When forming an IT block from the first MOV here:
%R2<def> = t2MOVr %R0, pred:1, pred:%CPSR, opt:%noreg
%R3<def> = tMOVr %R0<kill>, pred:14, pred:%noreg
the move in to R3 is moved out of the IT block so that later instructions on the same predicate can be inside this block, and we can share the IT instruction.
However, when moving the R3 copy out of the IT block, we need to clear its kill flags for anything in use at this point in time, ie, R0 here.
This appeases the machine verifier which thought that R0 wasn't defined when used.
I have a test case, but its extremely register allocator specific. It would be too fragile to commit a test which depends on the register allocator here.
llvm-svn: 236468
At the moment, all subregs defined by the SystemZ target can be modified
independently of the wider register. E.g. writing to a GR32 does not
change the upper 32 bits of the GR64. Writing to an FP32 does not change
the lower 32 bits of the FP64.
Hoewver, the upcoming support for the vector extension redefines FP64 as
one half of a V128. Floating-point operations leave the other half of
a V128 in an unpredictable state, so it's no longer the case that writing
to an FP32 leaves the bits of the underlying register (the V128) alone.
I'd prefer to have separate subreg_ names for this situation, so that
it's obvious at a glance whether we're talking about a subreg that leaves
the other parts of the register alone.
No behavioral change intended.
Patch originally by Richard Sandiford.
llvm-svn: 236433
We know what MemoryKind an operand has at the time we construct it,
so we might as well just record it in an unused part of the structure.
This makes it easier to add scatter/gather addresses later.
No behavioral change intended.
Patch originally by Richard Sandiford.
llvm-svn: 236432
It seems SystemZTargetLowering::getTargetNodeName got out of sync with
some recent changes to the SystemZISD opcode list. Add back all the
missing opcodes (and re-sort to the same order as SystemISelLowering.h).
llvm-svn: 236430
Removed code that was replicating v8i16 'shift + mask' implementation that is done more nicely by making use of LowerScalarImmediateShift
llvm-svn: 236388
This pass is responsible for constructing the EH registration object
that gets linked into fs:00, which is all it does in this change. In the
future, it will also insert stores to update the EH state number.
I considered keeping this functionality in WinEHPrepare, but it's pretty
separable and X86 specific. It has conceptually very little to do with
the task of WinEHPrepare, which is currently outlining. WinEHPrepare is
also in theory useful on ARM, but this logic is pretty x86 specific.
Reviewers: andrew.w.kaylor, majnemer
Differential Revision: http://reviews.llvm.org/D9422
llvm-svn: 236339
Converting from t2LDRs to tLDRr caused the shift argument to drop the internal flag. This would then throw machine verifier errors.
Unfortunately i'm having trouble reducing a test case. I'm going to keep trying, but so far its a scary combination of machine sinking, an 'and i1', loads feeding loads, and a bunch of code which shouldn't change IT block formation, but does. Its not useful to commit a test in that state as we have no way of knowing if it even hits this code reliably in future.
rdar://problem/20752113
llvm-svn: 236333
Functions with jump tables need an alignment of 4 because they use the ADR
instruction, which aligns the PC to 4 bytes before adding an offset.
Differential Revision: http://reviews.llvm.org/D9424
llvm-svn: 236327
Summary:
LI should never accept immediates larger than 32 bits.
The additional Is32BitImm boolean also paves the way for unifying the functionality that LA and LI have in common.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9289
llvm-svn: 236313
Summary:
Generate one DSLL32 of 0 instead of two consecutive DSLL of 16.
In order to do this I had to change createLShiftOri's template argument from a bool to an unsigned.
This also gave me the opportunity to rewrite the mips64-expansions.s test, as it was testing the same cases multiple times and skipping over other cases.
It was also somewhat unreadable, as the CHECK lines were grouped in a huge block of text at the beginning of the file.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8974
llvm-svn: 236311
This pass was generating 'Instruction does not dominate all uses!'
errors for programs which had loops with a condition variable that
depended on the result of a phi instruction from outside of the loop.
The pass was inserting new phi nodes outside of the loop which used values
defined inside the loop.
http://bugs.freedesktop.org/show_bug.cgi?id=90056
llvm-svn: 236306
If we move an instruction from one block down to a MOVC and predicate it,
then the original instruction could be moved in to a loop. In this case,
its invalid for any kill flags to remain on there.
Fails with -verfy-machineinstrs.
rdar://problem/20752113
llvm-svn: 236290
The expansion for t2ABS was always setting the kill flag on the rsb instruction.
It should instead only be set on rsb if it was set on the original ABS instruction.
rdar://problem/20752113
llvm-svn: 236272
This helps reduce the frequency of stack realignment prologues in 32-bit
X86 Windows code. Before this change and the corresponding clang change,
we would take the max of the type preferred alignment and the explicit
alignment on the alloca.
If you don't override aggregate alignment in datalayout, you get a
default of 8. This dates back to 2007 / r34356, and changing it seems
prohibitively difficult at this point.
llvm-svn: 236270
temporary.
Because of that:
1. The machine verifier was complaining on such code.
2. The generate code worked just because the thumb reduction size pass fixed the
opcode.
rdar://problem/20749824
llvm-svn: 236247
changes:
Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
Add location to getConstant
Drop comment about the ops being turned into expand
llvm-svn: 236240
This was breaking sqlite with the machine verifier because operand 0 was a def according to tablegen, but didn't have the 'isDef' flag set.
Looking at the ISA, its clear that this operand is a source as writing to st(0) is implicit. So move the operand to the correct place in the td file.
rdar://problem/20751584
llvm-svn: 236183
There's probably no way to test BXJ, but if the compiler ever did emit it
during CodeGen it would have to be a block terminator so "isBranch" is
appropriate.
BLX is more tricky. Clearly a call, but it affects surprisingly little.
rdar://18719544
llvm-svn: 236140
x86 Windows uses the '_' prefix for all global symbols, and this was
mistakenly being applied to frameescape labels, which are not externally
visible global symbols. They use the private global prefix 'L'.
The *right* way to fix this is probably to stop masquerading this label
as an ExternalSymbol and create a new SDNode type. These labels are not
"external", and we know they will be resolved by assembly time. Having a
custom SDNode type would allow us to do better X86 address mode
matching, so it's probably worth doing eventually.
llvm-svn: 236123
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
Reg+%g0 is preferred to Reg+imm0 by the manual, and is what GCC produces.
Futhermore, reg+imm is invalid for the (not yet supported) "alternate
address space" instructions.
Differential Revision: http://reviews.llvm.org/D8753
llvm-svn: 236107
Summary:
The existing code was correct for 32-bit GPR's but not 64-bit GPR's. It now
accounts for both cases.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits, mohit.bhakkad, sagar
Differential Revision: http://reviews.llvm.org/D9337
llvm-svn: 236099
Summary:
Do the assemble-time shifts from createLShiftOri at the source, which groups all the shifting together, closer to the main logic path, and
store the results in concisely-named variables to improve code clarity.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8973
llvm-svn: 236096
We were trying to look through COPY instructions, but only to the next
instruction in a BB and incorrectly anyway. The cases where that would actually
be a good idea are rare enough (and not even tested!) that it's not worth
trying to get right.
rdar://20721342
llvm-svn: 236050
We don't need codegen-only intrinsic instructions for the vector forms of these instructions.
This makes the reciprocal estimate instruction lowering identical to how we handle normal
square roots: (V)SQRTPS / (V)SQRTPD.
No existing regression tests fail with this patch.
Differential Revision: http://reviews.llvm.org/D9301
llvm-svn: 236013
Fixes a crash with basically any OpenGL application using the radeonsi
driver.
Patch by: Michel Dänzer
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90176
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 236004
llc converts all feature strings to lower case, while the LLVM C API
does not, so we need a lower case alias in order to test this with llc.
llvm-svn: 236003
We need to track if an AddrSpaceCast expression was seen when
generating an MCExpr for a ConstantExpr. This change introduces a
custom lowerConstant method to the NVPTX asm printer that will create
NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode
the information that a given symbol needs to be casted to a generic
address.
llvm-svn: 236000
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).
In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment,
we should eventually share this data between the IR passes and the backend.
We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.
The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclassi type.
In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.
Differential Revision: http://reviews.llvm.org/D9325
llvm-svn: 235997
Summary: If the immediate is 0, the ORi is pointless.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8969
llvm-svn: 235990
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
Summary: The new name is more accurate with regard to the functionality.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8968
llvm-svn: 235984
Summary: This removes multiple calls to getReg() and saves us column space in the source file.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8924
llvm-svn: 235978
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
This matches other assemblers and is less unexpected (e.g. PR23227).
On ELF, I tried binutils gas v2.24 and nasm 2.10.09, and they both
agree on LShr. On COFF, I couldn't get my hands on an assembler yet,
so don't change the behavior. For now, don't change it on non-AArch64
Darwin either, as the other assembler is gas v1.38, which does an AShr.
llvm-svn: 235963
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative, replace it with an assert.
This also means that we now can run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all types created all have the same sizes as the (legal) inputs.
Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).
llvm-svn: 235922
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.
llvm-svn: 235917
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.
However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.
Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.
This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.
Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.
llvm-svn: 235910
Use a loop instruction with a constant extender for a hardware
loop instruction that is too far away from the start of the loop.
This is cheaper than changing the SA register value.
Differential Revision: http://reviews.llvm.org/D9262
llvm-svn: 235882
Summary:
Changed the warning message to show the current value of $at, similar to what clang does for typedef's, and renamed warnIfAssemblerTemporary to a more descriptive name.
I also changed the type of variables which store registers from int to unsigned, updated the relevant test and tried to make the related comments clearer.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8479
llvm-svn: 235881
This reapplies r235194, which was reverted in r235495 because it was causing a
failure in our out-of-tree buildbots for MIPS. With the sign-extension patch
in r235718, this patch doesn't cause any problem any more.
llvm-svn: 235878
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.
The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.
Differential Revision: http://reviews.llvm.org/D9115
llvm-svn: 235837
Summary:
Perform integer extension only when the destination type is one of
i8, i16 & i32 and when the source type is i1, i8 or i16. For other
combinations we fall back to SelectionDAG.
This fixes the test MultiSource/Benchmarks/7zip that was failing in our
out-of-tree MIPS buildbots.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9243
llvm-svn: 235718
Summary:
Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()"
on aggregates of addrspacecasts.
Test Plan: addrspacecast-gvar.ll
Reviewers: eliben, jholewinski
Reviewed By: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9130
llvm-svn: 235689
Match binutils by supporting the optional register name prefix for new vector
registers ("vs" for VSX registers and "q" for QPX registers).
llvm-svn: 235665
Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint
field specified (non-zero). Unforunately, the syntax for this instruction is
special in that it differs for server vs. embedded cores:
dcbt ra, rb, th [server]
dcbt th, ra, rb [embedded]
where th can be omitted when it is 0. dcbtst is the same. Thus we need to play
games in the parser and the printer to flip the operands around on the embedded
cores. We'll use the server syntax as the default (binutils currently uses the
embedded form by default, but IBM is changing that).
We also stop marking dcbtst as having unmodeled side effects (this is not
necessary, it is just a hint like dcbt -- noticed by inspection, so no separate
test case).
llvm-svn: 235657
When the base register index of the vector plus the constant offset
was less than zero, we were passing the wrong base register to the indirect
addressing instruction.
In this case, we need to set the base register to v0 and then add
the computed (negative) index to m0.
llvm-svn: 235641
The order in which branches appear in ImmBranches is approximately their
order within the function body. By visiting later branches first, we reduce
the distance between earlier forward branches and their targets, making it
more likely that the cbn?z optimization, which can only apply to forward
branches, will succeed for those earlier branches.
Differential Revision: http://reviews.llvm.org/D9185
llvm-svn: 235640
In particular, this preserves the kill flag, which allows the Thumb2 cbn?z
optimization to be applied in cases where a branch has been re-created after
the live variables analysis pass, e.g. by the machine block placement pass.
This appears to be low risk; a number of other targets seem to already be
doing something similar, e.g. AArch64, PowerPC.
Differential Revision: http://reviews.llvm.org/D9184
llvm-svn: 235639
This allows the constant island pass to lower these branches to cbn?z
instructions, resulting in a shorter instruction sequence.
Differential Revision: http://reviews.llvm.org/D9183
llvm-svn: 235638
This makes it more likely that we can use the 16-bit push and pop instructions
on Thumb-2, saving around 4 bytes per function.
Differential Revision: http://reviews.llvm.org/D9165
llvm-svn: 235637
This appears to have been introduced back in r76698 as part of an unrelated
change. I can find no official ARM documentation stating that Thumb-2 functions
require 4-byte alignment; in fact, ARM documentation appears to contradict
this (see, e.g., ARM Architecture Reference Manual Thumb-2 Supplement,
section 2.6.1: "Thumb-2 enforces 16-bit alignment on all instructions.").
Also remove code that sets alignment for ARM functions, which is redundant
with code in the MachineFunction constructor, and remove the hidden
-arm-align-constant-islands flag, which has been enabled by default since
r146739 (Dec 2011) and has probably received sufficient testing by now.
Differential Revision: http://reviews.llvm.org/D9138
llvm-svn: 235636
Summary:
We pick this order because SeparateConstOffsetFromGEP may create more
opportunities for SLSR.
Test Plan:
reassociate-geps-and-slsr.ll
no performance regression on internal benchmarks
Reviewers: meheff
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D9230
llvm-svn: 235632
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.
Thus, after some hours of updating test cases...
llvm-svn: 235616
Summary:
Constant stores of f16 vectors can create NvCast nodes from various
operand types to v4f16 or v8f16 depending on patterns in the stored
constants. This patch adds nvcast rules with v4f16 and v8f16 values.
AArchISelLowering::LowerBUILD_VECTOR has the details on which constant
patterns generate the nvcast nodes.
Reviewers: jmolloy, srhines, ab
Subscribers: rengolin, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D9201
llvm-svn: 235610
Summary:
Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32,
v8i8, v8i16 inputs to allow promotion of v4f16 results.
Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32,
and i64 vectors. Only missing tests are for v16i8 and v16i16 as the
shift operations are too complicated to write a proper check sequence.
The conversions from v4i64 to v4f16 do not depend on this patch - v4i64
is split and the conversion gets handled while lowering v2i64. I am
adding a test here for completeness.
Reviewers: aemerson, rengolin, ab, jmolloy, srhines
Subscribers: rengolin, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D9166
llvm-svn: 235609
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.
llvm-svn: 235608
This is a re-commit of r235101, which also fixes the problems with the previous patch:
- Switches with only a default case and non-fallthrough were handled incorrectly
- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.
> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649
llvm-svn: 235560
The CondOpt pass currently uses LiveIntervals to set the dead flag on a def. This patch uses MachineRegisterInfo::use_empty instead as that is equivalent to the def being dead.
This removes an instance of LiveIntervals in the pass manager pipeline and saves 3.8% of compile time on llc conpiled for AArch64.
Reviewed by Chad Rosier and Zhaoshi.
llvm-svn: 235532
This fixes a regression introduced at revision 218263.
On AVX, if we optimize for size, a splat build_vector of a load
is lowered into a VBROADCAST node. This is done even if the value type of the
splat build_vector node is v2i64.
Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two
extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast
where the scalar element comes from a loadi64/loadf64.
However, revision 218263 forgot to add an extra fallback pattern for the case
where we have a X86VBroadcast of a loadi64 with multiple uses.
This patch adds the missing tablegen pattern in X86InstrSSE.td.
This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel
doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern
to select v2i64 X86ISD::BROADCAST nodes.
llvm-svn: 235509
Enough concerns were raised that this optimization is pessimising some code patterns.
The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.
llvm-svn: 235491
X86 backend.
The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.
llvm-svn: 235483
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.
So instead of producing this:
pshufd $229, %xmm0, %xmm1 ## xmm1 = xmm0[1,1,2,3]
movd %xmm0, (%eax)
movd %xmm1, 4(%eax)
We can do:
movq %xmm0, (%eax)
This is a fix for the problem noted in D7296.
Differential Revision: http://reviews.llvm.org/D9134
llvm-svn: 235460
Summary:
With D9096 and D9101, there's no need to run DCE after SLSR and
SeparateConstOffsetFromGEP.
Test Plan: no regression
Reviewers: jholewinski, meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9172
llvm-svn: 235415
There doesn't seem to be a reason to perform this target ISD node matching
in an DAGCombine, moving it to lowering fixes PR23296.
Differential Revision: http://reviews.llvm.org/D9137
llvm-svn: 235394