PC isn't allowed in the source operand of t2MOVr, so change the register class
to one without PC. SP handling is slightly trickier and changes depending on if
we're in ARMv8, so do that in checkTargetMatchPredicate.
Differential Revision: https://reviews.llvm.org/D30199
llvm-svn: 295732
There used to be a check in the IRTranslator that prevented us from having to
deal with atomic loads/stores. That check has been removed in r294993 and the
AArch64 backend was updated accordingly. This commit does the same thing for the
ARM backend.
In general, in the ARM backend we introduce fences during the atomic expand
pass, so we don't have to worry about atomics, *except* for the 32-bit ARMv8
target, which handles atomics more like AArch64. Since we don't want to worry
about that yet, just bail out of instruction selection if we find any atomic
loads.
llvm-svn: 295662
Removed the HasT2ExtractPack feature and replaced its references
with HasDSP. This then allows the Thumb2 extend instructions to be
selected for ARMv8M +dsp. These instruction descriptions have also
been refactored and more target tests have been added for their isel.
Differential Revision: https://reviews.llvm.org/D29623
llvm-svn: 295452
Add some asserts to make sure we're using the mappings that we think we're
using. This is to keep us from accidentally breaking functionality while moving
to TableGen'erated mappings.
llvm-svn: 295441
Start using the Subtarget to make decisions about what's legal. In particular,
we only mark floating point operations as legal if we have VFP2, which is
something we should've done from the very start.
llvm-svn: 295439
Since they're only used for passing around double precision floating point
values into the general purpose registers, we'll lower them to VMOVDRR and
VMOVRRD.
llvm-svn: 295310
For now we just mark them as legal all the time and let the other passes bail
out if they can't handle it. In the future, we'll want to move more of the
brains into the legalizer.
llvm-svn: 295300
For the hard float calling convention, we just use the D registers.
For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.
For pure soft float targets, we still bail out because we don't support the
libcalls yet.
llvm-svn: 295295
Summary:
The attached test case fails with "fatal error: error in backend:
misaligned pc-relative fixup value" as the jump table is misaligned.
The EmitAlignment existed already for ARM and Thumb-1 code, but was
missing for Thumb-2.
The test checks that the fatal error disappears when generating an obj
file, as well as checking the align directive is there when producing an
asm file.
Reviewers: rengolin, grosbach, t.p.northover, jmolloy, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D29650
llvm-svn: 294950
We match a sequence of 3-4 instructions into a tTBB pseudo. One of our checks is that
a particular register in that sequence is killed (so it can be clobbered by the pseudo).
We weren't noticing if an errant MOV or other instruction had infiltrated the
sequence we were walking. If it had, and it defined the register we've already
identified as killed, it makes it live across the tBR_JT and thus unclobberable.
Notice this case and bail out.
llvm-svn: 294949
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the sideeffect of setting the cumulative Invalid
bit in FPSCR if any of the operands are QNaN.
It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:
The relational and equality operators support the usual mathematical
relationships between numeric values. For any ordered pair of numeric
values exactly one of relationships the less, greater, equal and is true.
Relational operators may raise the floating-point exception when argument
values are NaNs.
The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).
Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.
llvm-svn: 294945
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.
Teach the cost model to consider f16 interleaved operations as
expensive. Otherwise, we are all but guaranteed to end up with
a large block of scalarized vector code.
llvm-svn: 294819
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.
Reject f16 interleaved accesses. If we try to emit the f16 intrinsics,
we'll just end up with a selection failure.
llvm-svn: 294818
In the encoding of system registers in the M-class MSR instruction the mask bits
should be 2 for registers that don't take a _<bits> qualifier (the instruction
is unpredictable otherwise), and should also be 2 if the register takes a
_<bits> qualifier but it's not present as no _<bits> is an alias for _nzcvq.
Differential Revision: https://reviews.llvm.org/D29828
llvm-svn: 294762
Gcc supports target armv7ve which is armv7-a with virtualization
extensions. This change adds support for this in llvm for gcc
compatibility.
Also remove redundant FeatureHWDiv, FeatureHWDivARM for a few models as
this is specified automatically by FeatureVirtualization.
Patch by Manoj Gupta.
Differential Revision: https://reviews.llvm.org/D29472
llvm-svn: 294661
Functions that have a dynamic alloca require a base register which is defined to
be X19 on AArch64 and r6 on ARM. We have defined the swifterror register to be
the same register. Use a different callee save register for swifterror instead:
X21 on AArch64
R8 on ARM
rdar://30433803
llvm-svn: 294551
We mark X0 as preserved by a call that passes the returned parameter.
x0 = ...
fun(x0) // no implicit def of x0
This no longer is valid if we pass the parameter in a different register then
the returned value as is the case with a swiftself parameter (passed in x20).
x20 = ...
fun(x20) // there should be an implict def of x8
rdar://30425845
llvm-svn: 294527
Add a register bank for floating point values and select simple instructions
using them (add, copies from GPR).
This assumes that the hardware can cope with a single precision add (VADDS)
instruction, so the legalizer will treat G_FADD as legal and the instruction
selector will refuse to select if the hardware doesn't support it. In the future
we'll want to be more careful about this, and legalize to libcalls if we have to
use soft float.
llvm-svn: 294442
When constructing global address literals while targeting the RWPI
relocation model. LLVM currently only uses literal pools. If MOVW/MOVT
instructions are available we can use these instead. Beside being more
efficient it allows -arm-execute-only to work with
-relocation-model=RWPI as well.
When we generate MOVW/MOVT for global addresses when targeting the RWPI
relocation model, we need to use base relative relocations. This patch
does the needed plumbing in MC to generate these for MOVW/MOVT.
Differential Revision: https://reviews.llvm.org/D29487
Change-Id: I446786e43a6f5aa9b6a5bb2cd216d60d41c7755d
llvm-svn: 294298
Summary:
This patch tablegen-erates the ARM register bank information so that the
static tables added in D27807 no longer need to be maintained.
Depends on D27338
Reviewers: t.p.northover, rovka, ab, qcolombet, aditya_nandakumar
Reviewed By: rovka
Subscribers: aemerson, rengolin, mgorny, dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D28567
llvm-svn: 294124
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.
Reviewers: rengolin, jmolloy, rovka, olista01
Reviewed By: olista01
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D29020
llvm-svn: 294000
On ELF every section can have a corresponding section symbol. When in
an assembly file we have
.quad .text
the '.text' refers to that symbol.
The way we used to handle them is to leave .text an undefined symbol
until the very end when the object writer would map them to the
actual section symbol.
The problem with that is that anything before the end would see an
undefined symbol. This could result in bad diagnostics
(test/MC/AArch64/label-arithmetic-diags-elf.s), or incorrect results
when using the asm streamer (est/MC/Mips/expansion-jal-sym-pic.s).
Fixing this will also allow using the section symbol earlier for
setting sh_link of SHF_METADATA sections.
This patch includes a few hacks to avoid changing our behaviour when
handling conflicts between section symbols and other symbols. I
reported pr31850 to track that.
llvm-svn: 293936
This is the second in the series of patches to enable adding
of machine sched-models for ARM processors easier and compact.
This patch focuses on integer instructions and adds missing
sched definitions.
Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127
llvm-svn: 293935
Recommiting after fixing X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293893
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.
llvm-svn: 293889
Allow unknown types in TLI.getValueType, otherwise we get asserts for certain
types that we do not support yet (instead of returning that we don't support
them and falling through the normal error path).
llvm-svn: 293888
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.
Differential Revision: https://reviews.llvm.org/D29398
llvm-svn: 293788
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.
Committed on behalf of Sanne Wouda.
Reviewers : rengolin, olista01.
Differential Revision: https://reviews.llvm.org/D29073
llvm-svn: 293761
The Requires class overrides the target requirements of an instruction,
rather than adding to them, so all ARM instructions need to include the
IsARM predicate when they have overwitten requirements.
This caused the swp and swpb instructions to be allowed in thumb mode
assembly, and the ARM encoding of CDP to be selected in codegen (which
is different for conditional instructions).
Differential Revision: https://reviews.llvm.org/D29283
llvm-svn: 293634
Support lowering AEABI TLS access (__aeabi_read_tp) with long calls.
This requires adjusting the call sequence to use an indirect call to get
full addressability.
Resolves PR31769!
llvm-svn: 293433
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MyClass::dump() {
// print stuff to dbgs()...
}
#endif
llvm-svn: 293359
The Windows on ARM target uses custom division for normal division as
the backend needs to insert division-by-zero checks. However, it is
designed to only handle non-vectorized division. ARM has custom
lowering for vectorized division as that can avoid loading registers
with the values and invoke a division routine for each one, preferring
to lower using NEON instructions. Fall back to the custom lowering for
the NEON instructions if we encounter a vectorized division.
Resolves PR31778!
llvm-svn: 293259
Summary:
This patch provides more staging for tail calls in XRay Arm32 . When the logging part of XRay is ready for tail calls, its support in the core part of XRay Arm32 may be as easy as changing the number passed to the handler from 1 to 2.
Coupled patch:
- https://reviews.llvm.org/D28674
Reviewers: dberris, rengolin
Reviewed By: dberris
Subscribers: llvm-commits, iid_iunknown, aemerson, rengolin, dberris
Differential Revision: https://reviews.llvm.org/D28673
llvm-svn: 293185
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293184
Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.
When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.
Differential Revision: https://reviews.llvm.org/D27803
llvm-svn: 293163
Refactoring to remove duplications of this method.
New method getOperandsScalarizationOverhead() that looks at the present unique
operands and add extract costs for them. Old behaviour was to just add extract
costs for one operand of the type always, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.
This is a good start of improving on this, but there are more places
that can be improved by using getOperandsScalarizationOverhead().
Review: Hal Finkel
https://reviews.llvm.org/D29017
llvm-svn: 293155
Summary:
Lifetime extension wasn't triggered on the result of BuildMI because the
reference was non-const. However, instead of adding a const, I've
removed the reference entirely as RVO should kick in anyway.
Reviewers: rovka, bkramer
Reviewed By: bkramer
Subscribers: aemerson, rengolin, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D29124
llvm-svn: 293059
Add support for:
* i1 add
* i1 function arguments, if passed through registers
* i1 returns, with ABI signext/zeroext
Differential Revision: https://reviews.llvm.org/D27706
llvm-svn: 293035
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.
Note that this does not include support for other extensions (i8 to i16), those
will be added later.
Differential Revision: https://reviews.llvm.org/D27705
llvm-svn: 293034
This is a series of patches to enable adding of machine sched
models for ARM processors easier and compact. They define new
sched-readwrites for groups of ARM instructions. This has been
missing so far, and as a consequence, machine scheduler models
for individual sub-targets have tended to be larger than they
needed to be.
The current patch focuses on floating-point instructions.
Reviewers: Diana Picus (rovka), Renato Golin (rengolin)
Differential Revision: https://reviews.llvm.org/D28194
llvm-svn: 292825
We also want to optimise tests like this: return a*b == 0. The MULS
instruction is flag setting, so we don't need the CMP instruction but can
instead branch on the result of the MULS. The generated instructions sequence
for this example was: MULS, MOVS, MOVS, CMP. The MOVS instruction load the
boolean values resulting from the select instruction, but these MOVS
instructions are flag setting and were thus preventing this optimisation. Now
we first reorder and move the MULS to before the CMP and generate sequence
MOVS, MOVS, MULS, CMP so that the optimisation could trigger. Reordering of the
MULS and MOVS is safe to do because the subsequent MOVS instructions just set
the CPSR register and don't use it, i.e. the CPSR is dead.
Differential Revision: https://reviews.llvm.org/D27990
llvm-svn: 292608
Hunt down some of the places where we use bare addReg(0) or addImm(AL).addReg(0)
and replace with add(condCodeOp()) and add(predOps()). This should make it
easier to understand what those operands represent (without having to look at
the definition of the instruction that we're adding to).
Differential Revision: https://reviews.llvm.org/D27984
llvm-svn: 292587
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
llvm-svn: 292516
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.
Changes since first commit attempt:
* Added missing guards
* Added more missing guards
* Found and fixed a use-after-free bug involving Twine locals
Reviewers: t.p.northover, ab, rovka, qcolombet
Reviewed By: qcolombet
Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D27338
llvm-svn: 292478
A 64-bit relocation does not exist in 32-bit ARMELF. Report an error
instead of crashing.
PR23870
Patch by Sanne Wouda (sanwou01).
Differential Revision: https://reviews.llvm.org/D28851
llvm-svn: 292373
Summary:
In this function, virtual registers can be introduced (for example
through calls to emitThumbRegPlusImmInReg). doScavengeFrameVirtualRegs
will replace those virtual registers with concrete registers later on
in PrologEpilogInserter, which sets NoVRegs again.
This patch fixes the Codegen/Thumb/segmented-stacks.ll test case which
failed with expensive checks.
https://llvm.org/bugs/show_bug.cgi?id=27484
Reviewers: rnk, bkramer, olista01
Reviewed By: olista01
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D28829
llvm-svn: 292372
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.
Changes since last commit:
The new tablegen pass is now correctly guarded by LLVM_BUILD_GLOBAL_ISEL and
this should fix the buildbots however it may not be the whole fix. The previous
buildbot failures suggest there may be a memory bug lurking that I'm unable to
reproduce (including when using asan) or spot in the source. If they re-occur
on this commit then I'll need assistance from the bot owners to track it down.
Reviewers: t.p.northover, ab, rovka, qcolombet
Reviewed By: qcolombet
Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D27338
llvm-svn: 292367
Enable an ELFObjectFile to read the its arm build attributes to
produce a target triple with a specific ARM architecture.
llvm-objdump now uses this functionality to automatically produce
a more accurate target.
Differential Revision: https://reviews.llvm.org/D28769
llvm-svn: 292366
This reverts commit r292210, as it broke the Thumb buldbot with:
clang-5.0: error: the clang compiler does not support '-fxray-instrument
on thumbv7-unknown-linux-gnueabihf'.
llvm-svn: 292357
Some platforms (notably iOS) use a different calling convention for unnamed vs
named parameters in varargs functions, so we need to keep track of this
information when translating calls.
Since not many platforms are involved, the guts of the special handling is in
the ValueHandler class (with a generic implementation that should work for most
targets).
llvm-svn: 292283
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
llvm-svn: 292210
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.
Reviewers: t.p.northover, ab, rovka, qcolombet
Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D27338
llvm-svn: 292132
Summary:
Revert [ARM] Fix ubig32_t read in ARMAttributeParser
Now using support functions to read data instead of trying to
perform casts.
===========================================================
Revert [ARM] Enable objdump to construct triple for ARM
Now that The ARMAttributeParser has been moved into the library,
it has been modified so that it can parse the attributes without
printing them and stores them in a map. ELFObjectFile now queries
the attributes to fill out the architecture details of a provided
triple for 'arm' and 'thumb' targets. llvm-objdump uses this new
functionality.
Subscribers: llvm-commits, samparker, aemerson, mgorny
Differential Revision: https://reviews.llvm.org/D28683
llvm-svn: 291911
GCC changes the CC between the user-code and the builtins based on the
value of `-target` rather than `-mfloat-abi`. When a HF target is used,
the VFP variant of the AAPCS CC is used. Otherwise, the AAPCS variant
is used. In all cases, the AEABI functions use the AAPCS CC. Adjust
the calling convention based on the target.
Resolves PR30543!
llvm-svn: 291909
Now that The ARMAttributeParser has been moved into the library,
it has been modified so that it can parse the attributes without
printing them and stores them in a map. ELFObjectFile now queries
the attributes to fill out the architecture details of a provided
triple for 'arm' and 'thumb' targets. llvm-objdump uses this new
functionality.
Differential Revision: https://reviews.llvm.org/D28281
llvm-svn: 291898
For AddDefaultT1CC, we add a new helper t1CondCodeOp, which creates the
appropriate register operand. For AddNoT1CC, we use the existing condCodeOp
helper - we only had two uses of AddNoT1CC, so at this point it's probably not
worth having yet another helper just for them.
Differential Revision: https://reviews.llvm.org/D28603
llvm-svn: 291894
Replace all uses of AddDefaultCC with add(condCodeOp()).
The transformation has been done automatically with a custom tool based on Clang
AST Matchers + RefactoringTool.
Differential Revision: https://reviews.llvm.org/D28557
llvm-svn: 291893
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
Replace all uses of AddDefaultPred with MachineInstrBuilder::add(predOps()).
This makes the code building MachineInstrs more readable, because it allows us
to write code like:
MIB.addSomeOperand(blah)
.add(predOps())
.addAnotherOperand(blahblah)
instead of
AddDefaultPred(MIB.addSomeOperand(blah))
.addAnotherOperand(blahblah)
This commit also adds the predOps helper in the ARM backend, as well as the add
method taking a variable number of operands to the MachineInstrBuilder.
The transformation has been done mostly automatically with a custom tool based
on Clang AST Matchers + RefactoringTool.
Differential Revision: https://reviews.llvm.org/D28555
llvm-svn: 291890
Switch some additional library call setup to be table driven. This
makes it more immediately obvious what the library call looks like.
This is important for ARM since the calling conventions for the builtins
change based on the target/libcall name. NFC
llvm-svn: 291789
Summary:
The register bank is now entirely initialized in the constructor. However,
we still have the hardcoded number of register classes which will be
dealt with in the TableGen patch (D27338) since we do not have access
to this information to resolve this at this stage. The number of register
classes is known to the TRI and to TableGen but the RegisterBank
constructor is too early for the former and too late for the latter.
This will be fixed when the data is tablegen-erated.
Reviewers: t.p.northover, ab, rovka, qcolombet
Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris
Differential Revision: https://reviews.llvm.org/D27809
llvm-svn: 291770
Summary:
Refactor the RegisterBank initialization to use static data. This requires
GlobalISel implementations to rewrite calls to createRegisterBank() and
addRegBankCoverage() into a call to setRegBankData().
Out of tree targets can use diff 4 of D27807
(https://reviews.llvm.org/D27807?id=84117) to have addRegBankCoverage() dump
the register classes and other data that needs to be provided to
setRegBankData(). This is the method that was used to generate the static data
in this patch.
Tablegen-eration of this static data will follow after some refactoring.
Reviewers: t.p.northover, ab, rovka, qcolombet
Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris
Differential Revision: https://reviews.llvm.org/D27807
Differential Revision: https://reviews.llvm.org/D27808
llvm-svn: 291768
The new matchers work after legalization to make them simpler, and to avoid
blocking other optimizations.
Differential Revision: https://reviews.llvm.org/D27779
llvm-svn: 291693
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq.
In case if the real operands bitwidth <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.
Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.
Differential Revision: https://reviews.llvm.org/D27518
llvm-svn: 291106
Summary:
No need to have this per-architecture. While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards. Individual entry emission code goes to the
entry's own class.
Fully tested on amd64, cross-builds on both ARMs and PowerPC.
Reviewers: dberris
Subscribers: aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D28209
llvm-svn: 290858
See https://reviews.llvm.org/D6678 for the history of
isExtractSubvectorCheap. Essentially the same considerations apply
to ARM.
This temporarily breaks the formation of vpadd/vpaddl in certain cases;
AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles.
See https://reviews.llvm.org/D27779 for followup fix.
Differential Revision: https://reviews.llvm.org/D27774
llvm-svn: 290198
This allows lowering i8 and i16 arguments if they can fit in the registers. Note
that the lowering is incomplete - ABI extensions are handled in a subsequent
patch.
(Last part of)
Differential Revision: https://reviews.llvm.org/D27704
llvm-svn: 290106
Teach the instruction selector and legalizer that it's ok to have adds with 8 or
16-bit integers.
This is the second part of https://reviews.llvm.org/D27704
llvm-svn: 290105
Teach the instruction selector that it's ok to copy small values from physical
registers.
First part of https://reviews.llvm.org/D27704
llvm-svn: 290104
This adds support for lowering more than 4 arguments (although still i32 only).
It uses the handleAssignments / ValueHandler infrastructure extracted from
the AArch64 backend in r288658.
Differential Revision: https://reviews.llvm.org/D27195
llvm-svn: 290098
Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit
scalars only). This will be useful for functions that need to pass arguments on
the stack.
First part of https://reviews.llvm.org/D27195.
llvm-svn: 290096
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.
Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).
Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).
Differential Revision: https://reviews.llvm.org/D27694
llvm-svn: 289972
Add the minimal support necessary to select a function that returns the sum of
two i32 values.
This includes some support for argument/return lowering of i32 values through
registers, as well as the handling of copy and add instructions throughout the
GlobalISel pipeline.
Differential Revision: https://reviews.llvm.org/D26677
llvm-svn: 289940
Add two public methods to ARMTargetLowering: CCAssignFnForCall and
CCAssignFnForReturn, which are just calling the already existing private method
CCAssignFnForNode. These will come in handy for GlobalISel on ARM.
We also replace all calls to CCAssignFnForNode in ARMISelLowering.cpp, because
the new methods are friendlier to the reader.
llvm-svn: 289932
MachineLegalizer used to be the name of both the class and the member,
causing GCC errors. r276522 fixed that by renaming the member to just
'Legalizer'. The 'class' workaround isn't necessary anymore; drop it.
llvm-svn: 289848
This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:
bic r5, r7, #31
cbz r5, .LBB2_10
got rewritten into:
lsrs r5, r7, #5
beq .LBB2_10
The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. And also, compared to the original commit, some regression
tests didn't need changing anymore because of this extra check.
For completeness, this was the original commit message:
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.
Differential Revision: https://reviews.llvm.org/D27761
llvm-svn: 289794
This implements execute-only support for ARM code generation, which
prevents the compiler from generating data accesses to code sections.
The following changes are involved:
* Add the CodeGen option "-arm-execute-only" to the ARM code generator.
* Add the clang flag "-mexecute-only" as well as the GCC-compatible
alias "-mpure-code" to enable this option.
* When enabled, literal pools are replaced with MOVW/MOVT instructions,
with VMOV used in addition for floating-point literals. As the MOVT
instruction is required, execute-only support is only available in
Thumb mode for targets supporting ARMv8-M baseline or Thumb2.
* Jump tables are placed in data sections when in execute-only mode.
* The execute-only text section is assigned section ID 0, and is
marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'.
This also overrides selection of ELF sections for globals.
llvm-svn: 289784
Given that INSERT_VECTOR_ELT operates on D registers anyway, combining
64-bit vectors into a 128-bit vector is basically free. Therefore, try
to split BUILD_VECTOR nodes before giving up and lowering them to a series
of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically
better lowerings; see testcases for examples. Inspired by similar code
in the x86 backend for AVX.
Differential Revision: https://reviews.llvm.org/D27624
llvm-svn: 289706
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.
Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).
Differential Revision: https://reviews.llvm.org/D27694
llvm-svn: 289703
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
Differential Revision: https://reviews.llvm.org/D26671
llvm-svn: 289647
Summary:
This patch aims to generalize matching of the strided store accesses to more general masks.
The more general rule is to have consecutive accesses based on the stride:
[x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...]
All elements in the masks need not form a contiguous space, there may be gaps.
As before, undefs are allowed and filled in with adjacent element loads.
Reviewers: HaoLiu, mssimpso
Subscribers: mkuper, delena, llvm-commits
Differential Revision: https://reviews.llvm.org/D23646
llvm-svn: 289573
When we see a non flag-setting instruction for which only the flag-setting
version is available in Thumb1, we should give a better error message than
"invalid instruction".
Differential Revision: https://reviews.llvm.org/D27414
llvm-svn: 288805
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
Summary:
This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion.
Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments though if the first one is >= than the second. This might cause the comparison logic to detect false equality.
Example of the broken C++ code:
```
std::atomic<long long> at(2);
long long ll = 1;
std::atomic_compare_exchange_strong(&at, &ll, 3);
```
Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3.
The patch replaces SBC with CMPEQ.
Reviewers: t.p.northover
Subscribers: aemerson, rengolin, llvm-commits, asl
Differential Revision: https://reviews.llvm.org/D27315
llvm-svn: 288433
Recommitting r288293 with some extra fixes for GlobalISel code.
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.
This is a necessary step to have machine module passes work properly.
Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
because the available MachineFunction pointers are const, but the code
wants to call tidyLandingPads() in between
(markFunctionEnd()/endFunction()).
Differential Revision: https://reviews.llvm.org/D27227
llvm-svn: 288405
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.
This is a necessary step to have machine module passes work properly.
Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
because the available MachineFunction pointers are const, but the code
wants to call tidyLandingPads() in between
(markFunctionEnd()/endFunction()).
Differential Revision: https://reviews.llvm.org/D27227
llvm-svn: 288293
This is per function data so it is better kept at the function instead
of the module.
This is a necessary step to have machine module passes work properly.
Differential Revision: https://reviews.llvm.org/D27185
llvm-svn: 288291
No test case necessary as the problematic condition is checked with the
newly introduced assertAllSuperRegsMarked() function.
Differential Revision: https://reviews.llvm.org/D26648
llvm-svn: 288277
No-one actually had a mangler handy when calling this function, and
getSymbol itself went most of the way towards getting its own mangler
(with a local TLOF variable) so forcing all callers to supply one was
just extra complication.
llvm-svn: 287645
Summary:
Variadic functions can be treated in the same way as normal functions
with respect to the number and types of parameters.
Reviewers: grosbach, olista01, t.p.northover, rengolin
Subscribers: javed.absar, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D26748
llvm-svn: 287219
One half of the shifts obviously needed conditional selection based on whether
the shift amount is more than 32-bits, but leaving the other half as the
natural shift isn't acceptable either: it's undefined behaviour to shift a
32-bit value by more than 31.
llvm-svn: 287149
Move some code inside the proper 'if' block to make sure it is only run once,
when the subtarget is first created. Things can still break if we use different
ARM target machines or if we have functions with different 'target-cpu' or
'target-features', we should fix that too in the future.
llvm-svn: 286974
This patch adds the Sched Machine Model for Cortex-R52.
Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.
Reviewers: rengolin, jmolloy
Differential Revision: https://reviews.llvm.org/D26500
llvm-svn: 286949
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.
llvm-svn: 286881
For example we were producing
push {r8, r10, r11, r4, r5, r7, lr}
This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.
llvm-svn: 286866
The version of this instruction with the .w suffix already correctly accepts
this, but the alias without the .w did not.
Differential Revision: https://reviews.llvm.org/D26499
llvm-svn: 286446
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.
If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.
In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.
llvm-svn: 286107
This handles the last case of the builtin function calls that we would
generate code which differed from Microsoft's ABI. Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.
Addresses PR30825!
llvm-svn: 286082
Summary: ARMv6m supports dmb etc fench instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls.
Reviewers: rengolin, efriedma, t.p.northover, jmolloy
Subscribers: efriedma, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D26120
llvm-svn: 285969
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 285893
As it stands, the OperandMatchResultTy is only included in the generated
header if there is custom operand parsing. However, almost all backends
make use of MatchOperand_Success and friends from OperandMatchResultTy for
e.g. parseRegister. This is a pain when starting an AsmParser for a new
backend that doesn't yet have custom operand parsing. Move the enum to
MCTargetAsmParser.h.
This patch is a prerequisite for D23563
Differential Revision: https://reviews.llvm.org/D23496
llvm-svn: 285705
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
llvm-svn: 285690
Generate the slowest possible codepath for noopt CodeGen. Even trying to be
clever with the negated jump can cause out-of-range jumps. Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible. This re-enables the
previous optimization as well as hopefully gives us working code in all cases.
Addresses PR30356!
llvm-svn: 285649
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:
cmp r?, #0
cbz .Ltrap
b .Lbody
.Lbody:
...
.Ltrap:
udf #249 @ __brkdiv0
This works great most of the time. However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).
Since this is a matter of correctness, possibly pay a small penalty instead. We
now form this slightly differently:
cbnz .Lbody
udf #249 @ __brkdiv0
.Lbody:
...
The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).
The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.
Addresses PR30532!
llvm-svn: 285312
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Patch by Vadzim Dambrouski.
Differential Revision: https://reviews.llvm.org/D25890
llvm-svn: 285278
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.
Should fix an issue where we were turning something like:
push {r8, r4, r7, lr}
sub sp, #24
into nonsense like:
push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}
llvm-svn: 285232
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.
llvm-svn: 285039
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
llvm-svn: 284580
This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).
Differential Revision: https://reviews.llvm.org/D25625
llvm-svn: 284571
The custom lowering is pretty straightforward: basically, just AND
together the two halves of a <4 x i32> compare.
Differential Revision: https://reviews.llvm.org/D25713
llvm-svn: 284536
This patch assigns cost of the scaling used in addressing for Cortex-R52.
On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.
Differential Revision: http://reviews.llvm.org/D25670
Reviewer: jmolloy
llvm-svn: 284460
This patch adds simplified support for tail calls on ARM with XRay instrumentation.
Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program
#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
std::printf("In fC()\n");
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
std::printf("In fB()\n");
fC();
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
std::printf("In fA()\n");
fB();
}
// Avoid infinite recursion in case the logging function is instrumented (so calls logging
// function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}
int main(int argc, char* argv[]) {
__xray_set_handler(simplyPrint);
printf("Patching...\n");
__xray_patch();
fA();
printf("Unpatching...\n");
__xray_unpatch();
fA();
return 0;
}
gives the following output:
Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()
So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().
Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).
Differential Revision: https://reviews.llvm.org/D25030
llvm-svn: 284456
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.
For instance:
LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]
Above, (1) takes less cycles than (2).
By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.
Differential Revision: http://reviews.llvm.org/D24857
Reviewers: jmolloy, rengolin
llvm-svn: 284127
Reverts r283938 to reinstate r283867 with a fix.
The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.
llvm-svn: 283942
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.
This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.
In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.
We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.
There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.
In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.
Differential Revision: https://reviews.llvm.org/D24228
llvm-svn: 283867
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.
Differential Revision: https://reviews.llvm.org/D25180
llvm-svn: 283866
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.
The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.
Differential Revision: https://reviews.llvm.org/D25281
llvm-svn: 283763
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D25332
llvm-svn: 283550
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'
It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.
llvm-svn: 283442
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.
Testcase added.
llvm-svn: 283423
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D24076
llvm-svn: 283383
This is not a valid encoding - these instructions cannot do PC-relative addressing.
The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.
llvm-svn: 283323
This fixes the inconsistency of the fp denormal option names: in LLVM this was
DenormalType, but in Clang this is DenormalMode which seems better.
Differential Revision: https://reviews.llvm.org/D24906
llvm-svn: 283192
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.
Patch mostly by Pablo Barrio.
Differential Revision: https://reviews.llvm.org/D25077
llvm-svn: 283098
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 282387
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 282241
The initial mapping symbol state is set from the triple, but we only checked
for the little-endian thumb triple, so could end up with an ARM mapping symbol
for big-endian thumb.
Differential Revision: https://reviews.llvm.org/D24553
llvm-svn: 281894
ldm and stm instructions always require 4-byte alignment on the pointer, but we
weren't checking this before trying to reduce code-size by replacing a
post-indexed load/store with them. Unfortunately, we were also dropping this
incormation in DAG ISel too, but that's easy enough to fix.
llvm-svn: 281893
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:
https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)
Differential Revision: https://reviews.llvm.org/D23931
llvm-svn: 281878
Recommitting after fixing AsmParser initialization and X86 inline asm
error cleanup.
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281762
(and the same for SREM)
This was causing buildbot failures earlier (time outs in the LNT suite).
However, we haven't been able to reproduce this and are suspecting this
was caused by another (reverted) patch.
llvm-svn: 281719
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 281715
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
llvm-svn: 281604
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
llvm-svn: 281484
Recommitting after fixing AsmParser Initialization.
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281336
Before, only Thumb functions were marked as ".code 16". These
".code x" directives are effective until the next directive of its
kind is encountered. Therefore, in code with interleaved ARM and
Thumb functions, it was possible to declare a function as ARM and
end up with a Thumb function after assembly. A test has been added.
An existing test has also been fixed to take this change into
account.
Reviewers: aschwaighofer, t.p.northover, jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D24337
llvm-svn: 281324
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 281323
The changes made in r269352, r269353 and r269354 to support the
transformation of the ldr rd,=immediate to mov introduced a regression
from 3.8 (ldr.w rd, =immediate) not supported.
This change puts support back in for ldr.w by means of a t2InstAlias for
the .w form. The .w is ignored in ARM state and propagated to the ldr in
Thumb2.
llvm-svn: 281319
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
llvm-svn: 281314
descriptions now tag add instructions, and the Hexagon backend is using this to
identify loop induction statements.
Patch by Sam Parker and Sjoerd Meijer.
Differential Revision: https://reviews.llvm.org/D23601
llvm-svn: 281304
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281249
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 281215
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
llvm-svn: 281213
Now that MachineBasicBlock::reverse_instr_iterator knows when it's at
the end (since r281168 and r281170), implement
MachineBasicBlock::reverse_iterator directly on top of an
ilist::reverse_iterator by adding an IsReverse template parameter to
MachineInstrBundleIterator. This replaces another hard-to-reason-about
use of std::reverse_iterator on list iterators, matching the changes for
ilist::reverse_iterator from r280032 (see the "out of scope" section at
the end of that commit message). MachineBasicBlock::reverse_iterator
now has a handle to the current node and has obvious invalidation
semantics.
r280032 has a more detailed explanation of how list-style reverse
iterators (invalidated when the pointed-at node is deleted) are
different from vector-style reverse iterators like std::reverse_iterator
(invalidated on every operation). A great motivating example is this
commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.
Note: If your out-of-tree backend deletes instructions while iterating
on a MachineBasicBlock::reverse_iterator or converts between
MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator,
you'll need to update your code in similar ways to r280032. The
following table might help:
[Old] ==> [New]
delete &*RI, RE = end() delete &*RI++
RI->erase(), RE = end() RI++->erase()
reverse_iterator(I) std::prev(I).getReverse()
reverse_iterator(I) ++I.getReverse()
--reverse_iterator(I) I.getReverse()
reverse_iterator(std::next(I)) I.getReverse()
RI.base() std::prev(RI).getReverse()
RI.base() ++RI.getReverse()
--RI.base() RI.getReverse()
std::next(RI).base() RI.getReverse()
(For more details, have a look at r280032.)
llvm-svn: 281172
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
Move the target specific setup into the target specific lowering setup. As
pointed out by Anton, the initial change was moving this too high up the stack
resulting in a violation of the layering (the target generic code path setup
target specific bits). Sink this into the ARM specific setup. NFC.
llvm-svn: 281088
The CMPZ #0 disappears during peepholing, leaving just a tADDi3, tADDi8 or t2ADDri. This avoids having to materialize the expensive negative constant in Thumb-1, and allows a shrinking from a 32-bit CMN to a 16-bit ADDS in Thumb-2.
llvm-svn: 281040
This avoids us doing a completely unneeded "cmp r0, #0" after a flag-setting instruction if we only care about the Z or C flags.
Add LSL/LSR to the whitelist while we're here and add testing. This code could really do with a spring clean.
llvm-svn: 281027
And associated commits, as they broke the Thumb bots.
This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.
llvm-svn: 280967
I mised the check that it had to support ARM to work. This commit tries
to fix that, to make sure we don't emit ARM code in Thumb-only mode.
llvm-svn: 280935
Materializing something like "-3" can be done as 2 instructions:
MOV r0, #3
MVN r0, r0
This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TII::getIntImmCost(), but were taking the complement of the zero-extended value instead of the sign-extended value which is unlikely to ever produce a number < 256.
There were no tests failing after changing this... :/
llvm-svn: 280928
This reverts commit r280808.
It is possible that this change results in an infinite loop. This
is causing timeouts in some tests on ARM, and a Chromebook bot is
failing.
llvm-svn: 280918
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:
1. https://reviews.llvm.org/D23932 (Clang test)
2. https://reviews.llvm.org/D23933 (compiler-rt)
Differential Revision: https://reviews.llvm.org/D23931
llvm-svn: 280888
Summary:
This saves a library call to __aeabi_uidivmod. However, the
processor must feature hardware division in order to benefit from
the transformation.
Reviewers: scott-0, jmolloy, compnerd, rengolin
Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D24133
llvm-svn: 280808
This is a Windows ARM specific issue. If the code path in the if conversion
ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up. This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.
For now, report a bundle as being unpredicatable. Although this is false, this
would trigger a failure case previously anyways, so this is no worse. That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.
Under certain circumstances, it may be possible to "predicate the bundle". This
would require scanning all bundle instructions, and ensure that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence. If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.
llvm-svn: 280689
types. This is the LLVM counterpart and it adds options that map onto FP
exceptions and denormal build attributes allowing better fp math library
selections.
Differential Revision: https://reviews.llvm.org/D24070
llvm-svn: 280246
Summary:
In fuctions that contained debug info but were empty otherwise,
the ARM load/store optimizer could abort. This was because
function MergeReturnIntoLDM handled the special case where a
Machine Basic BLock is empty by calling MBB.empty(). However, this
returns false in presence of debug info, although the function
should be considered empty in the eyes of the load/store optimizer.
This has been fixed by handling the case where searching through the
block finds only debug instructions.
Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy
Subscribers: t.p.northover, aemerson, rengolin, samparker
Differential Revision: https://reviews.llvm.org/D23847
llvm-svn: 279820
Its existence is largely historical, apparently we tried to make ARM object
files look maybe-almost-possibly runnable by putting our best guess at the
actual value into relocated locations. Of course, the real linker then comes
along and can completely change things.
But it should only be there for word-sized and movw/movt relocations. It can't
be encoded in branch relocations, and I've seen it mess up validity
calculations twice in the last couple of weeks so the default is clearly problematic.
llvm-svn: 279773
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.
Differential Revision: http://reviews.llvm.org/D23850
llvm-svn: 279698
A branch-distance to a Thumb function shouldn't be forced to be odd for
CBZ/CBNZ instructions because (assuming it's within range), it's going to be a
valid, even offset.
llvm-svn: 279665
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.
We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.
To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.
We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.
Differential Revision: https://reviews.llvm.org/D23516
llvm-svn: 279506
This fixes the crash from PR29072, where the MachineBasicBlock::iterator
wasn't being properly checked against MachineBasicBlock::end() before
iterating. This was another bug exposed by the new
ilist::iterator::operator*() assertion from r279314.
This testcase is poor quality. bugpoint couldn't reduce any further,
and I haven't had time to dig into what's going on so I can't invent a
better one. I didn't even get good CHECK lines in: this is just a
crasher.
I'm committing anyway since this is a real crash with an obvious fix,
but I'll leave PR29072 open and ask an ARM maintainer to help improve
the testcase.
llvm-svn: 279391
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
llvm::tryFoldSPUpdateIntoPushPop assumes its arguments are valid
MachineInstrs. Update ARMFrameLowering::emitPrologue to respect that;
when LastPush==end(), it can't possibly be a push instruction anyway.
llvm-svn: 278880
Summary:
Fix for the upper bound check that was causing a build failure.
Reviewers: olista01, rengolin, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23501
llvm-svn: 278789
Summary:
The assembler currently does not check the branch target for CBZ/CBNZ
instructions, which only permit branching forwards with a positive offset. This
adds validation for the branch target to ensure negative PC-relative offsets are
not encoded into the instruction, whether specified as a literal or as an
assembler symbol.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D23312
llvm-svn: 278788
This currently breaks the greendragon clang-stage1-configure-RA/ and
brotli. It is probably just uncovering a pre-existing problem. Reverting
temporarily to get the buildbots green again. A reduced testcase will
follow shortly.
This reverts commit r278659.
llvm-svn: 278711
Summary:
The assembler currently does not check the branch target for CBZ/CBNZ
instructions, which only permit branching forwards with a positive offset. This
adds validation for the branch target to ensure negative PC-relative offsets are
not encoded into the instruction, whether specified as a literal or as an
assembler symbol.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D23312
llvm-svn: 278659
Created a Thumb2 predicated pattern matcher that uses Thumb2 and
HasT2ExtractPack and used it to redefine the patterns for sxta{b|h}
and uxta{b|h}. Also used the similar patterns to fill in isel pattern
gaps for the corresponding instructions in the ARM backend.
The patch is mainly changes to tests since most of this functionality
appears not to have been tested.
Differential Revision: https://reviews.llvm.org/D23273
llvm-svn: 278207
This patch adds support for some new relocation models to the ARM
backend:
* Read-only position independence (ROPI): Code and read-only data is accessed
PC-relative. The offsets between all code and RO data sections are known at
static link time. This does not affect read-write data.
* Read-write position independence (RWPI): Read-write data is accessed relative
to the static base register (r9). The offsets between all writeable data
sections are known at static link time. This does not affect read-only data.
These two modes are independent (they specify how different objects
should be addressed), so they can be used individually or together. They
are otherwise the same as the "static" relocation model, and are not
compatible with SysV-style PIC using a global offset table.
These modes are normally used by bare-metal systems or systems with
small real-time operating systems. They are designed to avoid the need
for a dynamic linker, the only initialisation required is setting r9 to
an appropriate value for RWPI code.
I have only added support to SelectionDAG, not FastISel, because
FastISel is currently disabled for bare-metal targets where these modes
would be used.
Differential Revision: https://reviews.llvm.org/D23195
llvm-svn: 278015
Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes.
I'm resubmitting this patch. The test case in the original commit
r277610 does not specify triple, so builds with differnt default triple
will have different output.
This patch fixed trile as thumb-darwin-apple.
Reviewers: john.brawn, jmolloy, bruno
Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23090
llvm-svn: 277865
Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes.
Reviewers: john.brawn, jmolloy
Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23090
llvm-svn: 277610
In this particular example we wouldn't want the smmls anyway (the value is
actually unused), but in general smmls does not provide the required flags
register so if that SUBE result is used we can't replace it.
llvm-svn: 277541
Added (sra (shl x, 16), 16) to the sext_16_node PatLeaf for ARM to
simplify some pattern matching. This has allowed several patterns
for smul* and smla* to be removed as well as making it easier to add
the matching for the corresponding instructions for Thumb2 targets.
Also added two Pat classes that are predicated on Thumb2 with the
hasDSP flag and UseMulOps flags. Updated the smul codegen test with
the wider range of patterns plus the ThumbV6 and ThumbV6T2 targets.
Differential Revision: https://reviews.llvm.org/D22908
llvm-svn: 277450
Summary:
Commit 276701 requires that targets have the DSP extensions to use
certain saturating instructions. This requires some corrections.
For ARM ISA the instructions in question are available in all v6*
architectures.
For Thumb2, the instructions in question are available from v6T2.
SSAT and USAT are part of the base architecture while SSAT16 and
USAT16 require the DSP extensions.
Reviewers: rengolin
Subscribers: aemerson, rengolin, samparker, llvm-commits
Differential Revision: https://reviews.llvm.org/D23010
llvm-svn: 277439
Summary:
The MOV/MOVT instructions being chosen for struct_byval predicates was
conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction
being incorrectly emitted in Thumb1 mode. This is especially apparent
with v8-m.base targets. This patch ensures that Thumb instructions are
emitted in both Thumb modes.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, aemerson, rengolin
Differential Revision: https://reviews.llvm.org/D22865
llvm-svn: 277128
Currently, for ARMCOFFMCAsmInfoMicrosoft, no comment character is set, thus the
idefault, '#', is used.
The hash character doesn't work as comment character in ARM assembly, since '#'
is used for immediate values.
The comment character is set to ';', which is the comment character used by MS
armasm.exe. (The microsoft armasm.exe uses a different directive syntax than
what LLVM currently supports though, similar to ARM's armasm.)
This allows inline assembly with immediate constants to be built (and brings the
assembly output from clang -S closer to being possible to assemble).
A test is added that verifies that ';' is correctly interpreted as comments in
this mode, and verifies that assembling code that includes literal constants
with a '#' works.
Patch by Martin Storsjö.
llvm-svn: 276859
- More informative message when extension name is not an identifier token.
- Stop parsing directive if extension is unknown (avoid duplicate error
messages).
- Report unsupported extensions with a source location, rather than
report_fatal_error.
Differential Revision: https://reviews.llvm.org/D22806
llvm-svn: 276748
This option, compatible with gas's -mimplicit-it, controls the
generation/checking of implicit IT blocks in ARM/Thumb assembly.
This option allows two behaviours that were not possible before:
- When in ARM mode, emit a warning when assembling a conditional
instruction that is not in an IT block. This is enabled with
-mimplicit-it=never and -mimplicit-it=thumb.
- When in Thumb mode, automatically generate IT instructions when an
instruction with a condition code appears outside of an IT block. This
is enabled with -mimplicit-it=thumb and -mimplicit-it=always.
The default option is -mimplicit-it=arm, which matches the existing
behaviour (allow conditional ARM instructions outside IT blocks without
warning, and error if a conditional Thumb instruction is outside an IT
block).
The general strategy for generating IT blocks in Thumb mode is to keep a
small list of instructions which should be in the IT block, and only
emit them when we encounter something in the input which means we cannot
continue the block. This could be caused by:
- A non-predicable instruction
- An instruction with a condition not compatible with the IT block
- The IT block already contains 4 instructions
- A branch-like instruction (including ALU instructions with the PC as
the destination), which cannot appear in the middle of an IT block
- A label (branching into an IT block is not legal)
- A change of section, architecture, ISA, etc
- The end of the assembly file.
Some of these, such as change of section and end of file, are parsed
outside of the ARM asm parser, so I've added a new virtual function to
AsmParser to ensure any previously-parsed instructions have been
emitted. The ARM implementation of this flushes the currently pending IT
block.
We now have to try instruction matching up to 3 times, because we cannot
know if the current IT block is valid before matching, and instruction
matching changes depending on the IT block state (due to the 16-bit ALU
instructions, which set the flags iff not in an IT block). In the common
case of not having an open implicit IT block and the instruction being
matched not needing one, we still only have to run the matcher once.
I've removed the ITState.FirstCond variable, because it does not store
any information that isn't already represented by CurPosition. I've also
updated the comment on CurPosition to accurately describe it's meaning
(which this patch doesn't change).
Differential Revision: https://reviews.llvm.org/D22760
llvm-svn: 276747
The saturation instructions appeared in v6T2, with DSP extensions, but they
were being accepted / generated on any, with the new introduction of the
saturation detection in the back-end. This commit restricts the usage to
DSP-enable only cores.
Fixes PR28607.
llvm-svn: 276701