including the amended (no UB anymore) fix for adding/subtracting -2147483648.
This reverts r298328 "[ARM] Revert r297443 and r297820."
and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648""
llvm-svn: 298417
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
Summary:
To support negative immediates for certain arithmetic instructions, the
instruction is converted to the inverse instruction with a negated (or inverted)
immediate. For example, "ADD r0, r1, #FFFFFFFF" cannot be encoded as an ADD
instruction. However, "SUB r0, r1, #1" is equivalent.
These conversions are different from instruction aliases. An alias maps
several assembler instructions onto one encoding. A conversion, however, maps
an *invalid* instruction--e.g. with an immediate that cannot be represented in
the encoding--to a different (but equivalent) instruction.
Several instructions with negative immediates were being converted already, but
this was not systematically tested, nor did it cover all instructions.
This patch implements all possible substitutions for ARM, Thumb1 and
Thumb2 assembler and adds tests. It also adds a feature flag
(-mattr=+no-neg-immediates) to turn these substitutions off. This is
helpful for users who want their code to assemble to exactly what they
wrote.
Reviewers: t.p.northover, rovka, samparker, javed.absar, peter.smith, rengolin
Reviewed By: javed.absar
Subscribers: aadg, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D30571
llvm-svn: 298380
The glueless lowering of addc/adde in Thumb1 has known serious
miscompiles (see https://reviews.llvm.org/D31081), and r297820
causes an infinite loop for certain constructs. It's not
clear when they will be fixed, so let's just take them out
of the tree for now.
(I resolved a small conflict with r297453.)
llvm-svn: 298328
The special case of zero sized values was previously not handled correctly.
This patch handles this by not promoting if the size is zero.
Patch by Tim Neumann.
Differential Revision: https://reviews.llvm.org/D31116
llvm-svn: 298320
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.
Differential Revision: https://reviews.llvm.org/D31039
llvm-svn: 298254
Let targets specialize the pass with the register class so we can get a
parameterless default constructor and can put the pass into the pass
registry to enable testing with -run-pass=.
llvm-svn: 298184
Fixing triple format in the tests added for the branch label fix for Thumb
Targets. Also recommitting previously approved patch, see
https://reviews.llvm.org/D30943.
Reviewed by: samparker
Differential Revision: https://reviews.llvm.org/D30987
llvm-svn: 298056
This allows the optimization to rearrange loads and stores more
aggressively. This doesn't really affect performance, but it helps
codesize.
Differential Revision: https://reviews.llvm.org/D30839
llvm-svn: 298021
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.
In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D31052
llvm-svn: 298010
In fact this default implementation should be the only implementation,
keep it virtual for now to accomodate targets that don't model flags
correctly.
Differential Revision: https://reviews.llvm.org/D30747
llvm-svn: 297980
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.
Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.
llvm-svn: 297871
Different MCInstrAnalysis classes for arm and thumb mode, each with
their own evaluateBranch implementation. I added a test case and
fixed the coff-relocations test to use '<label>:' rather than
'<label>' in the CHECK-LABEL entries, since the ones without the
colon would match branch targets. Might be worth noticing that
llvm-objdump does not lookup the relocation and thus assigns it a
target depending on the encoded immediate which #0, so it thinks it
branches to the next instruction.
Committed on behalf of Andre Vieira (avieira).
Differential Revision: https://reviews.llvm.org/D30943
llvm-svn: 297821
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.
Differential Revision: https://reviews.llvm.org/D30044
llvm-svn: 297809
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.
Depends on D30046, D29712
Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D30089
llvm-svn: 297782
Make MCSectionELF::AssociatedSection be a link to a symbol, because
that's how it works in the assembly, and use it in the asm printer.
llvm-svn: 297769
This instruction was missing from the list of opcodes that we check, so we were
hitting an llvm_unreachable in ARMMCCodeEmitter.cpp for the ARM MOVT
instruction, rather than the diagnostic that is emitted for the other MOVW/MOVT
instructions.
Differential revision: https://reviews.llvm.org/D30936
llvm-svn: 297739
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.
Differential Revision: https://reviews.llvm.org/D30708
llvm-svn: 297716
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.
llvm-svn: 297621
Loop over the ARM decode tables; this is a clean-up to reduce some code
duplication.
Differential Revision: https://reviews.llvm.org/D30814
llvm-svn: 297608
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].
Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.
Reviewers: jmolloy, rogfer01
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30401
llvm-svn: 297453
same as already done for ARM and Thumb2.
Reviewers: jmolloy, rogfer01, efriedma
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30400
llvm-svn: 297443
Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for
vcmp that was actually missing.
Differential Revision: https://reviews.llvm.org/D30745
llvm-svn: 297376
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.
Differential Revision: https://reviews.llvm.org/D30692
llvm-svn: 297142
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.
The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.
Differential Revision: https://reviews.llvm.org/D30645
llvm-svn: 297137
Summary: Previously, it had always been materialized as a push/pop sequence.
Reviewers: labrinea, jroelofs
Reviewed By: jroelofs
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30648
llvm-svn: 297134
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.
llvm-svn: 297100
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.
llvm-svn: 296862
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.
This patch fixes some cases where we would move stores to the wrong
insert point.
Re-commit with a fix to increment NumMove in the right place.
Differential Revision: https://reviews.llvm.org/D30124
llvm-svn: 296815
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.
Differential Revision: https://reviews.llvm.org/D29675
llvm-svn: 296751
This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types
to interleaved access intrinsics as long as the types are multiples of the
vector register width. A "wide" access will now be mapped to multiple
interleave intrinsics similar to the way in which non-interleaved accesses with
illegal types are legalized into multiple accesses. I'll update the associated
TTI costs (in getInterleavedMemoryOpCost) as a follow-on.
Differential Revision: https://reviews.llvm.org/D29466
llvm-svn: 296750
Original commit message:
[ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
This patch fixes some cases where we would sink stores for no reason.
llvm-svn: 296718
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
This patch fixes some cases where we would sink stores for no reason.
Differential Revision: https://reviews.llvm.org/D30124
llvm-svn: 296708
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
Differential Revision: https://reviews.llvm.org/D30368
llvm-svn: 296701
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.
The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.
On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended paramteres.
llvm-svn: 296631
This parsing code was incorrectly checking for invalid characters, so an
invalid instruction like:
msr spsr_w, r0
would be emitted as:
msr spsr_cxsf, r0
Differential revision: https://reviews.llvm.org/D30462
llvm-svn: 296607
This prevents generating stm r1!, {r0, r1} on Thumb1, where value
stored for r1 is UNKONWN.
Patch by Zhaoshi Zheng.
Differential Revision: https://reviews.llvm.org/D27910
llvm-svn: 296538
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296476
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.
We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.
llvm-svn: 296473
In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in
an IT block but not the last instruction in the block.
Previously, we only diagnosed this for LDM instructions, this patch extends the
diagnostic to cover all of the relevant instructions.
Differential Revision: https://reviews.llvm.org/D30398
llvm-svn: 296459
The transform in question claims to be doing:
// fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))
...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node
for the sext/zext patterns.
This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(),
so I was seeing infinite loops with my draft of a patch applied.
The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll
is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's
not affected by this code change.
Differential Revision:
https://reviews.llvm.org/D30355
llvm-svn: 296389
Currently we handle this correctly in arm, but in thumb we don't which leads to
an unpredictable instruction being emitted for LSL #0 in an IT block and SP not
being permitted in some cases when it should be.
For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the
.td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to
get the IT handling right. We also need to adjust the handling of
MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We
should also adjust it to allow SP in the same way that it is allowed in
MOV rd, rn, but I haven't done that here because it looks like it would take
quite a lot of work to get right.
Additionally correct the selection of the 16-bit shift instructions in
processInstruction, where it was checking if the two registers were equal when
it should have been checking if they were low. It appears that previously this
code was never executed and the 16-bit encoding was selected by default, but
the other changes I've done here have somehow made it start being used.
Differential Revision: https://reviews.llvm.org/D30294
llvm-svn: 296342
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296252
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.
The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.
llvm-svn: 296031
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).
llvm-svn: 295973
Add support for lowering calls with parameters than can fit into regs. Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.
llvm-svn: 295971
The ARMConstantIslandPass didn't have support for handling accesses to
constant island objects through ARM::t2LDRBpci instructions. This adds
support for that.
This fixes PR31997.
llvm-svn: 295964
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.
Thus, remove the tPUSH as well in this case.
Differential Revision: https://reviews.llvm.org/D30207
llvm-svn: 295816
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.
Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953
llvm-svn: 295811
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.
Differential Revision: https://reviews.llvm.org/D26722
llvm-svn: 295803
PC isn't allowed in the source operand of t2MOVr, so change the register class
to one without PC. SP handling is slightly trickier and changes depending on if
we're in ARMv8, so do that in checkTargetMatchPredicate.
Differential Revision: https://reviews.llvm.org/D30199
llvm-svn: 295732
There used to be a check in the IRTranslator that prevented us from having to
deal with atomic loads/stores. That check has been removed in r294993 and the
AArch64 backend was updated accordingly. This commit does the same thing for the
ARM backend.
In general, in the ARM backend we introduce fences during the atomic expand
pass, so we don't have to worry about atomics, *except* for the 32-bit ARMv8
target, which handles atomics more like AArch64. Since we don't want to worry
about that yet, just bail out of instruction selection if we find any atomic
loads.
llvm-svn: 295662
Removed the HasT2ExtractPack feature and replaced its references
with HasDSP. This then allows the Thumb2 extend instructions to be
selected for ARMv8M +dsp. These instruction descriptions have also
been refactored and more target tests have been added for their isel.
Differential Revision: https://reviews.llvm.org/D29623
llvm-svn: 295452
Add some asserts to make sure we're using the mappings that we think we're
using. This is to keep us from accidentally breaking functionality while moving
to TableGen'erated mappings.
llvm-svn: 295441
Start using the Subtarget to make decisions about what's legal. In particular,
we only mark floating point operations as legal if we have VFP2, which is
something we should've done from the very start.
llvm-svn: 295439
Since they're only used for passing around double precision floating point
values into the general purpose registers, we'll lower them to VMOVDRR and
VMOVRRD.
llvm-svn: 295310
For now we just mark them as legal all the time and let the other passes bail
out if they can't handle it. In the future, we'll want to move more of the
brains into the legalizer.
llvm-svn: 295300
For the hard float calling convention, we just use the D registers.
For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.
For pure soft float targets, we still bail out because we don't support the
libcalls yet.
llvm-svn: 295295
Summary:
The attached test case fails with "fatal error: error in backend:
misaligned pc-relative fixup value" as the jump table is misaligned.
The EmitAlignment existed already for ARM and Thumb-1 code, but was
missing for Thumb-2.
The test checks that the fatal error disappears when generating an obj
file, as well as checking the align directive is there when producing an
asm file.
Reviewers: rengolin, grosbach, t.p.northover, jmolloy, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D29650
llvm-svn: 294950
We match a sequence of 3-4 instructions into a tTBB pseudo. One of our checks is that
a particular register in that sequence is killed (so it can be clobbered by the pseudo).
We weren't noticing if an errant MOV or other instruction had infiltrated the
sequence we were walking. If it had, and it defined the register we've already
identified as killed, it makes it live across the tBR_JT and thus unclobberable.
Notice this case and bail out.
llvm-svn: 294949
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the sideeffect of setting the cumulative Invalid
bit in FPSCR if any of the operands are QNaN.
It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:
The relational and equality operators support the usual mathematical
relationships between numeric values. For any ordered pair of numeric
values exactly one of relationships the less, greater, equal and is true.
Relational operators may raise the floating-point exception when argument
values are NaNs.
The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).
Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.
llvm-svn: 294945
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.
Teach the cost model to consider f16 interleaved operations as
expensive. Otherwise, we are all but guaranteed to end up with
a large block of scalarized vector code.
llvm-svn: 294819