This patch lets the register scavenger make use of multiple spill slots in
order to guarantee that it will be able to provide multiple registers
simultaneously.
To support this, the RS's API has changed slightly: setScavengingFrameIndex /
getScavengingFrameIndex have been replaced by addScavengingFrameIndex /
isScavengingFrameIndex / getScavengingFrameIndices.
In forthcoming commits, the PowerPC backend will use this capability in order
to implement the spilling of condition registers, and some special-purpose
registers, without relying on r0 being reserved. In some cases, spilling these
registers requires two GPRs: one for addressing and one to hold the value being
transferred.
llvm-svn: 177774
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.
llvm-svn: 177651
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.
This partially addresses PR14867.
Patch by Pete Couperus
llvm-svn: 177380
The default logic marks them as too expensive.
For example, before this patch we estimated:
cost of 16 for instruction: %r = uitofp <4 x i16> %v0 to <4 x float>
While this translates to:
vmovl.u16 q8, d16
vcvt.f32.u32 q8, q8
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13445992
llvm-svn: 177334
Fix cost of some "cheap" cast instructions. Before this patch we used to
estimate for example:
cost of 16 for instruction: %r = fptoui <4 x float> %v0 to <4 x i16>
While we would emit:
vcvt.s32.f32 q8, q8
vmovn.i32 d16, q8
vuzp.8 d16, d17
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13434072
llvm-svn: 177333
I was too pessimistic in r177105. Vector selects that fit into a legal register
type lower just fine. I was mislead by the code fragment that I was using. The
stores/loads that I saw in those cases came from lowering the conditional off
an address.
Changing the code fragment to:
%T0_3 = type <8 x i18>
%T1_3 = type <8 x i1>
define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
%T1_3* %blend, %T0_3* %storeaddr) {
%v0 = load %T0_3* %loadaddr
%v1 = load %T0_3* %loadaddr2
==> FROM:
;%c = load %T1_3* %blend
==> TO:
%c = icmp slt %T0_3 %v0, %v1
==> USE:
%r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1
store %T0_3 %r, %T0_3* %storeaddr
ret void
}
revealed this mistake.
radar://13403975
llvm-svn: 177170
This is a generic function (derived from PEI); moving it into
MachineFrameInfo eliminates a current redundancy between the ARM and AArch64
backends, and will allow it to be used by the PowerPC target code.
No functionality change intended.
llvm-svn: 177111
By terrible I mean we store/load from the stack.
This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.
LV: Found an estimated cost of 2 for VF 8 For instruction: icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction: select i1, i32, i32
The bug that tracks the CodeGen part is PR14868.
radar://13403975
llvm-svn: 177105
Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend
currently lowers those using stack accesses.
This was responsible for a significant degradation on
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1
where we vectorize one loop to a vector factor of 16. After this patch we select
a vector factor of 4 which will generate reasonable code.
unsigned char cle[32];
void test(short c) {
unsigned short compte;
for (compte = 0; compte <= 31; compte++) {
cle[compte] = cle[compte] ^ c;
}
}
radar://13220512
llvm-svn: 176898
The VDUP instruction source register doesn't allow a non-constant lane
index, so make sure we don't construct a ARM::VDUPLANE node asking it to
do so.
rdar://13328063
http://llvm.org/bugs/show_bug.cgi?id=13963
llvm-svn: 176413
dispatch code. As far as I can tell the thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both arm and thumb1.
rdar://13066352
llvm-svn: 176363
This fixes an issue where trying to assemlbe valid ADR instructions would cause
LLVM to hit a failed assertion.
Patch by Keith Walker.
llvm-svn: 176189
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.
There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.
The refactoring was OK'd by Anton Korobeynikov on llvmdev.
Note: this touches the target interfaces, so out-of-tree targets may
be affected.
llvm-svn: 175788
It is possible that frame pointer is not found in the
callee saved info, thus FramePtrSpillFI may be incorrect
if we don't check the result of hasFP(MF).
Besides, if we enable the stack coloring algorithm, there
will be an assertion to ensure the slot is live. But in
the test case, %var1 is not live in the prologue of the
function, and we will get the assertion failure.
Note: There is similar code in ARMFrameLowering.cpp.
llvm-svn: 175616
In my previous commit:
"Merge a f32 bitcast of a v2i32 extractelt
A vectorized sitfp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
Due to the the extract_element, and the bitcast we will uneccessarily generate
moves between scalar and vector registers."
I added a pattern containing a copy_to_regclass. The copy_to_regclass is
actually not needed.
radar://13191881
llvm-svn: 175555
When creating an allocation hint for a register pair, make sure the hint
for the physical register reference is still in the allocation order.
rdar://13240556
llvm-svn: 175541
A vectorized sitfp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
Due to the the extract_element, and the bitcast we will uneccessarily generate
moves between scalar and vector registers.
The patch fixes this by using a COPY_TO_REGCLASS and a EXTRACT_SUBREG to extract
the element from the vector instead.
radar://13191881
llvm-svn: 175520
assembler should also accept a two arg form, as the docuemntation specifies that
the first (destination) register is optional.
This patch uses TwoOperandAliasConstraint to add the two argument form.
It also fixes an 80-column formatting problem in:
test/MC/ARM/neon-bitwise-encoding
<rdar://problem/12909419> Clang rejects ARM NEON assembly instructions
llvm-svn: 175221
The parser will now accept instructions with alignment specifiers written like
vld1.8 {d16}, [r0:64]
, while also still accepting the incorrect syntax
vld1.8 {d16}, [r0, :64]
llvm-svn: 175164