Since r207518 they are printed exactly like non-hidden stubs on x86 and
since r207517 on ARM.
This means we can use a single set for all stubs in those platforms.
llvm-svn: 269776
The movw instruction is only available in ARM state for V6T2 and above.
The MOVi16 instruction has requirement HasV6T2 but the InstAlias
for mov rd, imm where the operand is imm0_65535_expr:$imm does not.
This means that movw can incorrectly be used in ARMv4 and ARMv5 by
writing mov rd, 0x1234. The simple fix is to the requirement HasV6T2
to the InstAlias. Tests added to not-armv4.s.
Patch by Peter Smith.
llvm-svn: 269761
It seems that cl will emit the export directives for Windows ARM targets. The
fact that it did this had originally been missed and this functionality was
never implemented. This makes it possible to rely solely on the source code for
indicating what the exported interfaces are and brings us more compatibility
with cl.
llvm-svn: 269574
When setting the frame pointer, the offset from SP is calculated based on the
stack slot it gets allocated, but this slot is in turn based on the order of
the CSR list so that list should match the order we actually save the registers
in. Mostly it did, but in the edge-case of MachO AAPCS targets it was wrong.
llvm-svn: 269459
This change implements the transformation in processInstruction() for the
LDR rt, =expression to MOV rt, expression when the expression can be evaluated
and can fit into the immediate field of the MOV or a MVN.
Across the ARM and Thumb instruction sets there are several cases to consider,
each with a different range of representatble constants.
In ARM we have:
* Modified immediate (All ARM architectures)
* MOVW (v6t2 and above)
In Thumb we have:
* Modified immediate (v6t2, v7m and v8m.mainline)
* MOVW (v6t2, v7m, v8.mainline and v8m.baseline)
* Narrow Thumb MOV that can be used in an IT block (non flag-setting)
If the immediate fits any of the available alternatives then we make the transformation.
Fixes 25722.
Patch by Peter Smith.
llvm-svn: 269354
This change adds a new constant pool kind to ARMOperand. When parsing the
operand for =immediate we create an instance of this operand rather than
creating a constant pool entry and rewriting the operand.
As the new operand kind is only created for ldr rt,= we can make ldr rt,=
an explicit pseudo instruction in ARM, Thumb and Thumb2
The pseudo instruction is expanded in processInstruction(). This creates the
constant pool and transforms the pseudo instruction into a pc-relative ldr to
the constant pool.
There are no functional changes and no modifications needed to existing tests.
Required by the patch that fixes PR25722.
Patch by Peter Smith.
llvm-svn: 269352
Fix "Logic error" warnings of the type "Called C++ object pointer is
null" reported by Clang Static Analyzer.
Patch by Apelete Seketeli.
llvm-svn: 269285
This is a large change, but it's pretty mechanical:
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.
Part of llvm.org/pr26808.
llvm-svn: 269258
I'm really not sure why we were in the first place, it's the linker's job to
convert between BL/BLX as necessary. Even worse, using BLX left Thumb calls
that could be locally resolved completely unencodable since all offsets to BLX
are multiples of 4.
rdar://26182344
llvm-svn: 269101
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
(re-apply r268810 as it exposed an uninitialized variable in ARM MFI.
Patch 268868 should fix that.)
Summary:
Currently, when checking if a stack is "BigStack" or not, it doesn't count into spills and arguments. Therefore, LLVM won't reserve spill slot for this actually "BigStack". This may cause scavenger failure.
Reviewers: rengolin
Subscribers: vitalybuka, aemerson, rengolin, tberghammer, danalbert, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D19896
llvm-svn: 268869
(this is resubmit of r268529 with minor refactoring. r268529 was reverted
at r268536 due a memory sanitizer failure. I have not been able to
reproduce that failure and I checked all the variable used in my change
but I could not spot an issue. I did some refactoring and see if it will
give a clearer hint)
Summary:
Currently, when checking if a stack is "BigStack" or not, it doesn't count into spills and arguments. Therefore, LLVM won't reserve spill slot for this actually "BigStack". This may cause scavenger failure.
Reviewers: rengolin
Subscribers: vitalybuka, aemerson, rengolin, tberghammer, danalbert, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D19896
llvm-svn: 268810
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
Given something like:
ldr r0, .LCPI0_0 (== pc-rel var)
add r0, pc
ldr r1, .LCPI0_1 (== pc-rel var)
add r1, pc
we cannot combine the 2 ldr instructions and litpools because they get added to
a different pc to form the correct address. I think the original logic came
from a time when we fused the LDRpci/PICADD instructions into one
pseudo-instruction so the PC was always immediately at-hand. That's no longer
the case.
Should fix general-dynamic TLS access on Linux, and quite possibly other -fPIC
code that relies on litpools (e.g. v6m and -Oz compilations) though trivial
tweaks of the .ll test didn't provoke anything.
llvm-svn: 268662
The code here is recursively Select-ing a new Node to avoid issues
where N is CSE'd during replaceDAGValue and stops being valid. We can
accomplish the same goal in a more principled way by using a
HandleSDNode.
This is essentially a less dodgy fix for PR25733 than the original
attempt back in r255120.
llvm-svn: 268590
Summary:
Currently, when checking if a stack is "BigStack" or not, it doesn't count into spills and arguments. Therefore, LLVM won't reserve spill slot for this actually "BigStack". This may cause scavenger failure.
Reviewers: rengolin
Subscribers: aemerson, rengolin, tberghammer, danalbert, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D19896
llvm-svn: 268529
Remove the AddPristinesAndCSRs parameters from
addLiveIns()/addLiveOuts().
We need to respect pristine registers after prologue epilogue insertion,
Seeing that we got this wrong in at least two commits already, we should
rather pay the small price to query MachineFrameInfo for it.
There are three cases that did not set AddPristineAndCSRs to true even
after register allocation:
- ExecutionDepsFix: live-out registers are used as a hint that the
register is used soon. This is not true for pristine registers so
use the new addLiveOutsNoPristines() to maintain this behaviour.
- SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like
a bug, should do the right thing automatically now.
- StackMapLivenessAnalysis: Not adding pristine registers looks like a
bug to me. Added a FIXME comment but maintain the current behaviour
as a change may need to get coordinated with GC runtimes.
llvm-svn: 268336
We were negating an immediate that was going to be used in a SUBri form
unnecessarily. Since ADD/SUB are very similar we *can* do that, but we have to
change the SUB to an ADD at the same time. This also applies to ADD, and allows
us to handle a slightly larger range of immediates for those two operations.
rdar://25992245
llvm-svn: 268276
Summary:
Historically, we had a switch in the Makefiles for turning on "expensive
checks". This has never been ported to the cmake build, but the
(dead-ish) code is still around.
This will also make it easier to turn it on in buildbots.
Reviewers: chandlerc
Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits
Differential Revision: http://reviews.llvm.org/D19723
llvm-svn: 268050
transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas
it should be a successor of the original MBB.
The testcase changes are caused by Thumb2SizeReduction, which
was previously confused by the broken CFG.
Follow-up to r266679.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267778
Summary:
We don't use MinLatency any more since r184032.
Reviewers: atrick, hfinkel, mcrosier
Differential Revision: http://reviews.llvm.org/D19474
llvm-svn: 267502
Summary:
This patch adds support for the X asm constraint.
To do this, we lower the constraint to either a "w" or "r" constraint
depending on the operand type (both constraints are supported on ARM).
Fixes PR26493
Reviewers: t.p.northover, echristo, rengolin
Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19061
llvm-svn: 267411
This corrects the MI annotations for the stack adjustment following the __chkstk
invocation. We were marking the original SP usage as a Def rather than Kill.
The (new) assigned value is the definition, the original reference is killed.
Adjust the ISelLowering to mark Kills and FrameSetup as well.
This partially resolves PR27480.
llvm-svn: 267361
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
llvm-svn: 267211
WIN__DBZCHK will insert a CBZ instruction into the stream. This instruction
reserves 3 bits for the condition register (rn). As such, we must ensure that
we restrict the register to a low register. Use the tGPR class instead of GPR
to ensure that this is properly constrained. In debug builds, we would attempt
to use lr as a condition register which would silently get truncated with no
hint that the register selection was incorrect.
llvm-svn: 267080
We'd disabled them on x86 because back in the early days some host tools
couldn't handle the new load commands. This no longer holds: anyone capable of
deploying Clang should be able to deploy its copies of ar/ranlib/etc.
rdar://25254790
llvm-svn: 267075
Because lowering of CMP_SWAP_64 occurs during type legalization, there can be
i64 types produced by more than just a BUILD_PAIR or similar. My initial tests
used just incoming function args.
llvm-svn: 266828
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer. I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.
This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.
Differential Revision: http://reviews.llvm.org/D19098
llvm-svn: 266818
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.
Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.
Should fix the 32-bit part of PR25526.
llvm-svn: 266679
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
Divisions by a constant can be converted into multiplies which are usually
cheaper, but this isn't possible if the constant gets separated (particularly
in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is
cheap.
I considered making the check generic, but neither AArch64 (strangely) nor x86
showed any benefit on the tests I had.
llvm-svn: 266464
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.
This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).
For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.
This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).
As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.
Fixes PR16275.
llvm-svn: 266363
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:
+ ConstantHoisting was modifying switch statements. This is simply invalid,
the cases must remain integer constants no matter the notional cost.
+ ConstantHoisting was mangling alloca instructions in the entry block. These
should be handled by FrameLowering, so constants actually have a cost of 0.
Worse, the resulting bitcasts meant they became dynamic allocas.
rdar://25707382
llvm-svn: 266260
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18901
llvm-svn: 266253
This is better for a few reasons:
+ It matches the other tooling for iOS.
+ It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't
use ARM mode).
+ It leads to infinitesimally smaller code (0.2%, yay!).
rdar://25369506
llvm-svn: 266003
When we see a .arch or .cpu directive, we should try to avoid switching
ARM/Thumb mode if possible.
If we do have to switch modes, we also need to emit the correct mapping
symbol for the new ISA. We did not do this previously, so could emit
ARM code with Thumb mapping symbols (or vice-versa).
The GAS behaviour is to always stay in the same mode, and to emit an
error on any instructions seen when the current mode is not available on
the current target. We can't represent that situation easily (we assume
that Thumb mode is available if ModeThumb is set), so we differ from the
GAS behaviour when switching to a target that can't support the old
mode. I've added a warning for when this implicit mode-switch occurs.
Differential Revision: http://reviews.llvm.org/D18955
llvm-svn: 265936
Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt,
smulwb and smulwt instructions for the ARM backend. Also updated the smul
CodeGen test and removed the smulw one.
Differential Revision: http://reviews.llvm.org/D18892
llvm-svn: 265793
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.
The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).
This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.
As a follow-up I'll add:
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
bool operator>(AtomicOrdering, AtomicOrdering) = delete;
bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
Reviewers: jyknight, reames
Subscribers: jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D18775
llvm-svn: 265602
Removed the SDNode argument passed to the AI_smul and AI_smla multiclass
definitions as they are always mul.
Differential Revision: http://reviews.llvm.org/D18791
llvm-svn: 265409
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.
This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.
Related to rdar://24207743
Differential Revision: http://reviews.llvm.org/D18680
llvm-svn: 265329
Summary:
This adds the same checks that were added in r264593 to all
target-specific passes that run after register allocation.
Reviewers: qcolombet
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18525
llvm-svn: 265313
ThreadModel::Single is already handled already by ARMPassConfig adding
LowerAtomicPass to the pass list, which lowers all atomics to non-atomic
ops and deletes fences.
So by the time we get to ISel, there's no atomic fences left, so they
don't need special handling.
llvm-svn: 265178
Some ARM instructions encode 32-bit immediates as a 8-bit integer (0-255)
and a 4-bit rotation (0-30, even) in its least significant 12 bits. The
original fixup, FK_Data_4, patches the instruction by the value bit-to-bit,
regardless of the encoding. For example, assuming the label L1 and L2 are
0x0 and 0x104 respectively, the following instruction:
add r0, r0, #(L2 - L1) ; expects 0x104, i.e., 260
would be assembled to the following, which adds 1 to r0, instead of 260:
e2800104 add r0, r0, #4, 2 ; equivalently 1
The new fixup kind fixup_arm_mod_imm takes care of the encoding:
e2800f41 add r0, r0, #260
Patch by Ting-Yuan Huang!
llvm-svn: 265122
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.
It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.
Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265036
These checks are redundant and can be removed
Reviewers: hans
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D18564
llvm-svn: 264872
It is possible to have a fallthrough MBB prior to MBB placement. The original
addition of the BB would result in reordering the BB as not preceding the
successor. Because of the fallthrough nature of the BB, we could end up
executing incorrect code or even a constant pool island! Insert the spliced BB
into the same location to avoid that.
Thanks to Tim Northover for invaluable hints and Fiora for the discussion on
what may have been occurring!
llvm-svn: 264454
We did not have an explicit branch to the continuation BB. When the check was
hoisted, this could permit control follow to fall through into the division
trap. Add the explicit branch to the continuation basic block to ensure that
code execution is correct.
llvm-svn: 264370
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.
Before:
push {r7, lr}
cmp r0, r2
mov.w r0, #0
mov.w r12, #0
it hs
movhs r0, #1
cmp r1, r3
it ge
movge.w r12, #1
it eq
moveq r12, r0
cmp.w r12, #0
bne .LBB1_2
@ BB#1: @ %bb1
bl f
pop {r7, pc}
.LBB1_2: @ %bb2
bl g
pop {r7, pc}
After:
push {r7, lr}
subs r0, r0, r2
sbcs.w r0, r1, r3
bge .LBB1_2
@ BB#1: @ %bb1
bl f
pop {r7, pc}
.LBB1_2: @ %bb2
bl g
pop {r7, pc}
Saves around 80KB in Chromium's libchrome.so.
Some notes on this patch:
- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
introduced (nothing else needs them). However, they are necessary in
order to avoid poor codegen, and they seem similar to existing combines
in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
(brcond Compare)).
- No support for Thumb-1. This is in principle possible, but we'd need
to implement ARMISD::SUBE for Thumb-1.
Differential Revision: http://reviews.llvm.org/D15256
llvm-svn: 263962
We need to be careful on which registers can be explicitly handled
via copies. Prologue, Epilogue use physical registers and if one belongs
to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites
it after the explicit copies.
llvm-svn: 263857
The two changes together weakened the test and caused a regression with division
handling in MSVC mode. They were applied to avoid an assertion being triggered
in the block frequency analysis. However, the underlying problem was simply
being masked rather than solved properly. Address the actual underlying problem
and revert the changes. Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.
The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK). We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them. This would result in assertions being triggered
in the block frequency analysis pass.
Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code. Update this with new tests that actually ensure that we do not
regress on the code generation.
llvm-svn: 263714
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
'__sync' libcalls and '__atomic' libcalls, and this function is for
the '__sync' ones.
- getInsertFencesForAtomic() has been replaced with
shouldInsertFencesForAtomic(Instruction), so that the decision can be
made per-instruction. This functionality will be used soon.
- emitLeadingFence/emitTrailingFence are no longer called if
shouldInsertFencesForAtomic returns false, and thus don't need to
check the condition themselves.
llvm-svn: 263665
`MCSymbolRefExpr` variant kind for TLSCALL is prefixed with
_ARM_ since this is how it was originally implemented.
The X86_64 version is exactly the same so there's no reason
to create a new variant, we can just rename the existing
one to be machine-independent.
This generalization is the first step to implement support
for GNU2 TLS dialect in MC.
Differential Revision: http://reviews.llvm.org/D18160
llvm-svn: 263515
This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.
Patch by Pablo Barrio.
Differential Revision: http://reviews.llvm.org/D17925
llvm-svn: 263132
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct. The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node. This node set R7 as clobbered. However, the
code would then follow up with a clobber of R11. I had failed to notice the
imp-def,kill on R7 in the isel. Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if. Instead, construct a new parallel
WIN node and prefer that when targeting windows. This ensures that we now both
correctly model the __builtin_longjmp as well as construct the frame in a more
ABI conformant manner.
llvm-svn: 263123
WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast
to AAPCS which states r7. This adjusts __builtin_longjmp to not clobber r7 and
to properly restore the frame pointer on execution.
llvm-svn: 263118
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.
There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.
Differential Revision: http://reviews.llvm.org/D18016
llvm-svn: 263092
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262738
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262507
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.
These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.
So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.
Differential Revision: http://reviews.llvm.org/D17015
llvm-svn: 262504
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:
- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model
Typical steps necessary to complete a model:
- Ensure all pseudo instructions that are expanded before machine
scheduling (usually everything handled with EmitYYY() functions in
XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.
Differential Revision: http://reviews.llvm.org/D17747
llvm-svn: 262384
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
llvm-svn: 262149
We were emitting only one half of a the paired relocations needed for these
instructions because we decided that an offset needed a scattered relocation.
In fact, movw/movt relocations can be paired without being scattered.
llvm-svn: 261679
Summary:
Currently, the ARM Constant Island may not converge (or not converge quickly).
This patch let it move to the closest water after the user if it doesn't converge after 15 iterations.
This address https://llvm.org/bugs/show_bug.cgi?id=25339
Reviewers: t.p.northover, srhines, kristof.beyls, aadg, rengolin
Subscribers: weimingz, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D16890
llvm-svn: 261665
Summary:
If we want classify OoO or not, using getSchedModel().isOutOfOrder()
could be more proper way than using Subtarget->isLikeA9().
Reviewers: jmolloy, rengolin
Differential Revision: http://reviews.llvm.org/D17433
llvm-svn: 261623
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
llvm-svn: 261605
This is a little embarrassing.
When I reverted r261504 (getIterator() => getInstrIterator()) in
r261567, I did a `git grep` to see if there were new calls to
`getInstrIterator()` that I needed to migrate. There were 10-20 hits,
and I blindly did a `sed ...` before calling `ninja check`.
However, these were `MachineInstrBundleIterator::getInstrIterator()`,
which predated r261567. Perhaps coincidentally, these had an identical
name and return type.
This commit undoes my careless sed and restores
`MachineBasicBlock::iterator::getInstrIterator()`.
llvm-svn: 261577
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.
- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator(). This matches the
naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator(). This is explicitly called
"bundle" (not matching MachineBasicBlock) to disintinguish it clearly
from ilist_node::getIterator().
- Update all calls. Some of these I switched to `auto` to remove
boiler-plate, since the new name is clear about the type.
There was one call I updated that looked fishy, but it wasn't clear what
the right answer was. This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.
llvm-svn: 261504
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
llvm-svn: 261498
Add support for TLS access for Windows on ARM. This generates a similar access
to MSVC for ARM.
The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call. The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.
llvm-svn: 259676
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.
Fixes PR26450.
llvm-svn: 259657
description and changed the regression test accordingly.
The default configuration of a Cortex-R7 is to implement the
VFPv3-D16 architecture and the feature line as it was is too
restrictive.
llvm-svn: 259480
The basic optimisation was to convert (mul $LHS, $complex_constant) into
roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected
to be cheaper. The original logic checks that the mul only has one use (since
we're mangling $complex_constant), but when used in even more complex
addressing modes there may be an outer addition that can pick up the wrong
value too.
I *think* the ARM addressing-mode problem is actually unreachable at the
moment, but that depends on complex assessments of the profitability of
pre-increment addressing modes so I've put a real check in there instead of an
assertion.
llvm-svn: 259228
The trap instruction is emitted as a data-in-text rather
than an instruction. This patch uses the .inst directive
for emitting trap.
Differential Revision: http://reviews.llvm.org/D16684
llvm-svn: 259182
Various bits we want to use the new ABI actually compile with "-arch armv7k
-miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how
slices work.
llvm-svn: 258975
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
This patch was originally committed as r257885, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258683
This patch was originally committed as r257884, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258682
This patch was originally committed as r257883, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258681
This was originally committed as r255762, but reverted as it broke windows
bots. Re-commitiing the exact same patch, as the underlying cause was fixed by
r258677.
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.
These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.
Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.
New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.
Differential Revision: http://reviews.llvm.org/D15038
llvm-svn: 258678
When the shift immediate is zero, PKHTB is an alias for PKHBT, but the order of
the input operands needs to be swapped.
Differential Revision: http://reviews.llvm.org/D16288
llvm-svn: 258044
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
PR26136
llvm-svn: 257930
# The first commit's message is:
Revert "[ARM] Add DSP build attribute and extension targeting"
This reverts commit b11cc50c0b4a7c8cdb628abc50b7dc226ff583dc.
# This is the 2nd commit message:
Revert "[ARM] Add new system registers to ARMv8-M Baseline/Mainline"
This reverts commit 837d08454e3e5beb8581951ac26b22fa07df3cd5.
llvm-svn: 257916
Summary:
BFC instructions are available in ARMv6T2 and above.
Reviewers: t.p.northover
Subscribers: aemerson
Differential Revision: http://reviews.llvm.org/D16076
llvm-svn: 257546
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.
rdar://problem/23754176
llvm-svn: 257545
Summary:
This fixes three bugs, in all of which state is not or incorrecly reset between
objects (i.e. when reusing the same pass manager to create multiple object
files):
1) AttributeSection needs to be reset to nullptr, because otherwise the backend
will try to emit into the old object file's attribute section causing a
segmentation fault.
2) MappingSymbolCounter needs to be reset, otherwise the second object file
will start where the first one left off.
3) The MCStreamer base class resets the Streamer's e_flags settings. Since
EF_ARM_EABI_VER5 is set on streamer creation, we need to set it again
after the MCStreamer was rest.
Also rename Reset (uppser case) to EHReset to avoid confusion with
reset (lower case).
Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D15950
llvm-svn: 257473
Summary:
r255334 matches bit-reverse pattern in InstCombine and generates calls to Instrinsic::bitreverse.
RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.
Patch by Z. Zheng <zhaoshiz@codeaurora.org>
Reviewers: apazos, jmolloy, weimingz
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D15932
llvm-svn: 257188
Summary: In ARMConstantIslandPass, which runs after Shrink Wrap pass, long jumps will be fixed up as BL (tBfar) which depends on spilling LR in epilogue. However, shrink-wrap may remove the LR, which causes issues when the function returns.
Reviewers: qcolombet, rengolin
Subscribers: aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D15984
llvm-svn: 257187
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.
llvm-svn: 257039
In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned we actually had two copies and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstBuilder.
I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface.
Differential Revision: http://reviews.llvm.org/D15757
llvm-svn: 256909
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+
Test cases by Ana Pazos
Differential Revision: http://reviews.llvm.org/D15707
llvm-svn: 256523
Today, we always take into account the possibility that object files
produced by MC may be consumed by an incremental linker. This results
in us initialing fields which vary with time (TimeDateStamp) which harms
hermetic builds (e.g. verifying a self-host went well) and produces
sub-optimal code because we cannot assume anything about the relative
position of functions within a section (call sites can get redirected
through incremental linker thunks).
Let's provide an MCTargetOption which controls this behavior so that we
can disable this functionality if we know a-priori that the build will
not rely on /incremental.
llvm-svn: 256203
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb.
r250697: http://reviews.llvm.org/rL250697
Reviewers: rmaprath
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15653
llvm-svn: 256115
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.
Differential Revision: http://reviews.llvm.org/D15039
llvm-svn: 255764
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.
These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.
Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.
New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.
Differential Revision: http://reviews.llvm.org/D15038
llvm-svn: 255762
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
llvm-svn: 255455
EABI attributes should only be emitted on EABI targets. This prevents the
emission of the optimization goals EABI attribute on Windows ARM.
llvm-svn: 255448
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.
This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).
Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.
This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033
Differential Revision: http://reviews.llvm.org/D15320
llvm-svn: 255362
We mutated the DAG, which invalidated the node we were trying to use
as a base register. Sometimes we got away with it, but other times the
node really did get deleted before it was finished with.
Should fix PR25733
llvm-svn: 255120
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).
DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.
One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.
Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.
Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!
llvm-svn: 255089
Summary:
Before ARMv5T, Thumb1 code could not pop PC, as described at D14357 and D14986;
so we need the special fixup in the epilogue.
Reviewers: jroelofs, qcolombet
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15126
llvm-svn: 255047
AND/BIC instructions do accept SP/PC, so the register class should be
more generic (rGPR -> GPR) to cope with that case. Adding more tests.
llvm-svn: 255034
Summary: This reverts r254234, and adds a simple fix for the annoying case of use-after-free.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15236
llvm-svn: 254912
with its source instead of forcing the values on GPRs.
This improves the lowering of vector code when such bitcasts happen in the
middle of vector computations.
rdar://problem/23691584
llvm-svn: 254684
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.
llvm-svn: 254524
The values in this field are compared against getAvailableFeatures()
which returns an uint64_t. This was causing problems in an internal
branch.
llvm-svn: 254462
Summary:
This had been broken for a very long time, but nobody noticed until
D14357 enabled shrink-wrapping by default.
Reviewers: jroelofs, qcolombet
Subscribers: tyomitch, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14986
llvm-svn: 254444
Add ARMv8.2-A to TargetParser, so that it can be used by the clang
command-line options and the .arch directive.
Most testing of this will be done in clang, checking that the
command-line options that this enables work.
Differential Revision: http://reviews.llvm.org/D15037
llvm-svn: 254400
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature.
There is also one large, optional feature, which adds 16-bit floating
point versions of all existing floating-point instructions (VFP and
SIMD), this is represented by the FeatureFullFP16 subtarget feature.
Differential Revision: http://reviews.llvm.org/D15036
llvm-svn: 254399
(This is the second attempt to submit this patch. The first caused two assertion
failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.
All uses of weight-based interfaces are now updated to use probability-based
ones.
Differential revision: http://reviews.llvm.org/D14973
llvm-svn: 254377
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."
Asserts were firing in Chromium builds. See PR25687.
llvm-svn: 254366
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.
All uses of weight-based interfaces are now updated to use probability-based
ones.
Differential revision: http://reviews.llvm.org/D14973
llvm-svn: 254348
Summary:
Since this build attribute corresponds to a whole module, and
different functions in a module may differ in the optimizations
enabled for them, this attribute is emitted after all functions,
and only in the case that the optimization goals for all
functions match.
Reviewers: logan, hans
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D14934
llvm-svn: 254201
Building on r253865 the crash is not limited to signed overflows.
Disable custom handling of unsigned 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit unsigned integer overflow.
llvm-svn: 254158
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
Disable custom handling of signed 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit integer overflow crashes.
llvm-svn: 253865
Summary:
This follows D14577 to treat ARMv6-J as an alias for ARMv6,
instead of an architecture in its own right.
The functional change is that the default CPU when targeting ARMv6-J
changes from arm1136j-s to arm1136jf-s, which is currently used as
the default CPU for ARMv6; both are, in fact, ARMv6-J CPUs.
The J-bit (Jazelle support) is irrelevant to LLVM, and it doesn't
affect code generation, attributes, optimizations, or anything else,
apart from selecting the default CPU.
Reviewers: rengolin, logan, compnerd
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14755
llvm-svn: 253675
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out of tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
Note, changes in future can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
llvm-svn: 253511
It turns out we decide whether to use SjLj exceptions or some alternative in
two separate places in the backend, and they disagreed with each other. This
led to inconsistent code and is generally a terrible idea.
So make them consistent and add an assert that they *do* match (unfortunately
MCAsmInfo isn't available in opt, so it can't be used to initialise the CodeGen
version directly).
llvm-svn: 253502
If a section is rw, it is irrelevant if the dynamic linker will write to
it or not.
It looks like llvm implemented this because gcc was doing it. It looks
like gcc implemented this in the hope that it would put all the
relocated items close together and speed up the dynamic linker.
There are two problem with this:
* It doesn't work. Both bfd and gold will map .data.rel to .data and
concatenate the input sections in the order they are seen.
* If we want a feature like that, it can be implemented directly in the
linker since it knowns where the dynamic relocations are.
llvm-svn: 253436
The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad
scalarization that is still happening there.
I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements.
From my benchmarks, I saw these improvements in A57 (T32)
spec.cpu2000.ref.177_mesa 5.95%
lnt.SingleSource/Benchmarks/Shootout/strcat 12.93%
lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89%
I also measured A57 A32, A53 T32 and A9 T32 and found no performance regressions. I see much bigger wins in third-party benchmarks with this change
Differential Revision: http://reviews.llvm.org/D14743
llvm-svn: 253349
Currently, if the assembler encounters an error after parsing (such as an
out-of-range fixup), it reports this as a fatal error, and so stops after the
first error. However, for most of these there is an obvious way to recover
after emitting the error, such as emitting the fixup with a value of zero. This
means that we can report on all of the errors in a file, not just the first
one. MCContext::reportError records the fact that an error was encountered, so
we won't actually emit an object file with the incorrect contents.
Differential Revision: http://reviews.llvm.org/D14717
llvm-svn: 253328
The way prelink used to work was
* The compiler decides if a given section only has relocations that
are know to point to the same DSO. If so, it names it
.data.rel.ro.local<something>.
* The static linker puts all of these together.
* The prelinker program assigns addresses to each library and resolves
the local relocations.
There are many problems with this:
* It is incompatible with address space randomization.
* The information passed by the compiler is redundant. The linker
knows if a given relocation is in the same DSO or not. If could sort
by that if so desired.
* There are newer ways of speeding up DSO (gnu hash for example).
* Even if we want to implement this again in the compiler, the previous
implementation is pretty broken. It talks about relocations that are
"resolved by the static linker". If they are resolved, there are none
left for the prelinker. What one needs to track is if an expression
will require only dynamic relocations that point to the same DSO.
At this point it looks like the prelinker is an historical curiosity.
For example, fedora has retired it because it failed to build for two
releases
(http://pkgs.fedoraproject.org/cgit/prelink.git/commit/?id=eb43100a8331d91c801ee3dcdb0a0bb9babfdc1f)
This patch removes support for it. That is, it stops printing the
".local" sections.
llvm-svn: 253280
Function ARMConstantIslands::doInitialJumpTablePlacement() iterates over all
basic blocks in a machine function. It calls `MI = MBB.getLastNonDebugInstr()`
to get the last instruction in each block and then uses MI->getOpcode() to
decide what to do. If getLastNonDebugInstr() returns MBB.end() (for example,
when the block does not contain any instructions) then calling getOpcode() on
this value is incorrect. Avoid this problem by checking the result of
getLastNonDebugInstr().
Differential Revision: http://reviews.llvm.org/D14694
llvm-svn: 253222
Storing the source location of the expression that created a constant pool
entry allows us to emit better error messages if we later discover that the
expression cannot be represented by a relocation.
Differential Revision: http://reviews.llvm.org/D14646
llvm-svn: 253220
The MCValue class can store a SMLoc to allow better error messages to be
emitted if an error is detected after parsing. The ARM and AArch64 assembly
parsers were not setting this, so error messages did not have source
information.
Differential Revision: http://reviews.llvm.org/D14645
llvm-svn: 253219
Summary:
* ARMv6KZ is the "canonical" name, given in the ARMARM
* ARMv6Z is an "official abbreviation" for it, mentioned in the ARMARM
* ARMv6ZK is a popular misspelling, which we should support as an alias.
The patch corrects the handling of the names.
Functional changes:
* ARMv6Z no longer treated as an architecture in its own right
* ARMv6ZK renamed to ARMv6KZ, accepting ARMv6ZK as an alias
* arm1176jz-s and arm1176jzf-s recognized as ARMv6ZK, instead of ARMv6K
* default ARMv6K CPU changed to arm1176j-s
Reviewers: rengolin, logan, compnerd
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14568
llvm-svn: 253206
This allows for accurate architecture targeting as well as removing
duplicate information (hardcoded feature strings) from MCTargetDesc.
llvm-svn: 253196
This was left implicit and never ever checked, which means we could have a CMPZ against some non-zero value and we were carrying on with BFI conversion regardless.
Caught by Oliver Stannard using csmith; regression test added.
llvm-svn: 253195
MCRelaxableFragment previously kept a copy of MCSubtargetInfo and
MCInst to enable re-encoding the MCInst later during relaxation. A copy
of MCSubtargetInfo (instead of a reference or pointer) was needed
because the feature bits could be modified by the parser.
This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment
with a constant reference to MCSubtargetInfo. The copies of
MCSubtargetInfo are kept in MCContext, and the target parsers are now
responsible for asking MCContext to provide a copy whenever the feature
bits of MCSubtargetInfo have to be toggled.
With this patch, I saw a 4% reduction in peak memory usage when I
compiled verify-uselistorder.lto.bc using llc.
rdar://problem/21736951
Differential Revision: http://reviews.llvm.org/D14346
llvm-svn: 253127
MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a
member function getSTI.
This is done in preparation for making changes to shrink the size of
MCRelaxableFragment. (see http://reviews.llvm.org/D14346).
llvm-svn: 253124
Summary:
This patch changes ARMV5, ARMV5E, ARMV6SM, ARMV6HL, ARMV7, ARMV7L,
ARMV7HL, ARMV7EM to be treated as aliases for the corresponding
standard architectures, instead of as actual architectures.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14577
llvm-svn: 252903
I completely misunderstood what ARMISD::CMPZ means. It's not "compare equal to zero", it's "compare, only setting the zero/Z flag". It can either be equal-to-zero or not-equal-to-zero, and we weren't checking what sense it was.
If it's equal-to-zero, we can swap the operands around and pretend like it is not-equal-to-zero, which is both a bug fix and lets us handle more cases.
llvm-svn: 252891
I missed the side-effects of ParseBFI in my previous attempt (r252748).
Thanks dblaikie for the suggestion of adding a void use of the unused
variable instead.
llvm-svn: 252751
If we have a chain of BFIs, we may be able to combine several together into one merged BFI. We can do this if the "from" bits from one BFI OR'd with the "from" bits from the other BFI form a contiguous range, and the same with the "to" bits.
llvm-svn: 252740
ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2
implementation.
The net result of allowing this speculation for the regression tests in this patch is
that we get this code:
ctlz:
clz r0, r0
bx lr
cttz:
rbit r0, r0
clz r0, r0
bx lr
Instead of:
ctlz:
cmp r0, #0
moveq r0, #32
clzne r0, r0
bx lr
cttz:
cmp r0, #0
moveq r0, #32
rbitne r0, r0
clzne r0, r0
bx lr
This will help solve a general speculation/despeculation problem noted in PR24818:
https://llvm.org/bugs/show_bug.cgi?id=24818
Differential Revision: http://reviews.llvm.org/D14469
llvm-svn: 252639
Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh!
Original commit message:
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
llvm-svn: 252606
This fixes a bug in ARMAsmPrinter::EmitUnwindingInstruction where
llvm_unreachable was reached because t2ADDri wasn't handled.
Test case provided by Tim Northover.
rdar://problem/23270609
http://reviews.llvm.org/D14518
llvm-svn: 252557
"GCC requires the freestanding environment provide memcpy, memmove, memset
and memcmp": https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Standards.html
Hence in GNUEABI targets LLVM should not convert 'memops' to their equivalent
'__aeabi_memops'. This convertion violates GCC contract.
The -meabi flag controls whether or not LLVM will modify 'memops' in GNUEABI
targets.
Without -meabi: use the triple default EABI.
With -meabi=default: use the triple default EABI.
With -meabi=gnu: use 'memops'.
With -meabi=4 or -meabi=5: use '__aeabi_memops'.
With -meabi set to an unknown value: same as -meabi=default.
Patch by Vinicius Tinti.
llvm-svn: 252462
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer
Subscribers: MatzeB, llvm-commits
Differential Revision: http://reviews.llvm.org/D14407
llvm-svn: 252318
Summary:
This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it.
D11268 is quite old and has merge conflicts against the current trunk.
This request
- rebases D11268 onto the new trunk;
- resolves the merge conflicts;
- fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct.
Reviewers: echristo, rengolin, kubabrecka
Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl
Differential Revision: http://reviews.llvm.org/D14338
llvm-svn: 252177
We can conservatively know that CMOV's known bits are the intersection of known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities.
I tried hard to create a testcase for this and failed - we have to sufficiently confuse DAG.computeKnownBits which can see through all the cheap tricks I tried to narrow my larger testcase down :(
This code is actually exercised in CodeGen/ARM/bfi.ll, there's just no functional difference because DAG.computeKnownBits gets the right answer in that case.
llvm-svn: 252168
The generic infrastructure already did a lot of work to decide if the
fixup value is know or not. It doesn't make sense to reimplement a very
basic case: same fragment.
llvm-svn: 252090
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
llvm-svn: 252057
Summary:
ARMv6KZ cores were set up incorrectly in ARM.td; also, the SMI mnemonic
(the old name for SMC, as defined in ARMv6KZ) wasn't supported.
Reviewers: jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D14154
llvm-svn: 251627
At the LLVM level this ABI is essentially a minimal modification of AAPCS to
support 16-byte alignment for vector types and the stack.
llvm-svn: 251570
These MachO file directives are used by linkers and other tools to provide
compatibility information, much like the existing .ios_version_min and
.macosx_version_min.
llvm-svn: 251569
Summary:
This patch handles assembly and disassembly, but not codegen, as of yet.
Additionally, it fixes a bug whereby SP and PC as shifted-reg operands
were treated as predictable in ARMv7 Thumb; and it enables the tests
for invalid and unpredictable instructions to run on both ARMv7 and ARMv8.
Reviewers: jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D14141
llvm-svn: 251516
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.
llvm-svn: 251490
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.
Reviewers: rengolin, t.p.northover
Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14082
llvm-svn: 251401
This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.
While there make use of ArrayRef and std::find_if.
llvm-svn: 251382