When a fixup that can be resolved by the assembler is out of range, we should
report an error in the source, rather than crashing.
Differential Revision: http://reviews.llvm.org/D18402
llvm-svn: 265120
Summary:
This change will allow loads with imp-def to be clustered in machine-scheduler pass.
areMemAccessesTriviallyDisjoint() can also handle loads with imp-def.
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18665
llvm-svn: 265051
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.
It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.
Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265036
Summary:
This change will handle missing store pair opportunity where the first store
instruction stores zero followed by the non-zero store. For example, this change
will convert :
str wzr, [x8]
str w1, [x8, #4]
into:
stp wzr, w1, [x8]
Reviewers: jmolloy, t.p.northover, mcrosier
Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18570
llvm-svn: 265021
This is effectively NFC, minus the renaming of the options
(-cyclone-prefetch-distance -> -prefetch-distance).
The change was requested by Tim in D17943.
llvm-svn: 264806
If we can't handle a relocation type, report it as an error in the source,
rather than asserting. I've added a more descriptive message and a test for the
only cases of this that I've been able to trigger.
Differential Revision: http://reviews.llvm.org/D18388
llvm-svn: 264156
Avoid modifying other modules in `AArch64PromoteConstant` when the
constant is `ConstantData` (a horrible accident, I'm sure, caught by an
experimental follow-up to r261464).
Previously, this walked through all the users of a constant, but that
reaches into other modules when the constant doesn't depend transitively
on a `GlobalValue`! Since we're walking instructions anyway, just
modify the instructions we actually see.
As a drive-by, instead of storing `Use` and getting the instructions
again via `Use::getUser()` (which is not a constantant time lookup),
store `std::pair<Instruction, unsigned>`. Besides being cheaper, this
makes it easier to drop use-lists form `ConstantData` in the future.
(I threw this in because I was touching all the code anyway.)
Because the patch completely changes the traversal logic, it looks
like a rewrite of the pass, but the core logic is all the same (or
should be, minus the out-of-module changes). In other words, there
should be NFC as long as the LLVMContext only has a single Module.
I didn't think of a good way to test this, but I hope to submit a patch
eventually that makes walking these use-lists illegal/impossible.
llvm-svn: 263853
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing. I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.
Differential Revision: http://reviews.llvm.org/D18048
llvm-svn: 263819
Summary:
It can hurt performance to prefetch ahead too much. Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.
Reviewers: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17949
llvm-svn: 263772
Summary:
And use this TTI for Cyclone. As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.
I am also adding tests for this and the previous change (D17943):
* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either
Reviewers: hfinkel
Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17945
llvm-svn: 263771
Summary:
This wires up the pass for Cyclone but keeps it off for now because we
need a few more TTIs.
The getPrefetchMinStride value is not very well tuned right now but it
works well with CFP2006/433.milc which motivated this.
Tests will be added as part of the upcoming large-stride prefetching
patch.
Reviewers: t.p.northover
Subscribers: llvm-commits, aemerson, hfinkel, rengolin
Differential Revision: http://reviews.llvm.org/D17943
llvm-svn: 263770
When the SP in not changed because of realignment/VLAs etc., we restore the SP
by using the previous value of SP and not the FP. Breaking the dependency will
help in cases when the epilog of a callee is close to the epilog of the caller;
for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of
the callee.
http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.
llvm-svn: 263458
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test and branch sequences like in the example below. This handles
the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC
Examples:
and w8, w8, #0x400
cbnz w8, L1
to
tbnz w8, #10, L1
Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17942
llvm-svn: 263136
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.
There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.
Differential Revision: http://reviews.llvm.org/D18016
llvm-svn: 263092
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.
This change should be NFC when compiling via clang without
-fomit-frame-pointer.
Reviewers: t.p.northover
Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines
Differential Revision: http://reviews.llvm.org/D17730
llvm-svn: 262495
This adds some missing generic schedule info definitions, enables
completeness checking for cyclone and fixes a typo uncovered by that.
Differential Revision: http://reviews.llvm.org/D17748
llvm-svn: 262393
Revert r262248 in an attempt to fix the clang-native-aarch64-full
bot and to investigate a performance regression in
SingleSource/Benchmarks/CoyoteBench/huffbench
llvm-svn: 262388
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:
- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model
Typical steps necessary to complete a model:
- Ensure all pseudo instructions that are expanded before machine
scheduling (usually everything handled with EmitYYY() functions in
XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.
Differential Revision: http://reviews.llvm.org/D17747
llvm-svn: 262384
Update APIs in MachineInstrBundle.h to take and return MachineInstr&
instead of MachineInstr* when the instruction cannot be null. Besides
being a nice cleanup, this is tacking toward a fix for PR26753.
llvm-svn: 262141
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
never null, so this cleans up the API a bit. It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.
llvm-svn: 262115
Instead of the convoluted if-statment we can just use getColor. This also fixes
a bug where we relied upon the parity of tablegen-generated register indexes
(instead of using the machine encoding).
llvm-svn: 261990
Summary:
Avoid special case for FP, LR CFI emission and just allow general
AArch64FrameLowering::emitCalleeSavedFrameMoves() to handle them. Also,
stop recalculating the stack offsets in emitCalleeSavedFrameMoves()
since we can just reuse the previously calculated offset stored in the
MachineFrameInfo.
Depends on D17000
Reviewers: t.p.northover, rengolin, mcrosier, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17004
llvm-svn: 261885
Looks like the global rename last year was a bit over-zealous. These things
really are referred to with ARM64 elsewhere (ld64, libunwind, ...).
llvm-svn: 261698
Summary:
Fix a bug in epilog generation where the incoming stack arguments were
not being popped for fastcc functions when -tailcallopt was passed.
Reviewers: t.p.northover, mcrosier, jmolloy, rengolin
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16894
llvm-svn: 261650
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
llvm-svn: 261605
This reverts commit r261510, effectively reapplying r261509. The
original commit missed a caller in AArch64ConditionalCompares.
Original commit message:
Pass non-null arguments by reference in MachineTraceMetrics::Trace,
simplifying future work to remove implicit iterator => pointer
conversions.
llvm-svn: 261511
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.
- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator(). This matches the
naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator(). This is explicitly called
"bundle" (not matching MachineBasicBlock) to disintinguish it clearly
from ilist_node::getIterator().
- Update all calls. Some of these I switched to `auto` to remove
boiler-plate, since the new name is clear about the type.
There was one call I updated that looked fishy, but it wasn't clear what
the right answer was. This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.
llvm-svn: 261504
After r261154, we were only clearing flags if the known-zero register was
originally live-in to the basic block, but we have to do it even if not when
more than one COPY has been eliminated, otherwise the user of the first COPY
may still have <kill> marked.
E.g.
BB#N:
%X0 = COPY %XZR
STRXui %X0<kill>, <fi#0>
%X0 = COPY %XZR
STRXui %X0<kill>, <fi#1>
We can eliminate both copies, X0 is not live-in, but we must clear the kill on
the first store.
Unfortunately, I've been unable to come up with a non-fragile test for this.
I've only seen it in the wild with regalloc-created spills, and attempts to
reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile
asm clobbers were moved around. Should fix the aarch64 bot though.
llvm-svn: 261175
Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we
didn't sort out the use-def chain properly.
I've also made it check more than just the last instruction for a compatible
CBZ (so it can cope without fallthroughs). I'd have liked to do that
separately, but it's helps writing the test.
Finally, I removed some custom loops in favour of MachineInstr helpers and
refactored the control flow to flatten it and avoid possibly quadratic
iterations in blocks with many copies. NFC for these, just a general tidy-up.
llvm-svn: 261154
Summary:
This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :
BB0:
cbz x0, .BB1
BB1:
mov x0, xzr
Jun
Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier
Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D16203
llvm-svn: 261004
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260998
Summary:
Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs. This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.
This optimization cannot currently be done for MachO targets since they
rely on a fast-path .debug_frame equivalent that can only encode
callee-save registers as pairs.
Reviewers: t.p.northover, rengolin, mcrosier, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17000
llvm-svn: 260689
This is just a trivial implementation:
- Support only arguments passed in registers.
- Support only "plain" arguments, i.e., no sext/zext attribute.
At this point, it is possible to play with the IRTranslator on AArch64:
llc -mtriple arm64-<vendor>-<os> -print-machineinstrs <input.ll> -o - -global-isel
For now, we only support the translation of program with adds and returns.
Follow-up patches are on their way to add a test case (the MIRParser is
not ready as it is).
llvm-svn: 260600
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
This is a reapplication of r259812, which had an incorrect assert. The
test_stur_str_no_assert() test is a reduced version of the issue hit in
the AArch64 self-host.
PR24465
llvm-svn: 260523
Summary:
Fix case where a pre-inc/dec load/store would not be formed if the
add/sub that forms the inc/dec part of the operation was the first
instruction in the block being examined.
Reviewers: mcrosier, jmolloy, t.p.northover, junbuml
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16785
llvm-svn: 260275
The logic to pair instructions and merge narrow instructions has become cloogy
and error prone. This patch beings to unravel these two similar, but distinct
optimizations.
llvm-svn: 260242
This only impacts the creation of pre-/post-index instructions. The bound was
set high enough such that it did not change code generation for SPEC200X.
llvm-svn: 259828
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769 and r259790. The tramp3d failure was caused
by an incorrect refactoring in the patch. Specifically, we weren't always
properly clearing the SExtIdx flag.
llvm-svn: 259812
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:
(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:
(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.
Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.
Patch by Chris Diamand!
llvm-svn: 259800
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure. I'm unable to reproduce the issue at this time.
llvm-svn: 259790
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB
Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16291
llvm-svn: 259387
Summary:
Factor out common code for callee-save register pair calculation. This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.
Depends on D16732
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16734
llvm-svn: 259384
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.
This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16732
llvm-svn: 259365
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.
When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.
llvm-svn: 259134
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.
Also rename some variable to better reflect their usage.
llvm-svn: 258605
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.
Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
%cc1 = fcmp one float %a, %b
%cc2 = fcmp olt float %c, %d
%and = and i1 %cc1, %cc2
%r = select i1 %and, i32 %e, i32 %f
ret i32 %r
}
Assuming (%a < %b) and (%c < %d); we used to do:
fcmp s0, s1 # nzcv <- 1000
orr w8, wzr, #0x1 # w8 <- 1
csel w9, w8, wzr, mi # w9 <- 1
csel w8, w8, w9, gt # w8 <- 1
fcmp s2, s3 # nzcv <- 1000
cset w9, mi # w9 <- 1
tst w8, w9 # (w8 & w9) == 1, so: nzcv <- 0000
csel w0, w0, w1, ne # w0 <- w0
We now do:
fcmp s2, s3 # nzcv <- 1000
fccmp s0, s1, #0, mi # mi, so: nzcv <- 1000
fccmp s0, s1, #8, le # !le, so: nzcv <- 1000
csel w0, w0, w1, pl # !pl, so: w0 <- w1
In other words, we transformed:
(c < d) && ((a < b) || (a > b))
into:
(c < d) && (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
(c < d) && !((a u>= b) && (a u<= b))
Note that this problem doesn't occur in the test-suite.
changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
In general we can create code for arbitrary "... (and (and A B) C)"
sequences. We can also implement some "or" expressions, because
"(or A B)" is equivalent to "not (and (not A) (not B))" and we can
implement some negation operations. [...] However there is no way
to negate the result of a partial sequence.
Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
== ((a olt b) || (a ogt b))
== ((a ord b) && (a une b))
- (a ueq b)
== ((a uno b) || (a oeq b))
== ((a ule b) && (a uge b))
Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
(a || b)
= !(!a && !(b))
= !(!a && !(b1 || b2)) <- changeFPCCToAArch64CC(b, b1, b2)
= !(!a && !b1 && !b2)
However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:
(a || b)
= !(!a && (!b))
= !(!a && (b1 && b2)) <- changeFPCCToANDAArch64CC(!b, b1, b2)
= !(!a && b1 && b2)
This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.
llvm-svn: 258533
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, make sure that inconsistencies don't happen with
an assert.
llvm-svn: 258532
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type. This patch skips
lowering of VSETCC if the operand is an f16 vector.
v4 and v8 tests included.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15361
llvm-svn: 258471
There was support for writing the AArch64 big endian data fixup entries in
the .eh_frame section in BE. This is changed to write all such fixup
entries in BE with no restriction on the section. This is similar to
the existing support for fixup entries for ARM.
A test is added to check the length field in the .debug_line section as
this is an example of where such a fixup occurs.
Differential Revision: http://reviews.llvm.org/D16064
llvm-svn: 258320
The AArch64 .inst directive was implemented using EmitIntValue, which resulted
in both $x and $d (code and data) mapping symbols being emitted at the same
address. This fixes it to only emit the $x mapping symbol.
EmitIntValue also emits the value in big-endian order when targeting big-endian
systems, but instructions are always emitted in little-endian order for
AArch64.
Differential Revision: http://reviews.llvm.org/D16349
llvm-svn: 258308
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
I will commit fixes to other platforms as well.
PR26136
llvm-svn: 257929
Summary:
This pass may modify the Cmp operands. However, the flag reg may be used by both the branch and CSEL.
Modifying CMP will have side effect on CSEL.
Reviewers: t.p.northover
Subscribers: llvm-commits, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D16147
llvm-svn: 257844
In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned we actually had two copies and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstBuilder.
I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface.
Differential Revision: http://reviews.llvm.org/D15757
llvm-svn: 256909
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.
Original commit message:
This change promotes load instructions which directly read from stored by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
llvm-svn: 256249
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.
This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.
llvm-svn: 256176
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.
http://reviews.llvm.org/D15652
llvm-svn: 256158
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
llvm-svn: 256004
This patch enables PostRAScheduler specifically for AArch64 generic build,
which is beneficial from the performance perspective.
Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed.
Also benchmarks from LLVM test-suite did not regress.
Differential Revision: http://reviews.llvm.org/D15557
llvm-svn: 255896
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.
Differential Revision: http://reviews.llvm.org/D15515
llvm-svn: 255895
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given machine function and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
The target independent portion was committed as r255353.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15341
llvm-svn: 255821
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
llvm-svn: 255455
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.
This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).
Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.
This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033
Differential Revision: http://reviews.llvm.org/D15320
llvm-svn: 255362
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.
Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.
Example from AArch64:
def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
(i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
Which is trying to do sext_inreg i8, i8.
llvm-svn: 255359
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).
DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.
One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.
Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.
Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!
llvm-svn: 255089
Summary:
This fixes failure when trying to select
insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.
The accompanying v4 and v8 tests fail instruction selection without this
patch.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15322
llvm-svn: 255072
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.
The ".2h" vector type specifier is now legal (for the scalar pairwise
reduction instructions), so some unrelated tests have been modified as
different error messages are emitted. This is not a problem as the
invalid operands are still caught.
llvm-svn: 255010
When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access. Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing. Note that the actual implementation was always bailing if the load or store wasn't simple.
Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared. In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above. These are normal loads and what most of the optimizer works with.
llvm-svn: 254805
In the case of a conditional branch without a preceding cmp we used to emit
a "and; cmp; b.eq/b.ne" sequence, use tbz/tbnz instead.
Differential Revision: http://reviews.llvm.org/D15122
llvm-svn: 254621
We mustn't introduce a shift of exactly 64-bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural Arch64
implementation).
The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.
rdar://22491037
llvm-svn: 254475
Summary:
When not useful bits, BitWidth becomes 0 and APInt will not be happy.
See https://llvm.org/bugs/show_bug.cgi?id=25571
We can just mark the operand as IMPLICIT_DEF is none bits of it is used.
Reviewers: t.p.northover, jmolloy
Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14803
llvm-svn: 254440
The Statistical Profiling Extension is an optional extension to
ARMv8.2-A. Since it is an optional extension, I have added the
FeatureSPE subtarget feature to control it. The assembler-visible parts
of this extension are the new "psb csync" instruction, which is
equivalent to "hint #17", and a number of system registers.
Differential Revision: http://reviews.llvm.org/D15021
llvm-svn: 254401
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Most of these instructions are the same as the 32- and 64-bit versions,
but with the type field (bits 23-22) set to 0b11. Previously the top bit
of the size field was always 0, so the instruction classes only provided
a 1-bit size field, which I have widened to 2 bits.
Differential Revision: http://reviews.llvm.org/D15014
llvm-svn: 254198
ARMv8.2-A adds new variants of the "at" (address translate) system
instruction, which take the PSTATE.PAN bit (added in ARMv8.1-A). These
are a required part of ARMv8.2-A, so no additional subtarget features
are required.
Differential Revision: http://reviews.llvm.org/D15018
llvm-svn: 254159
ARMv8.2-A adds a new PSTATE bit, PSTATE.UAO, which allows the LDTR/STTR
instructions to behave the same as LDR/STR with respect to execute-only
pages at higher privilege levels. New variants of the MSR/MRS
instructions are added to allow reading and writing this bit. It is a
required part of ARMv8.2-A, so no additional subtarget features are
required.
Differential Revision: http://reviews.llvm.org/D15020
llvm-svn: 254157
ARMv8.2-A adds the "dc cvap" instruction, which is a system instruction
that cleans caches to the point of persistence (for systems that have
persistent memory). It is a required part of ARMv8.2-A, so no additional
subtarget features are required.
Differential Revision: http://reviews.llvm.org/D15016
llvm-svn: 254156
ARMv8.2-A adds a new ID register, ID_A64MMFR2_EL1, which behaves in the
same way as ID_A64MMFR0_EL1 and ID_A64MMFR1_EL1. It is a required part
of ARMv8.2-A, so no additional subtarget features are required.
Differential Revision: http://reviews.llvm.org/D15017
llvm-svn: 254155
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature. There
is also one large, optional feature, which adds 16-bit floating point
versions of all existing floating-point instructions (VFP and SIMD),
this is represented by the FeatureFullFP16 subtarget feature.
Differential Revision: http://reviews.llvm.org/D15013
llvm-svn: 254154
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.
Differential revision: http://reviews.llvm.org/D14361
llvm-svn: 253965
This change merges adjacent zero stores into a wider single store.
For example :
strh wzr, [x0]
strh wzr, [x0, #2]
becomes
str wzr, [x0]
This will fix PR25410.
llvm-svn: 253711
Summary :
* Rename isSmallTypeLdMerge() to isNarrowLoad().
* Rename NumSmallTypeMerged to NumNarrowTypePromoted.
* Use Subtarget defined as a member variable.
llvm-svn: 253587
This change extends r251438 to handle more narrow load promotions
including byte type, unscaled, and signed. For example, this change will
convert :
ldursh w1, [x0, #-2]
ldurh w2, [x0, #-4]
into
ldur w2, [x0, #-4]
asr w1, w2, #16
and w2, w2, #0xffff
llvm-svn: 253577
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out of tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
Note, changes in future can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
llvm-svn: 253511
This patch adds a cost estimate for some missing sign and zero extensions. The
costs were determined by counting the number of shift instructions generated
without context for each new extension.
Differential Revision: http://reviews.llvm.org/D14730
llvm-svn: 253482
SELECT_CC has the nasty property of having operands with unrelated
types. So if you do something like:
f32 = select_cc f16, f16, f32, f32, cc
You'd only look for the action for <select_cc, f32>, but never f16.
If the types are all legal, but the op isn't (as for f16 on AArch64,
or for f128 on x86_64/AArch64?), then you get into trouble.
For f128, we have softenSetCCOperands to handle this case.
Similarly, for f16, we can directly promote the CC operands.
llvm-svn: 253344
Currently, if the assembler encounters an error after parsing (such as an
out-of-range fixup), it reports this as a fatal error, and so stops after the
first error. However, for most of these there is an obvious way to recover
after emitting the error, such as emitting the fixup with a value of zero. This
means that we can report on all of the errors in a file, not just the first
one. MCContext::reportError records the fact that an error was encountered, so
we won't actually emit an object file with the incorrect contents.
Differential Revision: http://reviews.llvm.org/D14717
llvm-svn: 253328
Storing the source location of the expression that created a constant pool
entry allows us to emit better error messages if we later discover that the
expression cannot be represented by a relocation.
Differential Revision: http://reviews.llvm.org/D14646
llvm-svn: 253220
The MCValue class can store a SMLoc to allow better error messages to be
emitted if an error is detected after parsing. The ARM and AArch64 assembly
parsers were not setting this, so error messages did not have source
information.
Differential Revision: http://reviews.llvm.org/D14645
llvm-svn: 253219
The AArch64 assembler was silently ignoring instructions like this:
ldr foo, =bar
AArch64AsmParser::parseOperand was returning true as the parse failed, but was
not calling AArch64AsmParser::Error to report this to the user, so the
instruction was ignored without printing an error message.
Differential Revision: http://reviews.llvm.org/D14651
llvm-svn: 253193
MCRelaxableFragment previously kept a copy of MCSubtargetInfo and
MCInst to enable re-encoding the MCInst later during relaxation. A copy
of MCSubtargetInfo (instead of a reference or pointer) was needed
because the feature bits could be modified by the parser.
This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment
with a constant reference to MCSubtargetInfo. The copies of
MCSubtargetInfo are kept in MCContext, and the target parsers are now
responsible for asking MCContext to provide a copy whenever the feature
bits of MCSubtargetInfo have to be toggled.
With this patch, I saw a 4% reduction in peak memory usage when I
compiled verify-uselistorder.lto.bc using llc.
rdar://problem/21736951
Differential Revision: http://reviews.llvm.org/D14346
llvm-svn: 253127
MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a
member function getSTI.
This is done in preparation for making changes to shrink the size of
MCRelaxableFragment. (see http://reviews.llvm.org/D14346).
llvm-svn: 253124
Darwin reserves x18, so it's never ABI compliant to generate code that
uses it. Set the default value based on the OS part of the triple
rather than forcing front-ends to set the +reserve-x18 target feature
in order to build correct code for Darwin.
This will make r243310 redundant, so I'll revert that shortly.
llvm-svn: 253102
AArch64 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any AArch64
implementation.
The net result of allowing this speculation for the regression tests in this
patch is that we get this code:
ctlz:
clz w0, w0
ret
cttz:
rbit w8, w0
clz w0, w8
ret
Instead of:
ctlz:
cbz w0, .LBB0_2
clz w0, w0
ret
.LBB0_2:
orr w0, wzr, #0x20
ret
cttz:
cbz w0, .LBB1_2
rbit w8, w0
clz w0, w8
ret
.LBB1_2:
orr w0, wzr, #0x20
ret
See D14469 for the larger motivation.
Differential Revision: http://reviews.llvm.org/D14505
llvm-svn: 252625
For big-endian targets, when we merge two halfword loads into a word load, the
order of the halfwords in the loaded value is reversed compared to
little-endian, so the load-store optimiser needs to swap the destination
registers.
This does not affect merging of two word loads, as we use ldp, which treats the
memory as two separate 32-bit words.
llvm-svn: 252597
AArch64 has the ability to use the top 8-bits of an "address" for extra
information, with the memory subsystem automatically masking them off for loads
and stores. When that's happening, we can sometimes skip masks on memory
operations in the compiler.
However, this requires the host OS and support stack to preserve those bits so
it can't be enabled everywhere. In principle iOS 8.0 and above do take the
required precautions and but we'll put it under a flag for now.
llvm-svn: 252573
Summary:
This matches the sum-of-absdiff patterns emitted by the vectoriser using log2 shuffles.
Relies on D14207 to be able to match the `extract_subvector(..., 0)`
Reviewers: t.p.northover, jmolloy
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14208
llvm-svn: 252465
Summary:
Lowering this pattern early to an `EXTRACT_SUBREG` was making it impossible to match larger patterns in tblgen that use `extract_subvector(..., 0)` as part of the their input pattern.
It seems like there will exist somewhere a better way of specifying this pattern over all relevant register value types, but I didn't manage to find it.
Reviewers: t.p.northover, jmolloy
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14207
llvm-svn: 252464
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fallback to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).
Give up, let SDAG agree with itself on a vNi1 legalization strategy.
llvm-svn: 252364
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only in cortex-a57 on which performance benefits were verified.
llvm-svn: 252316
Also, remove an enum hack where enum values were used as indexes into an array.
We may want to make this a real class to allow pattern-based queries/customization (D13417).
llvm-svn: 252196
Summary:
This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it.
D11268 is quite old and has merge conflicts against the current trunk.
This request
- rebases D11268 onto the new trunk;
- resolves the merge conflicts;
- fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct.
Reviewers: echristo, rengolin, kubabrecka
Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl
Differential Revision: http://reviews.llvm.org/D14338
llvm-svn: 252177
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.
llvm-svn: 251490
This recommits r250719, which caused a failure in SPEC2000.gcc
because of the incorrect insert point for the new wider load.
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
ldrh w0, [x2]
ldrh w1, [x2, #2]
becomes
ldr w0, [x2]
ubfx w1, w0, #16, #16
and w0, w0, #ffff
llvm-svn: 251438
When optimization is disabled, edge weights that are stored in MBB won't be used so that we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But that the weight list is empty doesn't mean disabled optimization (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights.
We should discourage using default weights when adding successors, because it is very easy for users to forget update the correct edge weights instead of using default ones (one exception is that the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimizations is disabled.
In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this as for many uses of addSuccessor() whether optimization is disabled or not is not checked. But it can guarantee that if optimization is enabled, then the weight list always has the same size of the successor list.
Differential revision: http://reviews.llvm.org/D13963
llvm-svn: 251429
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.
Reviewers: rengolin, t.p.northover
Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14082
llvm-svn: 251401
This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.
While there make use of ArrayRef and std::find_if.
llvm-svn: 251382
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.
This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.
The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.
llvm-svn: 251324
It turned out not to improve any of our benchmarks but occasionally led
to increased register pressure and spilling.
Only enabling for the Cyclone CPU as the results on the cortex CPUs
give mixed results.
Differential Revision: http://reviews.llvm.org/D13708
llvm-svn: 251038
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
ldrh w0, [x2]
ldrh w1, [x2, #2]
becomes
ldr w0, [x2]
ubfx w1, w0, #16, #16
and w0, w0, #ffff
llvm-svn: 250719
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.
This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.
llvm-svn: 250456
After r249764, if you didn't see the full context, it looked like
`std::next(I)` would get the same result as
`++MachineBasicBlock::iterator(I)`. However, `I` is a `MachineInstr*`
(not a `MachineBasicBlock::iterator`).
Use the `getIterator()` helper I added later (r249782) to make this code
more clear.
llvm-svn: 249852
Stop using `getNextNode()` to get an insertion point (at least, in this
one place). Instead, use iterator logic directly.
The `getNextNode()` interface isn't actually supposed to work for
creating iterators; it's supposed to return `nullptr` (not a real
iterator) if this is the last node. It's currently broken and will
"happen" to work, but if we ever fix the function, we'll get some
strange failures in places like this.
llvm-svn: 249764
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.
Differential Revision: http://reviews.llvm.org/D13508
llvm-svn: 249550
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.
Differential Revision: http://reviews.llvm.org/D13011
llvm-svn: 249313
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.
With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.
In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.
This patch is a joint work by Maxim Ostapenko andy myself.
llvm-svn: 249303
Support for pairing unscaled loads and stores has been enabled since the
original ARM64 port. This feature is no longer experimental, AFAICT.
llvm-svn: 249049
Previously, the index was constrained to the size of the memory operation for
no apparent reason. This change removes that constraint so that we can form
pre-index instructions with any valid offset.
llvm-svn: 248931
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored. This change gets
us one step closer to forming LDPSW instructions. This change also enables
pre- and post-indexing for halfword and byte loads and stores.
llvm-svn: 248804
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).
The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.
This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.
The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.
Differential Revision: http://reviews.llvm.org/D12635
llvm-svn: 248622
The pre- and post-increment version update the base register, but the post-
version was defined incorrectly. There is no test case as we don't currently
generate these instructions, but I plan on changing that in the near future.
llvm-svn: 248528
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248291
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice.
Differential Revision: http://reviews.llvm.org/D12969
llvm-svn: 248266
Summary:
For bitfield insert OR matching, check both operands for larger pattern
first before checking for smaller pattern.
Add pattern for unsigned bitfield insert-in-zero done with SHL+AND.
Resolves PR21631.
Reviewers: jmolloy, t.p.northover
Subscribers: aemerson, rengolin, llvm-commits, mcrosier
Differential Revision: http://reviews.llvm.org/D12908
llvm-svn: 248006
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247692
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247683
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).
Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)
Differential Revision: http://reviews.llvm.org/D12557
llvm-svn: 247429
First, we need to teach isFrameOffsetLegal about STNP.
It already knew about the STP/LDP variants, but those were probably
never exercised, because it's only the load/store optimizer that
generates STP/LDP, and the only user of the method is frame lowering,
which runs earlier.
The STP/LDP cases were wrong: they didn't take into account the fact
that they return two results, not one, so the immediate offset will be
the 4th operand, not the 3rd.
Follow-up to r247234.
llvm-svn: 247236
We could go through the load/store optimizer and match STNP where
we would have matched a nontemporal-annotated STP, but that's not
reliable enough, as an opportunistic optimization.
Insetad, we can guarantee emitting STNP, by matching them at ISel.
Since there are no single-input nontemporal stores, we have to
resort to some high-bits-extracting trickery to generate an STNP
from a plain store.
Also, we need to support another, LDP/STP-specific addressing mode,
base + signed scaled 7-bit immediate offset.
For now, only match the base. Let's make it smart separately.
Part of PR24086.
llvm-svn: 247231
Summary:
We are not scalarizing the wide selects in codegen for i16 and i32 and
therefore we can remove the amortization factor. We still have issues
with i64 vectors in codegen though.
Reviewers: mcrosier
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12724
llvm-svn: 247156
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
llvm-svn: 246769
Some of the instructions use ' ', which drives OCD-me nuts.
Let's put an end to this.
NFC-ish: hopefully nobody cares about whitespace.
llvm-svn: 246686
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.
Differential Revision: http://reviews.llvm.org/D12425
llvm-svn: 246555
Summary:
This change turns on by default interleaved access vectorization
for AArch64.
We also clean up some tests which were spedifically enabling this
behaviour.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12149
llvm-svn: 246542
The ISelLowering code turned insertion turned the element for the
lowest lane of a BUILD_VECTOR into an INSERT_SUBREG, this prohibited
the patterns for SCALAR_TO_VECTOR(Load) to match later. Restrict this
to cases without a load argument.
Reported in rdar://22223823
Differential Revision: http://reviews.llvm.org/D12467
llvm-svn: 246462
more than 2 instructions.
I introduced this regression a while back and did not noticed it because I
somehow forgot to push the initial test cases for the pass!
Fix that as well!
llvm-svn: 246239
Summary:
This change lowers the aarch64 integer vector min/max intrinsic nodes to
generic min/max nodes and replaces the intrinsic selection patterns with
the generic ones.
There should already be testing in place for this, so no further tests
were added.
Reviewers: jmolloy
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12276
llvm-svn: 246030
This should be no functional change but for the record: For three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch is addedd to the successor/predecessor
lists.
llvm-svn: 245997
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.
This fixes http://llvm.org/PR24459
llvm-svn: 245641
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.
llvm-svn: 245640
We are already falling back to SelectionDAG when encountering an shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.
This fixes rdar://problem/22345295.
llvm-svn: 245499
Since r244955, we try to use the short-form ErrorInfo when both
tries failed, and the long-form match failed on a suffix operand.
However, this means we sometimes mix ErrorInfo and MatchResult
(one manifestation of this being PR24498). Instead, restore both.
llvm-svn: 245469
This commit adds support for bit mask target flag serialization to the MIR
printer and the MIR parser. It also adds support for the machine operand's
target flag serialization to the AArch64 target.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 245383
Summary:
Increase the estimated costs for insert/extract element operations on
AArch64. This is motivated by results from benchmarking interleaved
accesses.
Add missing costs for zext/sext/trunc instructions and some integer to
floating point conversions. These costs were previously calculated
by scalarizing these operation and were affected by the cost increase of
the insert/extract element operations.
Reviewers: rengolin
Subscribers: mcrosier, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11939
llvm-svn: 245226
function.
This was the same as getFrameIndexReference, but without the FrameReg
output.
Differential Revision: http://reviews.llvm.org/D12042
llvm-svn: 245148
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.
llvm-svn: 245107
This reverts commit r245047.
It was failing on the darwin bots. The problem was that when running
./bin/llc -march=msp430
llc gets to
if (TheTriple.getTriple().empty())
TheTriple.setTriple(sys::getDefaultTargetTriple());
Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.
That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.
llvm-svn: 245062
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.
It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.
llvm-svn: 245047
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.
I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.
llvm-svn: 245035
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.
Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.
llvm-svn: 244955
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.
Follow-up to r243924, r243926, and r244858.
llvm-svn: 244860
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.
This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.
This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.
This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.
Reviewers: Akira Hatanaka
llvm-svn: 244693
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:
- Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
- f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!
llvm-svn: 244595
At this point the given Opc must be valid, otherwise we should
not look for a matching pair to form paired load or store.
Thanks to Chad to point out this piece of code!
llvm-svn: 244366
When we are not emitting the condition for the branch, because the condition is
in another BB or SDAG did the selection for us, then we have to mask the flag in
the register with AND.
This is required when the condition comes from a truncate, because SDAG only
truncates down to a legal size of i32.
This fixes rdar://problem/22161062.
llvm-svn: 244291
This reverts commit r243198 and 243304.
Turns out this wasn't the correct fix for this problem. It works only within
FastISel, but fails when the truncate is selected by SDAG.
llvm-svn: 244287
rather than 'unsigned' for their costs.
For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely percieved as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).
All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.
This passes all tests, and is also UBSan clean.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D11741
llvm-svn: 244080
To get the successors of a BB we currently do successors(BB) which
ultimately walks the successors of the BB's terminator.
This moves the iterator to TerminatorInst as thats what we're actually
using to do the iteration, and adds a member function to TerminatorInst
to allow us to iterate directly over successors given an instruction.
For example, we can now do
for (auto *Succ : BI->successors())
instead of
for (unsigned i = 0, e = BI->getNumSuccessors(); i != e; ++i)
Reviewed by Tobias Grosser.
llvm-svn: 244074
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.
IIRC, this pass is off by default, but it's still helpful when debugging.
llvm-svn: 244056
Summary: Among other things, this allows -print-after-all/-print-before-all to
dump IR around this pass.
This is the AArch64 version of r243052.
llvm-svn: 244041
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
Some are named "FP", others "SD", others still "FP*SD".
Rename all this to just use "FP", which, except for conversions
(which don't use this format naming scheme), implies "SD" anyway.
llvm-svn: 243936
It's already in SysRegMappings, no need to also have it in MSRMappings:
the latter is only used if we didn't find a match in the former.
llvm-svn: 243933
There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and
is actually already vector-aware; let's use it instead of scalarizing!
The only interesting change is that for v2f32, we previously always used
use v4i32 as the integer vector type.
Use v2i32 instead, and mark FCOPYSIGN as Custom.
llvm-svn: 243926