Support for scalars was committed in r311154, this adds support for allowing
v4f16 vector types (thus avoiding conversions from/to single precision for
these types).
Differential Revision: https://reviews.llvm.org/D37145
llvm-svn: 312104
Add new predicate to more accurately model the scheduling around branches
and function calls and of loads and stores of pairs and integer
multiplications.
llvm-svn: 311944
Add new predicate to more accurately model the cost of arithmetic and
logical operations shifted left.
Differential revision: https://reviews.llvm.org/D37151
llvm-svn: 311943
Summary:
STRQro* instructions are slower than the alternative ADD/STRQui expanded
instructions on Falkor, so avoid generating them unless we're optimizing
for code size.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D37020
llvm-svn: 311931
Instead of loading 0 from a constant pool, it's of course much better to
materialize it using an fmov and the zero register.
Thanks to Ahmed Bougacha for the suggestion.
Differential Revision: https://reviews.llvm.org/D37102
llvm-svn: 311662
This is a follow up patch of r311154 and introduces custom lowering of copysign
f16 to avoid promotions to single precision types when the subtarget supports
fullfp16.
Differential Revision: https://reviews.llvm.org/D36893
llvm-svn: 311646
Fix for copy-paste mistake in r311154; setOperationAction for fcos and frem f16
operands appeared twice (and it should be set to 'promote').
Differential Revision: https://reviews.llvm.org/D37071
llvm-svn: 311635
G_PHI has the same semantics as PHI but also has types.
This lets us verify that the types in the G_PHI are consistent.
This also allows specifying legalization actions for G_PHIs.
https://reviews.llvm.org/D36990
llvm-svn: 311596
Debugging AArch64 instruction legalization and custom lowering is really an
unpleasant experience because it shows nodes that appear out of thin air.
In commit r311444, some debug messages have been added to SelectionDAG, the
target independent part, and this patch adds some AArch64 specific messages.
Differential Revision: https://reviews.llvm.org/D36964
llvm-svn: 311533
Armv8.3-A adds instructions that convert a double-precision floating
point number to a signed 32-bit integer with round towards zero,
designed for improving Javascript performance.
Differential Revision: https://reviews.llvm.org/D36785
llvm-svn: 311448
This is a clean up of commit r311154; it's not necessary to pass HasFullFP16 as
an argument, instead just query the DAG.
Differential Revision: https://reviews.llvm.org/D36978
llvm-svn: 311438
The calling convention can be specified by the user in IR. Failing to support
a particular calling convention isn't a programming error, and so relying on
llvm_unreachable to catch and report an unsupported calling convention is not
appropriate.
Differential Revision: https://reviews.llvm.org/D36830
llvm-svn: 311435
If a struct would end up half in GPRs and half on SP the ABI says it should
actually go entirely on the stack. We were getting this wrong in GlobalISel
before, causing compatibility issues.
llvm-svn: 311388
Armv8.2-A adds FP16 support, i.e. f16 is not only a storage-only type, but it
also supports performing data processing on 16-bit floating-point quantities.
All the necessary (tablegen) groundwork of adding the ARMv8.2-A FP16 (scalar)
instructions was done in D15014. To take advantage of this, this patch avoids
promotion of f16 to f32 types when the subtarget supports FullFP16, which
enables instruction selection of these FP16 instructions.
Differential Revision: https://reviews.llvm.org/D36396
llvm-svn: 311154
This reverts commit e8fd20964798ca6d46d2729dd3a789707a6416da in an
attempt to appease the GlobalISel buildbot, which fails in the
test-suite with errors like
fpcmp: files differ without tolerance allowance
llvm-svn: 311151
The BaseAuthLoad instruction class was incorrectly passing an empty
constraint string to its parent, so I have corrected this. This makes
the DecodeAuthLoadWriteback function redundant, so I've also removed
it.
Differential Revision: https://reviews.llvm.org/D36741
llvm-svn: 311148
If a struct would end up half in GPRs and half on SP the ABI says it should
actually go entirely on the stack. We were getting this wrong in GlobalISel
before, causing compatibility issues.
llvm-svn: 311137
Summary:
Support the case where an operand of a pattern is also the whole of the
result pattern. In this case the original result and all its uses must be
replaced by the operand. However, register class restrictions can require
a COPY. This patch handles both cases by always emitting the copy and
leaving it for the register allocator to optimize.
The previous commit failed on Windows machines due to a flaw in the sort
predicate which allowed both A < B < C and B == C to be satisfied
simultaneously. The cause of this was some sloppiness in the priority order of
G_CONSTANT instructions compared to other instructions. These had equal priority
because it makes no difference, however there were operands had higher priority
than G_CONSTANT but lower priority than any other instruction. As a result, a
priority order between G_CONSTANT and other instructions must be enforced to
ensure the predicate defines a strict weak order.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36084
llvm-svn: 311076
Summary:
Mark LoopDataPrefetch and AArch64FalkorHWPFFix passes as preserving
ScalarEvolution since they do not alter loop structure and should not
alter any SCEV values (though LoopDataPrefetch may introduce new
instructions that won't have cached SCEV values yet).
This can result in slight code differences, mainly w.r.t. nsw/nuw flags
on SCEVs, since these are computed somewhat lazily when a zext/sext
instruction is encountered. As a result, passes after the modified
passes may see SCEVs with more nsw/nuw flags present.
Reviewers: sanjoy, anemet
Subscribers: aemerson, rengolin, mzolotukhin, javed.absar, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D36716
llvm-svn: 311032
This reverts commit r310425, thus reapplying r310335 with a fix for link
issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON.
Original commit message:
[GlobalISel] Remove the GISelAccessor API.
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.
NFC.
----
The fix for the link issue consists in adding the GlobalISel library in
the list of dependencies for the AArch64 unittests. This dependency
comes from the use of AArch64Subtarget that needs to know how
to destruct the GISel related APIs when being detroyed.
Thanks to Bill Seurer and Ahmed Bougacha for helping me reproducing and
understand the problem.
llvm-svn: 310969
As expected, this failed on the windows bots but the instrumentation showed
something interesting. The ADD8ri and INC8r rules are never directly compared
on the windows machines. That implies that the issue lies in transitivity of
the Compare predicate. I believe I've already verified that but maybe I missed
something.
llvm-svn: 310922
Summary:
Support the case where an operand of a pattern is also the whole of the
result pattern. In this case the original result and all its uses must be
replaced by the operand. However, register class restrictions can require
a COPY. This patch handles both cases by always emitting the copy and
leaving it for the register allocator to optimize.
The previous commit failed on the windows bots and this one is likely to fail
on those same bots. However, the added instrumentation should reveal a particular
isHigherPriorityThan() evaluation which I'm expecting to expose that
these machines are weighing priority of two rules differently from the
non-windows machines.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36084
llvm-svn: 310919
This allows using semicolons for bundling up more than one
statement per line. This is used within the mingw-w64 project in some
assembly files that contain code for multiple architectures.
Differential Revision: https://reviews.llvm.org/D36366
llvm-svn: 310797
Two of the Windows bots are failing test\CodeGen\X86\GlobalISel\select-inc.mir
which should not have been affected by the change. Reverting while I investigate.
Also reverted r310735 because it builds on r310716.
llvm-svn: 310745
Summary:
Support the case where an operand of a pattern is also the whole of the
result pattern. In this case the original result and all its uses must be
replaced by the operand. However, register class restrictions can require
a COPY. This patch handles both cases by always emitting the copy and
leaving it for the register allocator to optimize.
Depends on D35833
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36084
llvm-svn: 310716
Add assembler and disassembler support for the ARMv8.3-A pointer
authentication instructions.
Differential Revision: https://reviews.llvm.org/D36517
llvm-svn: 310709
The liveness-tracking code assumes that the registers that were saved
in the function's prolog are live outside of the function. Specifically,
that registers that were saved are also live-on-exit from the function.
This isn't always the case as illustrated by the LR register on ARM.
Differential Revision: https://reviews.llvm.org/D36160
llvm-svn: 310619
Added assembler and disassembler support for the new Release
Consistent processor consistent instructions, introduced with ARM
v8.3-A for AArch64.
Differential Revision: https://reviews.llvm.org/D36522
llvm-svn: 310575
This reverts commit r310115.
It causes a linker failure for the one of the unittests of AArch64 on one
of the linux bot:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429
: && /home/fedora/gcc/install/gcc-7.1.0/bin/g++ -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -O2
-L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined
-Wl,-O3 -Wl,--gc-sections
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o -o
unittests/Target/AArch64/AArch64Tests
lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn
lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn
lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn
lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn
lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread
lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread
-Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib
&& :
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0):
undefined reference to `vtable for llvm::LegalizerInfo'
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8):
undefined reference to `vtable for llvm::RegisterBankInfo'
The particularity of this bot is that it is built with
BUILD_SHARED_LIBS=ON
However, I was not able to reproduce the problem so far.
Reverting to unblock the bot.
llvm-svn: 310425
Before, the outliner would mark all instructions that read from/modify LR as
illegal. This doesn't handle W30, which overlaps with LR. This shouldn't be
outlined.
This commit fixes that by making modifiesRegister() and readsRegister() look at
W30 + take in a TRI argument. This makes sure that modifiesRegister() and
readsRegister() won't outline either of W30 and LR.
https://reviews.llvm.org/D36435
llvm-svn: 310422
Summary:
This patch enables the import of rules containing 'imm' operands that do not
constrain the acceptable values using predicates. Support for ImmLeaf will
arrive in a later patch.
Depends on D35681
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35833
llvm-svn: 310343
Add memory synchronization semantics to LSE Atomics.
The memory semantics feature will be added in a subsequent patch.
In this patch, several corrections were added to the existing LSE Atomics
implementation, based on the ARM Errata D11904 from 05/12/2017.
Patch by: steleman
Differential Revision: https://reviews.llvm.org/D35319
llvm-svn: 310167
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.
NFC.
llvm-svn: 310115
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.
llvm-svn: 309990
This reverts commit r309821.
My suggestion was wrong because it left the MachineOperands tied which
confused the verifier. Since there's no easy way to untie operands, the
original BuildMI solution is probably best.
llvm-svn: 309962
IMHO it is an antipattern to have a enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
llvm-svn: 309911
The previous attempt, which made do with a single offset in
computeCalleeSaveRegisterPairs, wasn't quite enough. The previous
attempt only worked as long as CombineSPBump == true (since the
offset would be adjusted later in fixupCalleeSaveRestoreStackOffset).
Instead include the size for the fixed stack area used for win64
varargs in calculations in emitPrologue/emitEpilogue. The stack
consists of mainly three parts;
- AFI->getLocalStackSize()
- AFI->getCalleeSavedStackSize()
- FixedObject
Most of the places in the code which previously used the CSStackSize
now use PrologueSaveSize instead, which is the sum of the latter
two, while some cases which need exactly the middle one use
AFI->getCalleeSavedStackSize() explicitly instead of a local variable.
In addition to moving the offsetting into emitPrologue/emitEpilogue
(which fixes functions with CombineSPBump == false), also set the
frame pointer to point to the right location, where the frame pointer
and link register actually are stored. In addition to the prologue/epilogue,
this also requires changes to resolveFrameIndexReference.
Add tests for a function that keeps a frame pointer and another one
that uses a VLA.
Differential Revision: https://reviews.llvm.org/D35919
llvm-svn: 309744
Summary:
Most CPUs implementing AES fusion require instruction pairs of the form
AESE Vn, _
AESMC Vn, Vn
and
AESD Vn, _
AESIMC Vn, Vn
The constraint is added to AES(I)MC instructions which use the result of
an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which
constraint source and destination registers to be the same.
A nice side effect of this change is that now all possible pairs are
scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll
test case.
I had to update aes_load_store. The version I added initially was very
reduced and with the new constraint, AESE/AESMC could not be scheduled
back-to-back. I updated the test to be more realistic and still expose
the same scheduling problem as the initial test case.
Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga
Reviewed By: t.p.northover, evandro
Subscribers: aemerson, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35299
llvm-svn: 309495
Summary:
This change gives a 0.25% speedup on execution time, a 0.82% improvement
in benchmark scores and a 0.20% increase in binary size on a Cortex-A53.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite and a range of proprietary suites.
Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin
Reviewed By: rengolin
Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35568
llvm-svn: 309494
This commit
- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively
This accomplishes a couple things.
Firstly, we don't need the notion of *tail call* in the general outlining algorithm.
Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.
Ultimately, this should get us closer to being able to do something like, say avoid saving the link
register when outlining AArch64 instructions.
llvm-svn: 309475
This NFC changeset standardizes the suffixes used for LSE Atomics
instructions.
It changes the existing suffixes - 'b', 'h', 's', 'd' - to the existing
standard 'B', 'H', 'W' and 'X'.
This changeset is the result of the code review discussion for D35319.
Patch by: steleman
Differential Revision: https://reviews.llvm.org/D35927
llvm-svn: 309384
This is some more cleanup in preparation for some actual
functional changes. This splits getOutliningBenefit into
two cost functions: getOutliningCallOverhead and
getOutliningFrameOverhead. These functions return the
number of instructions that would be required to call
a specific function and the number of instructions
that would be required to construct a frame for a
specific funtion. The actual outlining benefit logic
is moved into the outliner, which calls these functions.
The goal of refactoring getOutliningBenefit is to:
- Get us closer to getting rid of the IsTailCall flag
- Further split up "target-specific" things and
"general algorithm" things
llvm-svn: 309356
The (seldom-used) TBI-aware optimization had a typo lying dormant since
it was first introduced, in r252573: when asking for demanded bits, it
told TLI that it was running after legalize, where the opposite was
true.
This is an important piece of information, that the demanded bits
analysis uses to make assumptions about the node. r301019 added such an
assumption, which was broken by the TBI combine.
Instead, pass the correct flags to TLO.
llvm-svn: 309323
Summary:
Using c++11 enum classes ensures that only valid enum values are used
for ArchKind, ProfileKind, VersionKind and ISAKind. This removes the
need for checks that the provided values map to a proper enum value,
allows us to get rid of AK_LAST and prevents comparing values from
different enums. It also removes a bunch of static_cast
from unsigned to enum values and vice versa, at the cost of introducing
static casts to access AArch64ARCHNames and ARMARCHNames by ArchKind.
FPUKind and ArchExtKind are the only remaining old-style enum in
TargetParser.h. I think it's beneficial to keep ArchExtKind as old-style
enum, but FPUKind can be converted too, but this patch is quite big, so
could do this in a follow-up patch. I could also split this patch up a
bit, if people would prefer that.
Reviewers: rengolin, javed.absar, chandlerc, rovka
Reviewed By: rovka
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35882
llvm-svn: 309287
In COFF, a symbol offset can't be stored in the relocation (as is
done in ELF or MachO), but is stored as the immediate in the
instruction itself. The immediate in the ADRP thus is the symbol
offset in bytes, not in pages. For the PAGEOFFSET_12A/L relocations,
ignore any offset outside of the lowest 12 bits; they won't have any
effect on the ADD/LDR/STR instruction itself but only on the associated
ADRP.
This is similar to how the same issue is handled for MOVW/MOVT
instructions in ELF (see e.g. SVN r307713, and r307728 in lld).
This fixes "fixup out of range" errors while building larger object
files, where temporary symbols end up as a plain section symbol and
an offset, and fixes any cases where the symbol offset mean that
the actual target ended up on a different page than the symbol
itself.
Differential Revision: https://reviews.llvm.org/D35791
llvm-svn: 309105
Changing mask argument type from const SmallVectorImpl<int>& to
ArrayRef<int>.
This came up in D35700 where a mask is received as an ArrayRef<int> and
we want to pass it to TargetLowering::isShuffleMaskLegal().
Also saves a few lines of code.
llvm-svn: 309085
Create a dummy 8 byte fixed object for the unused slot below the first
stored vararg.
Alternative ideas tested but skipped: One could try to align the whole
fixed object to 16, but I haven't found how to add an offset to the stack
frame used in LowerWin64_VASTART.
If only the size of the fixed stack object size is padded but not the offset, via
MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false),
PrologEpilogInserter crashes due to "Attempted to reset backwards range!".
This fixes misconceptions about where registers are spilled, since
AArch64FrameLowering.cpp assumes the offset from fixed objects is
aligned to 16 bytes (and the Win64 case there already manually aligns
the offset to 16 bytes).
This fixes cases where local stack allocations could overwrite callee
saved registers on the stack.
Differential Revision: https://reviews.llvm.org/D35720
llvm-svn: 308950
This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne
and we know the result of the compare instruction is zero. For example,
BB#0:
subs w0, w1, w2
str w0, [x1]
b.ne .LBB0_2
BB#1:
mov w0, wzr ; <-- redundant
str w0, [x2]
.LBB0_2
Differential Revision: https://reviews.llvm.org/D35075
llvm-svn: 308849
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
It revealed a bug in the Localizer pass which has now been fixed.
This includes the fix for SUBREG_TO_REG committed separately last time.
llvm-svn: 308688
This generalizes an existing fix from ELF to MachO and COFF.
Test that an ADRP to a local symbol whose offset is known at assembly
time still produces relocations, both for MachO and COFF. Test that
an ADRP without a @page modifier on MachO fails (previously it
didn't).
Differential Revision: https://reviews.llvm.org/D35544
llvm-svn: 308518
Summary:
G_FMA was recently added to GlobalISel which enables the import of rules
involving fma. Add the mapping to allow it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35130
llvm-svn: 308308
Rename the enum value from X86_64_Win64 to plain Win64.
The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
Differential Revision: https://reviews.llvm.org/D34474
llvm-svn: 308208
Prevent store merge from merging stores into an invalid 128-bit store
(realized as a f128 value in the context of the noimplicitfloat
attribute). Previously, such stores are immediately split back into
valid stores.
llvm-svn: 308184
Restricting register class to PointerRegClass for memory operands.
Also fix the PointerRegClass for AArch64 from GPR64 to GPR64sp, since
XZR cannot hold a memory pointer while SP is.
Fixes PR33134.
Differential Revision: https://reviews.llvm.org/D34999
llvm-svn: 308060
Summary:
This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor. It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.
A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.
Reviewers: mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34963
llvm-svn: 308059
Pass parameters properly in calls to such functions (pass all
floats in integer registers), and handle va_start properly (allocate
stack immediately below the arguments on the stack, to save the
register arguments into a single continuous array).
Differential Revision: https://reviews.llvm.org/D35006
llvm-svn: 307928
The AsmParser mnemonic spell checker was introduced in r307148 and enabled only
for ARM. This patch enables it for AArch64.
Differential Revision: https://reviews.llvm.org/D35357
llvm-svn: 307918
Summary: Add target hooks for printing and parsing target MMO flags.
Targets may override getSerializableMachineMemOperandTargetFlags() to
return a mapping from string to flag value for target MMO values that
should be serialized/parsed in MIR output.
Add implementation of this hook for AArch64 SuppressPair MMO flag.
Reviewers: bogner, hfinkel, qcolombet, MatzeB
Subscribers: mcrosier, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D34962
llvm-svn: 307877
Summary:
isFusion returns true if the subtarget supports any kind of instruction
fusion, similar to ARMSubtarget::isFusion. This was suggested in D34142.
This changes the current behavior slightly, because the macro fusion mutation
is now added to the PostRA MachineScheduler in case the subtarget supports
any kind of fusion. I think that makes sense because if the PostRA
MachineScheduler is run, there is potential that instructions scheduled back to
back are re-scheduled.
Reviewers: evandro, t.p.northover, joelkevinjones, joel_k_jones, steleman
Reviewed By: joelkevinjones
Subscribers: joel_k_jones, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34958
llvm-svn: 307842
A generic variant of IMPLICIT_DEF was added in r306875, but this
survives to selection and hits a `Cannot Select`. Add handling that
converts the note to a regular IMPLICIT_DEF.
llvm-svn: 307817
The issue is not if the value is pcrel. It is whether we have a
relocation or not.
If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.
llvm-svn: 307730
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.
The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.
llvm-svn: 307634
Add breaks - doesn't affect results as both GPR/FPU both check for 32/64 bit sizes. So will still default to GenericOps in the same way.
llvm-svn: 307484
Summary:
This change gives a 0.89% speed on execution time, a 0.94% improvement
in benchmark scores and a 0.62% increase in binary size on a Cortex-A57.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A57 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, mcrosier, javed.absar, kristof.beyls, sbaranga
Reviewed By: kristof.beyls
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34954
llvm-svn: 307389
Summary:
This change gives a 0.34% speed on execution time, a 0.61% improvement
in benchmark scores and a 0.57% increase in binary size on a Cortex-A72.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A72 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, kristof.beyls, rengolin, sbaranga, mcrosier, javed.absar
Reviewed By: kristof.beyls
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D34961
llvm-svn: 307380
Contrary to the stepForward()/stepBackward() method accumulate() doesn't
have a direction as defs, uses and clobbers all have the same effect.
Also improve the documentation comment.
llvm-svn: 307351
This fixes calls to external functions starting with a capital L,
fixing errors like this:
fatal error: error in backend: assembler label 'LocalFree' can not be undefined
Differential Revision: https://reviews.llvm.org/D35079
llvm-svn: 307317
Allows the MachineIRBuilder APIs to directly create registers (based on
LLT or TargetRegisterClass) as well as accept MachineInstrBuilders
and implicitly converts to register(with getOperand(0).getReg()).
Eg usage:
LLT s32 = LLT::scalar(32);
auto C32 = Builder.buildConstant(s32, 32);
auto Tmp = Builder.buildInstr(TargetOpcode::G_SUB, s32, C32,
OtherReg);
auto Tmp2 = Builder.buildInstr(Opcode, DstReg,
Builder.buildConstant(s32, 31)); ....
Only a few methods added for now.
Reviewed by Tim
llvm-svn: 307302
Summary:
Replace the matcher if-statements for each rule with a state-machine. This
significantly reduces compile time, memory allocations, and cumulative memory
allocation when compiling AArch64InstructionSelector.cpp.o after r303259 is
recommitted.
The following patches will expand on this further to fully fix the regressions.
Reviewers: rovka, ab, t.p.northover, qcolombet, aditya_nandakumar
Reviewed By: ab
Subscribers: vitalybuka, aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33758
llvm-svn: 307079
It looks like there are two target-independent but not GISel instructions that
need legalization, IMPLICIT_DEF and PHI. These are already anomalies since
their operands have important LLTs attached, so to make things more uniform it
seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF.
llvm-svn: 306875
Some conditional branch instructions generated by this pass are checking
the wrong condition code. The instructions TBZ and TBNZ are transformed
into B.GE and B.LT instead of B.PL and B.MI respectively. They should
only be checking the Negative bit.
Differential Revision: https://reviews.llvm.org/D34743
llvm-svn: 306550
Summary:
This is the llvm part of the initial implementation to support Windows ARM64 COFF format.
I will gradually add more functionality in subsequent patches.
Reviewers: ruiu, rnk, t.p.northover, compnerd
Reviewed By: ruiu, compnerd
Subscribers: aemerson, mgorny, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D34705
llvm-svn: 306490
This patch enables significant performance enhancements to the
Cavium ThunderX2T99 LLVM backend, as observed by running SPEC2K6,
by adding more detailed scheduling information.
Related Bugzilla bug: http://bugs.llvm.org/show_bug.cgi?id=32562
Patch by: steleman
Differential Revision: https://reviews.llvm.org/D31801
llvm-svn: 306462
This patch modifies the conditional compares pass so that it keeps successor
probabilities up-to-date after the conversion. Previously, successor
probabilities were being normalized to a uniform distribution, even though they
may have been heavily biased prior to the conversion (e.g., if one of the edges
was the back edge of a loop). This loss of information affected passes later in
the pipeline.
Differential Revision: https://reviews.llvm.org/D34109
llvm-svn: 306412
Summary:
After this patch, we finally have test cases that require multiple
instruction emission.
Depends on D33590
Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls
Subscribers: javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D33596
llvm-svn: 306388
When we forward a stored value to a load and eliminate it entirely we need to
make sure the liveness of the register is maintained all the way to its use.
Previously we only cleared liveness on the store doing the forwarding, but
there could be other killing uses in between.
We already do the right thing when the load has to be converted into something
else, it was just this one path that skipped it.
llvm-svn: 306318
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
llvm-svn: 306177
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.
A few examples:
add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS.
cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed.
add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses.
cbz w8, .LBB1_2 -> b.eq .LBB1_2
sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses.
tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2
In looking at all current sub-target machine descriptions, this transformation
appears to be either positive or neutral.
Differential Revision: https://reviews.llvm.org/D34220.
llvm-svn: 306144
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.
llvm-svn: 306120
Implemented support to AArch64 codegen for ARMv8.1 Large System
Extensions atomic instructions. Where supported, these instructions can
provide atomic operations with higher performance.
Currently supported operations include: fetch_add, fetch_or, fetch_xor,
fetch_smin, fetch_min/max (signed and unsigned), swap, and
compare_exchange.
This implementation implies sequential-consistency ordering, more
relaxed ordering is under development.
Subtarget->hasLSE is currently supported for Cavium ThunderX2T99.
Patch by Ananth Jasty.
Differential Revision: https://reviews.llvm.org/D33586
Change-Id: I82f6d3d64255622791ceb0715b7ab9f4dc4d4b2c
llvm-svn: 305893
There should be at most a single kill flag for the
promoted operand between the store/load pair.
Discussed in https://reviews.llvm.org/D34402.
llvm-svn: 305889
Summary:
This patch updates promoteLoadFromStore to use the store MachineOperand as the
source operand of the of the new instruction instead of creating a new
register MachineOperand. This way, the existing register flags are
preserved.
This fixes PR33468 (https://bugs.llvm.org/show_bug.cgi?id=33468).
Reviewers: MatzeB, t.p.northover, junbuml
Reviewed By: MatzeB
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34402
llvm-svn: 305885
Use llvm::make_unique to avoid ambiguity with MSVC.
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305690
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB
Reviewed By: MatzeB
Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305677
Summary:
Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back
gives a double digit speedup on benchmarks using those instructions on
Cortex-A processors. In GCC, this optimization is part of the generic
processor model as well.
This change should not have a major performance impact on processors
that do not optimize AES instruction pairs, although I only had access
to Cortex-A processors for benchmarking.
Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover
Reviewed By: evandro
Subscribers: sbaranga, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D33836
llvm-svn: 305457
The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr
rather than the SP, so we need to use different variants.
Situations where this actually comes up are rare enough (see test-case) that I
think falling back to DAG is fine.
llvm-svn: 305230
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.
Reviewers: chandlerc, rnk, reames
Reviewed By: reames
Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D33903
llvm-svn: 305189
Summary:
- Fix assertion failures on F16 to/from int types in FastISel by falling
back to regular ISel
- Add a testcase of various conversion cases with FastISel (-O0)
Reviewers: kristof.beyls, jmolloy, SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33734
llvm-svn: 305127
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.
Differential Revision: https://reviews.llvm.org/D33843
llvm-svn: 304864
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.
While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.
llvm-svn: 304247
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".
This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.
Reviewers: spatel, RKSimon, efriedma
Reviewed By: RKSimon
Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits
Differential Revision: https://reviews.llvm.org/D33530
llvm-svn: 304215
- Remove all uses of base sched model entries and set them all to
Unsupported so all the opcodes are described in
AArch64SchedFalkorDetails.td.
- Remove entries for unsupported half-float opcodes.
- Remove entries for unsupported LSE extension opcodes.
- Add entry for MOVbaseTLS (and set Sched in base td file entry to
WriteSys) and a few other pseudo ops.
- Fix a few FP load/store with reg offset entries to use the LSLfast
predicates.
- Add Q size BIF/BIT/BSL entries.
- Fix swapped Q/D sized CLS/CLZ/CNT/RBIT entires.
- Fix pre/post increment address register latency (this operand is
always dest 0).
- Fix swapped FCVTHD/FCVTHS/FCVTDH/FCVTDS entries.
- Fix XYZ resource over usage on LD[1-4] opcodes.
llvm-svn: 304108
- Rewrite livein calculation to use the computeLiveIns() helper
function. This is slightly less efficient but easier to reason about
and doesn't unnecessarily add pristine and reserved registers[1]
- Zero the status register at the beginning of the loop to make sure it
has a defined value.
- Remove kill flags of values that need to stay alive throughout the loop.
[1] An upcoming commit of mine will tighten the MachineVerifier to catch
these.
llvm-svn: 304048
Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.
Reviewed by: Renato Golin
Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls
Subscribers: aemerson, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D33558
llvm-svn: 303901
Summary:
This patch makes instruction fusion more aggressive by
* adding artificial edges between the successors of FirstSU and
SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
* updating PostGenericScheduler::tryCandidate to keep clusters together,
similar to GenericScheduler::tryCandidate.
This change increases the number of AES instruction pairs generated on
Cortex-A57 and Cortex-A72. This doesn't change code at all in
most benchmarks or general code, but we've seen improvement on kernels
using AESE/AESMC and AESD/AESIMC.
Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB
Reviewed By: evandro
Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33230
llvm-svn: 303618
This commit fixes a bug introduced in r301019 where optimizeLogicalImm
would replace a logical node's immediate operand that was CSE'd and
was also an operand of another node.
This commit fixes the bug by replacing the logical node instead of its
immediate operand.
rdar://problem/32295276
llvm-svn: 303607
Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.
Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
Reviewed By: qcolombet
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32861
llvm-svn: 303418
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handling a null TargetMachine call
`getAnalysisIfAvailable<TargetPassConfig>`.
* Passes not handling a null TargetMachine
`addRequired<TargetPassConfig>` and call
`getAnalysis<TargetPassConfig>`.
* MachineFunctionPasses now use MF.getTarget().
* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.
This fixes a crash when running `llc -start-before prologepilog`.
PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.
Related to PR30324.
Differential Revision: https://reviews.llvm.org/D33222
llvm-svn: 303360
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).
llvm-svn: 303118
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
This caused PR33053.
Original commit message:
> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 303115
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".
llvm-svn: 303108
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.
llvm-svn: 303073