In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.
So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.
For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.
It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374634
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
MCSubtarget implementation
Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ. AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation. The custom TTI implementations
continue to exist for the other targets with this change. They are not moved
over to subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to the system
model defined by the target. With this change, the default MCSubtargetInfo
implementation essentially returns the defaults TargetTransformInfoImplBase used
to return. Existing users of TTI defaults will hit the defaults now in
MCSubtargetInfo. Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.
Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 374205
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"
This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.
The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.
llvm-svn: 374091
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.
So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.
For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.
It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374017
SystemZPostRewrite needs to be run before (it may emit COPYs) the Post-RA
pseudo pass also at -O0, so it should be added in addPostRegAlloc().
Review: Ulrich Weigand
llvm-svn: 373182
With -pg -mfentry -mnop-mcount, a nop is emitted instead of the call to
fentry.
Review: Ulrich Weigand
https://reviews.llvm.org/D67765
llvm-svn: 372950
Merge more Select pseudo instructions in emitSelect() by allowing other
instructions between them as long as they do not clobber CC.
Debug value instructions are now moved down to below the new PHIs instead of
erasing them.
Review: Ulrich Weigand
https://reviews.llvm.org/D67619
llvm-svn: 372873
The recently announced IBM z15 processor implements the architecture
already supported as "arch13" in LLVM. This patch adds support for
"z15" as an alternate architecture name for arch13.
The patch also uses z15 in a number of places where we used arch13
as long as the official name was not yet announced.
llvm-svn: 372435
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
Since MBB was split *before* MI, the MI(s) will reside in JoinMBB (MBB) at
the point of erasing them, so calling StartMBB->erase() is actually wrong,
although it is "working" by all appearances.
Review: Ulrich Weigand
llvm-svn: 371995
SystemZExpandPseudo:s only job was to expand LOCRMux instructions into jump
sequences. This needs to be done if expandLOCRPseudo() or expandSELRPseudo()
fails to find a legal opcode (all registers "high" or "low"). This task has
now been moved to SystemZPostRewrite while removing the SystemZExpandPseudo
pass.
It is in fact preferred to expand these pseudos directly after register
allocation in SystemZPostRewrite since the hinted register combinations are
then not subject to later optimizations.
Review: Ulrich Weigand
https://reviews.llvm.org/D67432
llvm-svn: 371959
The result integer does not need to be the same width as the input.
AMDGPU, NVPTX, and Hexagon all have patterns working around the types
matching. GlobalISel defines these as being different type indexes.
llvm-svn: 371797
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67433
llvm-svn: 371608
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67267
llvm-svn: 371212
Summary:
This was discovered while introducing the llvm::Align type.
The original setMinFunctionAlignment used to take alignment as log2, looking at the comment it seems like instructions are to be 2-bytes aligned and not 4-bytes aligned.
Reviewers: uweigand
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67271
llvm-svn: 371204
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67229
llvm-svn: 371200
Handle the remaining cases also by handling asm goto in
SystemZInstrInfo::getBranchInfo().
Review: Ulrich Weigand
https://reviews.llvm.org/D67151
llvm-svn: 371048
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
Now that constrained fpto[su]i intrinsic are available,
add codegen support to the SystemZ backend.
In addition to pure back-end changes, I've also needed
to add the strict_fp_to_[su]int and any_fp_to_[su]int
pattern fragments in the obvious way.
llvm-svn: 370674
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
This fixes https://bugs.llvm.org/show_bug.cgi?id=42606 by extending
D64213. Instead of only checking if the carry comes from a matching
operation, we now check the full chain of carries. Otherwise we might
custom lower the outermost addcarry, but then generically legalize
an inner addcarry.
Differential Revision: https://reviews.llvm.org/D64658
llvm-svn: 365949
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Assembler/disassembler support for new instructions.
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of arch13 as host processor.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365932
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Adding a default "no information" subtarget implementation
Only a handful of targets use these interfaces currently: AArch64,
Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
implementation, so its custom TTI implementation is migrated to use
the new facilities in BasicTTIImpl to invoke its custom subtarget
implementation. The custom TTI implementations continue to exist for
the other targets with this change. They are not moved over to
subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to
the system model defined by the target. With this change, the default
subtarget implementation essentially returns "no information" for
these interfaces. None of the existing users of TTI will hit that
implementation because they define their own custom TTI
implementations and won't use the BasicTTIImpl implementations.
Once system models are in place for the targets that use these
interfaces, their custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 365676
Only custom lower uaddo+addcarry or usubo+subcarry chains and leave
mixtures like usubo+addcarry or uaddo+subcarry to the generic
legalizer. Otherwise we run into issues because SystemZ uses
different CC values for carries and borrows.
Fixes https://bugs.llvm.org/show_bug.cgi?id=42512.
Differential Revision: https://reviews.llvm.org/D64213
llvm-svn: 365242
This implements a small enhancement to https://reviews.llvm.org/D55506
Specifically, while we were able to match strict FP nodes for
floating-point extend operations with a register as source, this
did not work for operations with memory as source.
That is because from regular operations, this is represented as
a combined "extload" node (which is a variant of a load SD node);
but there is no equivalent using a strict FP operation.
However, it turns out that even in the absence of an extload
node, we can still just match the operations explicitly, e.g.
(strict_fpextend (f32 (load node:$ptr))
This patch implements that method to match the LDEB/LXEB/LXDB
SystemZ instructions even when the extend uses a strict-FP node.
llvm-svn: 364450
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware. If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.
This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.
llvm-svn: 363806
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
This reverts r362990 (git commit 374571301d)
This was causing linker warnings on Darwin:
ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.
llvm-svn: 363028
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
llvm-svn: 362990
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
When the tiny code model is requested for a target machine that does not
support this, we get an error message (which is nice) but also this diagnostic
and request to submit a bug report:
fatal error: error in backend: Target does not support the tiny CodeModel
[Inferior 2 (process 31509) exited with code 0106]
clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
(gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
Target: arm-arm-none-eabi
Thread model: posix
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
clang-9: note: diagnostic msg:
But this is not a bug, this is a feature. :-) Not only is this not a bug, this
is also pretty confusing. This patch causes just to print the fatal error and
not the diagnostic:
fatal error: error in backend: Target does not support the tiny CodeModel
Differential Revision: https://reviews.llvm.org/D62236
llvm-svn: 361370
Make sure to not unroll a vector division/remainder (with a constant splat
divisor) after type legalization, since the scalar type may then be illegal.
Review: Ulrich Weigand
https://reviews.llvm.org/D62036
llvm-svn: 360965
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360734
This adds the FPC (floating-point control register) as a reserved
physical register and models its use by SystemZ instructions.
Note that only the current rounding modes and the IEEE exception
masks are modeled. *Changes* of the FPC due to exceptions (in
particular the IEEE exception flags and the DXC) are not modeled.
At this point, this patch is mostly NFC, but it will prevent
scheduling of floating-point instructions across SPFC/LFPC etc.
llvm-svn: 360570
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360510
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.
Refactors a few subclasses to support the target independent %a, %c, and
%n.
The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.
It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.
Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449
Reviewers: echristo, void
Reviewed By: void
Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60887
llvm-svn: 359337
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).
The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60856
llvm-svn: 358744
Summary:
The InlineAsm::AsmDialect is only required for X86; no architecture
makes use of it and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.
Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.
This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.
Reviewers: craig.topper
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60488
llvm-svn: 358101
This function is responsible for checking the legality of fusing an instance
of load -> op -> store into a single operation. In the SystemZ backend the
check was incomplete and a test case emerged with a cycle in the instruction
selection DAG as a result.
Instead of using the NodeIds to determine node relationships,
hasPredecessorHelper() now is used just like in the X86 backend. This handled
the failing tests and as well gave a few additional transformations on
benchmarks.
The SystemZ isFusableLoadOpStorePattern() is now a very near copy of the X86
function, and it seems this could be made a utility function in common code
instead.
Review: Ulrich Weigand
https://reviews.llvm.org/D60255
llvm-svn: 357688
When performing an add-with-overflow with an immediate in the
range -2G ... -4G, code currently loads the immediate into a
register, which generally takes two instructions.
In this particular case, it is preferable to load the negated
immediate into a register instead, which always only requires
one instruction, and then perform a subtract.
llvm-svn: 357597
For shift and rotate instructions that only use the last 6 bits of the shift
amount, a shift amount of (x*64-s) can be substituted with (-s). This saves
one instruction and a register:
lhi %r1, 64
sr %r1, %r3
sllg %r2, %r2, 0(%r1)
=>
lcr %r1, %r3
sllg %r2, %r2, 0(%r1)
Review: Ulrich Weigand
llvm-svn: 357481
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
AMDGPU target run out of Subtarget feature flags hitting the limit of 64.
AssemblerPredicates uses at most uint64_t for their representation.
At the same time CodeGen has exhausted this a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.
This patch completes transition to the bitset for feature bits extending
it to asm matcher and MC code emitter.
Differential Revision: https://reviews.llvm.org/D59002
llvm-svn: 355839
Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher.
Original commit message:
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355784
This caused the first matcher in the isel table for many targets to Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.
llvm-svn: 355433
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355224
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).
This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.
To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll)..
Previously landed
D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
D57596 [CodeGen] Be conservative about atomic accesses as for volatile
D57802 Be conservative about unordered accesses for the moment
rL353959: [Tests] First batch of cornercase tests for unordered atomics.
rL353966: [Tests] RMW folding tests w/unordered atomic operations.
rL353972: [Tests] More unordered atomic lowering tests.
rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
rL354740: [Hexagon, SystemZ] Be super conservative about atomics
rL354800: [Lanai] Be super conservative about atomics
rL354845: [ARM] Be super conservative about atomics
Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)
Differential Revision: https://reviews.llvm.org/D57601
llvm-svn: 355025
Since there is no "Load-and-Test-High" instruction, the 32 bit load of a
register to be compared with 0 can only be implemented with LT if the virtual
GRX32 register ends up in a low part (GR32 register).
This patch detects these cases and passes the GR32 registers (low parts) as
(soft) hints in getRegAllocationHints().
Review: Ulrich Weigand.
llvm-svn: 354935
This patch aims to make sure that any such constant that can be generated
with a vector instruction (for example VGBM) is recognized as such during
legalization and kept as a target independent node through post-legalize
DAGCombining.
Two new functions named isVectorConstantLegal() and loadVectorConstant()
replace old ways of handling vector/FP constants.
A new struct named SystemZVectorConstantInfo is used to cache the results of
isVectorConstantLegal() and pass them onto loadVectorConstant().
Support for fp128 constants in the presence of FeatureVectorEnhancements1
(z14) has been added.
Review: Ulrich Weigand
https://reviews.llvm.org/D58270
llvm-svn: 354896
As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.
llvm-svn: 354740
It seems there were some problem with using a .mir test. For some reason
doing '-stop-before=codegenprepare' and then '-start-before=codegenprepare'
on the output .mir file results in the NoVRegs Property after instruction
selection.
Recommitting the same test as an .ll file instead.
llvm-svn: 354160
isFPImmLegal() has been extended to recognize certain FP immediates that can
be built with VGM (Vector Generate Mask).
These scalar FP immediates (that were previously loaded from the constant
pool) are now selected as VGMF/VGMG in Select().
Review: Ulrich Weigand
https://reviews.llvm.org/D58003
llvm-svn: 353867
Since SystemZ supports counting of leading zeros with the FLOGR instruction,
isCheapToSpeculateCtlz() should return true, which it now does.
ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which
is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only
expanded to CTLZ if it is Legal or Custom.
Review: Ulrich Weigand
https://reviews.llvm.org/D57710
llvm-svn: 353330
Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs
to the DAGCombiner and select them to VGBM in Select(). This allows the
DAGCombiner to understand the constant vector values.
For floating point, only all-zeros vectors are now generated with VGBM, as it
turned out to be somewhat complicated to handle any arbitrary constants,
while in practice this is very rare and hardly needed.
The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed.
Review: Ulrich Weigand
https://reviews.llvm.org/D57152
llvm-svn: 353325
The IPM sequence currently generated to compute the strcmp/memcmp
result will return INT_MIN for the "less than zero" case. While
this is in compliance with the standard, strictly speaking, it
turns out that common applications cannot handle this, e.g. because
they negate a comparison result in order to implement reverse
compares.
This patch changes code to use a different sequence that will result
in -2 for the "less than zero" case (same as GCC). However, this
requires that the two source operands of the compare instructions
are inverted, which breaks the optimization in removeIPMBasedCompare.
Therefore, I've removed this (and all of optimizeCompareInstr), and
replaced it with a mostly equivalent optimization in combineCCMask
at the DAGcombine level.
llvm-svn: 353304
I'm looking at adding a second INLINEASM opcode for better modeling asm-goto
as a terminator. Using the existing predicate will reduce teh number of
places that will need to use the new opcode.
llvm-svn: 353095
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
Summary:
Like with X86, this allows better DAG-level alias analysis and
alignment inference for wrapped addresses.
Reviewers: jonpa, uweigand
Reviewed By: uweigand
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D57407
llvm-svn: 352786
After creating new PHI instructions during isel pseudo expansion, the NoPHIs
property of MF should be reset in case it was previously set.
Review: Ulrich Weigand
llvm-svn: 352030
Two backend optimizations failed to handle cases when compiled with -g, due
to failing to consider DBG_VALUE instructions. This was in
SystemZTargetLowering::emitSelect() and
SystemZElimCompare::getRegReferences().
This patch makes sure that DBG_VALUEs are recognized so that they do not
affect these optimizations.
Tests for branch-on-count, load-and-trap and consecutive selects.
Review: Ulrich Weigand
https://reviews.llvm.org/D57048
llvm-svn: 351928
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
There are several vector instructions which may or may not set the
condition code register, depending on the value of an argument.
For codegen, we use two versions of the instruction, one that sets
CC and one that doesn't, which hard-code appropriate values of that
argument. But we also have a "generic" version of the instruction
that is used for the assembler/disassembler. These generic versions
should always be considered to clobber CC just to be safe.
llvm-svn: 349761
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:
- For the floating-point versions, we have ISel patterns that match
on a bitconvert as the top node. In more complex cases, that
bitconvert may already have been merged into something else.
Fix the patterns to match the inner nodes instead.
- For the 64-bit integer versions, depending on the surrounding code,
we may get either a DAG tree based on JOIN_DWORDS or one based on
INSERT_VECTOR_ELT. Use a PatFrags to simply match both variants.
llvm-svn: 349749
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use. This
(erronously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily. Fixed by this patch.
llvm-svn: 349748
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we an use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLEDB (vector floating-point extends) has been
missing so far. This patch adds corresponding DAG combine
support for FP_EXTEND.
llvm-svn: 349746
When computing register allocation hints for a GRX32Bit register, make sure
that any of the hinted registers that are also copy hints are returned first
in the list.
Review: Ulrich Weigand.
llvm-svn: 349037
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.
This patch returns 0 cost for the icmp in these cases.
Review: Ulrich Weigand
https://reviews.llvm.org/D55111
llvm-svn: 348141
Three minor changes to these extra costs:
* For ICmp instructions, instead of adding 2 all the time for extending each
operand, this is only done if that operand is neither a load or an
immediate.
* The operands extension costs for divides removed, because we now use a high
cost already for the divide (20).
* The costs for lhsr/ashr extra costs removed as this did not seem useful.
Review: Ulrich Weigand
https://reviews.llvm.org/D55053
llvm-svn: 347961
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.
This patch makes such a load considered foldable and so gets a 0 cost.
Review: Ulrich Weigand
https://reviews.llvm.org/D54944
llvm-svn: 347735
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.
As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.
Review: Ulrich Weigand
https://reviews.llvm.org/D54940
llvm-svn: 347734
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.
This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.
Review: Ulrich Weigand
https://reviews.llvm.org/D54897
llvm-svn: 347733
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.
Review: Ulrich Weigand
https://reviews.llvm.org/D54870
llvm-svn: 347732
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.
For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.
This patch authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, apilipenko, reames
Reviewed By: reames
Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53892
llvm-svn: 347584
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.
Review: Ulrich Weigand
https://reviews.llvm.org/D54789
llvm-svn: 347445
If a loaded value is replicated it is best to combine these two operations
into a VLREP (load and replicate), but isel will not produce this if the load
has other users as well.
This patch handles this by putting the other users of the load to use the
REPLICATE 0-element instead of the load. This way the load has only the
REPLICATE node as user, and we get a VLREP.
Review: Ulrich Weigand
https://reviews.llvm.org/D54264
llvm-svn: 346746
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.
This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).
Review: Ulrich Weigand
https://reviews.llvm.org/D54423
llvm-svn: 346663
Iterate over all elements and count the number of uses among them for each
used load. Then make sure to REPLICATE the load which has the most uses in
order to minimize the number of needed element insertions.
Review: Ulrich Weigand
https://reviews.llvm.org/D54322
llvm-svn: 346637
A minor improvement of buildVector() that skips creating an
INSERT_VECTOR_ELT for a Value which has already been used for the
REPLICATE.
Review: Ulrich Weigand
https://reviews.llvm.org/D54315
llvm-svn: 346504
It was discovered in randomized testing that the SystemZ implementation of
shouldCoalesce() could be caused to crash when subreg liveness was
enabled. This was because an undef use of the virtual register was copied
outside current MBB at the point of shouldCoalesce() being called. For more
details, see https://bugs.llvm.org/show_bug.cgi?id=39276.
This patch changes the check for MBB locality from livein/liveout checks to
do checks for all instructions of both intervals being inside MBB. This
avoids the cases with dead defs / undef uses outside MBB, which are not
affecting liveness in/out of MBB.
The original test case included as a reduced .mir test case.
Review: Ulrich Weigand
https://reviews.llvm.org/D54197
llvm-svn: 346406
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
llvm-svn: 346180
Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load.
Since the load already does the extension, there is no extra cost (previously
returned 2).
Review: Ulrich Weigand
https://reviews.llvm.org/D54028
llvm-svn: 346009
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.
This makes it more readable and just slightly more accurate generally.
Review: Ulrich Weigand
https://reviews.llvm.org/D53071
llvm-svn: 345998
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
of only 'break'.
We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
the outer case.
I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.
Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu
Differential Revision: https://reviews.llvm.org/D53950
llvm-svn: 345882
Scalar i1 to fp conversions are done with a branch sequence, so it should
have a higher cost.
Review: Ulrich Weigand
https://reviews.llvm.org/D53924
llvm-svn: 345818
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.
Review: Ulrich Weigand
https://reviews.llvm.org/D53923
llvm-svn: 345817
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
The LRV and STRV nodes carry an extra operand to indicate the
type of the memory access. This is redundant, since the nodes
are actually of class MemIntrinsicNode and therefore hold that
same information already as MemoryVT.
NFC intended.
llvm-svn: 345618
Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a
load. This patch adds a check for this.
Review: Ulrich Weigand
https://reviews.llvm.org/D53791
llvm-svn: 345596
SystemZAsmParser can now handle -debug by printing the operands neatly to the
output stream. Before this patch this lead to an llvm_unreachable().
It seems that now '-mllvm -debug' does not cause any crashes anywhere (at
least not on SPEC).
Review: Ulrich Weigand
https://reviews.llvm.org/D53328
llvm-svn: 345349
In order to print the IR slot number for the memory operand, the DAG pointer
must be passed to SDNode::dump().
The isel-debug.ll test updated to also check for the IR Value reference being
printed correctly.
Review: Ulrich Weigand
https://reviews.llvm.org/D53333
llvm-svn: 345347
The SystemZ backend can do arithmetic of memory by loading and then extending
one of the operands. Similarly, a load + truncate can be folded into an
operand.
This patch improves the SystemZ TTI cost function to recognize this.
Review: Ulrich Weigand
https://reviews.llvm.org/D52692
llvm-svn: 345327
Enable the DAG optimization that converts vector div/rem with constants into
multiply+shifts sequences by expanding them early. This is needed since
ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be
available to BuildSDIV after legalization.
Better cost values for these instructions based on how they will be
implemented (a constant divisor is cheaper).
Review: Ulrich Weigand
https://reviews.llvm.org/D53196
llvm-svn: 345321
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
interleave-group
The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.
Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53011
llvm-svn: 344472
Until mischeduler is clever enough to avoid spilling in a vectorized loop
with many (scalar) DLRs it is better to avoid high vectorization factors (8
and above).
llvm-svn: 344129
A new function getNumVectorRegs() is better to use for the number of needed
vector registers instead of getNumberOfParts(). This is to make sure that the
number of vector registers (and typically operations) required for a vector
type is accurate.
getNumberOfParts() which was previously used works by splitting the vector
type until it is legal gives incorrect results for types with a non
power of two number of elements (rare).
A new static function getScalarSizeInBits() that also checks for a pointer
type and returns 64U for it since otherwise it gets a value of 0). Used in a
few places where Ty may be pointer.
Review: Ulrich Weigand
llvm-svn: 344115
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.
NFC.
Review: Simon Pilgrim
https://reviews.llvm.org/D52316
llvm-svn: 343851
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.
This involves sext, zext and xor of i1.
The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.
Review: Ulrich Weigand
https://reviews.llvm.org/D51339
llvm-svn: 342207
a generically extensible collection of extra info attached to
a `MachineInstr`.
The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.
Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.
I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).
Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.
This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.
The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.
Differential Revision: https://reviews.llvm.org/D50701
llvm-svn: 339940
Change
subreg_r32 -> subreg_h32
subreg_r64 -> subreg_h64
subreg_hr32 -> subreg_hh32
The subregisters subreg_r32 and subreg_r64 were added to emphasize the
fact that modifying these subregisters may clobber the entire register.
This is not necessarily the case for subreg_h32, et al.
However, the ability to compose subreg_h64 with subreg_r32, and with
subreg_h32 and subreg_l32 at the same time makes the compositions be
treated as non-overlapping (leading to problems when tracking subreg
liveness). See D50468 for more details.
Differential Revision: https://reviews.llvm.org/D50725
llvm-svn: 339778
This option is needed to enable subreg liveness tracking during register
allocation.
Review: Ulrich Weigand
https://reviews.llvm.org/D50779
llvm-svn: 339776
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.
Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.
This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.
This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.
As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]
Differential Revision: https://reviews.llvm.org/D50680
llvm-svn: 339740
Remove the redundant check against zero when updating ProcResourceCounters in
nextGroup(), as pointed out in https://reviews.llvm.org/D50187.
Review: Ulrich Weigand.
llvm-svn: 339139
Some instructions expand to more than one decoder group.
This has been hitherto ignored, but is handled with this patch.
Review: Ulrich Weigand
https://reviews.llvm.org/D50187
llvm-svn: 338849
The DAG combiner logic to simplify AND masks in shift counts is invalid.
While it is true that the SystemZ shift instructions ignore all but the
low 6 bits of the shift count, it is still invalid to simplify the AND
masks while the DAG still uses the standard shift operators (which are
*not* defined to match the SystemZ instruction behavior).
Instead, this patch performs equivalent operations during instruction
selection. For completely removing the AND, this now happens via
additional DAG match patterns implemented by a multi-alternative
PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG
patterns were already mostly OK, they just needed an output XForm to
actually truncate the immediate value.
Unfortunately, the latter change also exposed a bug in TableGen: it
seems XForms are currently only handled correctly for direct operands of
the outermost operation node. This patch also fixes that bug by simply
recurring through the whole pattern. This should be NFC for all other
targets.
Differential Revision: https://reviews.llvm.org/D50096
llvm-svn: 338521
Since z13, the max group size will be 2 if any μop has more than 3 register
sources.
This has been ignored sofar in the SystemZHazardRecognizer, but is now
handled by recognizing those instructions and adjusting the tracking of
decoding and the cost heuristic for grouping.
Review: Ulrich Weigand
https://reviews.llvm.org/D49847
llvm-svn: 338368
NFC changes to make scheduler TableGen files more readable, by using loops
instead of a lot of similar defs with just e.g. a latency value that changes.
https://reviews.llvm.org/D49598
Review: Ulrich Weigand, Javed Abshar
llvm-svn: 337909
Two minor issues: The new MCD SchedWrite name does not contain "Unit" like
all the others, so a check is needed. Also, print "LSU" instead of "LS".
Review: Ulrich Weigand
llvm-svn: 337700
As a consequence of recent discussions
(http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch
changes the SystemZ SchedModels so that the IssueWidth is 6, which is the
decoder capacity, and NumMicroOps become the number of decoder slots needed
per instruction.
In addition, the SchedWrite latencies now match the MachineInstructions
def-operand indexes, and ReadAdvances have been added on instructions with
one register operand and one memory operand.
Review: Ulrich Weigand
https://reviews.llvm.org/D47008
llvm-svn: 337538
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.
Differential Revision: https://reviews.llvm.org/D49262
llvm-svn: 337258
A TableGen instruction record usually contains a DAG pattern that will
describe the SelectionDAG operation that can be implemented by this
instruction. However, there will be cases where several different DAG
patterns can all be implemented by the same instruction. The way to
represent this today is to write additional patterns in the Pattern
(or usually Pat) class that map those extra DAG patterns to the
instruction. This usually also works fine.
However, I've noticed cases where the current setup seems to require
quite a bit of extra (and duplicated) text in the target .td files.
For example, in the SystemZ back-end, there are quite a number of
instructions that can implement an "add-with-overflow" operation.
The same instructions also need to be used to implement just plain
addition (simply ignoring the extra overflow output). The current
solution requires creating extra Pat pattern for every instruction,
duplicating the information about which particular add operands
map best to which particular instruction.
This patch enhances TableGen to support a new PatFrags class, which
can be used to encapsulate multiple alternative patterns that may
all match to the same instruction. It operates the same way as the
existing PatFrag class, except that it accepts a list of DAG patterns
to match instead of just a single one. As an example, we can now define
a PatFrags to match either an "add-with-overflow" or a regular add
operation:
def z_sadd : PatFrags<(ops node:$src1, node:$src2),
[(z_saddo node:$src1, node:$src2),
(add node:$src1, node:$src2)]>;
and then use this in the add instruction pattern:
defm AR : BinaryRRAndK<"ar", 0x1A, 0xB9F8, z_sadd, GR32, GR32>;
These SystemZ target changes are implemented here as well.
Note that PatFrag is now defined as a subclass of PatFrags, which
means that some users of internals of PatFrag need to be updated.
(E.g. instead of using PatFrag.Fragment you now need to use
!head(PatFrag.Fragments).)
The implementation is based on the following main ideas:
- InlinePatternFragments may now replace each original pattern
with several result patterns, not just one.
- parseInstructionPattern delays calling InlinePatternFragments
and InferAllTypes. Instead, it extracts a single DAG match
pattern from the main instruction pattern.
- Processing of the DAG match pattern part of the main instruction
pattern now shares most code with processing match patterns from
the Pattern class.
- Direct use of main instruction patterns in InferFromPattern and
EmitResultInstructionAsOperand is removed; everything now operates
solely on DAG match patterns.
Reviewed by: hfinkel
Differential Revision: https://reviews.llvm.org/D48545
llvm-svn: 336999
This is needed to get CC operand in right place, as expected by the
SchedModel.
Review: Ulrich Weigand
https://reviews.llvm.org/D47820
llvm-svn: 334161
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.
Differential Revision: https://reviews.llvm.org/D44928
llvm-svn: 334078
With this we gain a little flexibility in how the generic object
writer is created.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47045
llvm-svn: 332868
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47035
llvm-svn: 332857
Sorry, the commit comment for r332703 is completely broken.
My mind slipped - the right description would be:
In SystemZDAGToDAGISel::Select(), in the handling for SELECT_CCMASK:
Check if UpdateNodeOperands() returns a different SDNode and in that
case call ReplaceNode.
Review: Ulrich Weigand.
llvm-svn: 332706
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.
This patch has no new test case. I have run regression test and there is
no difference in regression test.
Differential Revision: https://reviews.llvm.org/D45342
Patch by Hsiangkai Wang.
llvm-svn: 331844
Inspired by r331508, I did a grep and found these.
Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.
llvm-svn: 331577
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
This provides an optimized implementation of SADDO/SSUBO/UADDO/USUBO
as well as ADDCARRY/SUBCARRY on top of the new CC implementation.
In particular, multi-word arithmetic now uses UADDO/ADDCARRY instead
of the old ADDC/ADDE logic, which means we no longer need to use
"glue" links for those instructions. This also allows making full
use of the memory-based instructions like ALSI, which couldn't be
recognized due to limitations in the DAG matcher previously.
Also, the llvm.sadd.with.overflow et.al. intrinsincs now expand to
directly using the ADD instructions and checking for a CC 3 result.
llvm-svn: 331203
Currently, an instruction setting the condition code is linked to
the instruction using the condition code via a "glue" link in the
SelectionDAG. This has a number of drawbacks; in particular, it
means the same CC cannot be used by multiple users. It also makes
it more difficult to efficiently implement SADDO et. al.
This patch changes the back-end to represent CC dependencies as
normal values during SelectionDAG matching, along the lines of
how this is handled in the X86 back-end already.
In addition to the core mechanics of updating all relevant patterns,
this requires a number of additional changes:
- We now need to be able to spill/restore a CC value into a GPR
if necessary. This means providing a copyPhysReg implementation
for moves involving CC, and defining getCrossCopyRegClass.
- Since we still prefer to avoid such spills, we provide an override
for IsProfitableToFold to avoid creating a merged LOAD / ICMP if
this would result in multiple users of the CC.
- combineCCMask no longer requires a single CC user, and no longer
need to be careful about preventing invalid glue/chain cycles.
- emitSelect needs to be more careful in marking CC live-in to
the basic block it generates. Also, we can now optimize the
case of multiple subsequent selects with the same condition
just like X86 does.
llvm-svn: 331202
In patterns where we need to specify a result VT, prefer
[(set (tr.vt tr.op:$V1), (operator ...))]
over
[(set tr.op:$V1, (tr.vt (operator ...)))]
This is NFC now, but simplifies some future changes.
llvm-svn: 331192
If we have LOCR instructions, select them directly from SelectionDAG
instead of first going through a pseudo instruction and then using
the custom inserter to emit the LOCR.
Provide Select pseudo-instructions for VR32/VR64 if we have vector
instructions, to avoid having to go through the first 16 FPRs
unnecessarily.
If we do not have LOCFHR, prefer using LOCR followed by a move
over a conditional branch.
llvm-svn: 331191
If the MachineInstr uses a custom inserter and is then erased after
instruction selection, there is no use for mapping it to a sched class.
Review: Ulrich Weigand
llvm-svn: 331040
While not necessary for correctness, it is preferable for
performance reasons on all architectures we currently support
to align functions to 16-byte boundaries by default.
llvm-svn: 330718
The TargetSchedModel is always initialized using the TargetSubtargetInfo's
MCSchedModel and TargetInstrInfo, so we don't need to extract those and
pass 3 parameters to init().
Differential Revision: https://reviews.llvm.org/D44789
llvm-svn: 329540
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.
Also remove some stale-looking comments from the Nios2 target cmakefile.
No intended behavior change.
llvm-svn: 329181
As in SystemZ backend, correctly propagate node ids when inserting new
unselected nodes into the DAG during instruction Seleciton for X86
target.
Fixes PR36865.
Reviewers: jyknight, craig.topper
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44797
llvm-svn: 328233
Summary:
The docs already claim that this happens, but so far it hasn't. As a
consequence, existing TableGen files get this wrong a lot, but luckily
the fixes are all reasonably straightforward.
To make this work with all the existing forms of self-references (since
the true type of a record is only built up over time), the lookup of
self-references in !cast is delayed until the final resolving step.
Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D44475
llvm-svn: 327849
If DoneMBB becomes empty it must have CC added to its live-in list, since it
will fall-through into EndMBB. This happens when the CLC loop does the
complete range.
Review: Ulrich Weigand
llvm-svn: 327834
This is similar to the check later when we remap some of the instructions from one class to a new one. But if we reuse the class we don't get to do that check.
So many CPUs have violations of this check that I had to add a flag to the SchedMachineModel to allow it to be disabled. Hopefully we can get those cleaned up quickly and remove this flag.
A lot of the violations are due to overlapping regular expressions, but that's not the only kind of issue it found.
llvm-svn: 327808
Improve/implement these methods to improve DAG combining. This mainly
concerns intrinsics.
Some constant operands to SystemZISD nodes have been marked Opaque to avoid
transforming back and forth between generic and target nodes infinitely.
Review: Ulrich Weigand
llvm-svn: 327765
AnyReg is just for the assembler and it is better to have it as not
allocatable in order to simplify (make more intuitive) the RegPressureSets.
Review: Ulrich Weigand
llvm-svn: 327715
r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"
Reverting patch as NodeId invariant change is causing pathological
increases in compile time on PPC
llvm-svn: 327197
Instruction Selection makes use of the topological ordering of nodes
by node id (a node's operands have smaller node id than it) when doing
cycle detection. During selection we may violate this property as a
selection of multiple nodes may induce a use dependence (and thus a
node id restriction) between two unrelated nodes. If a selected node
has an unselected successor this may allow us to miss a cycle in
detection an invalid selection.
This patch fixes this by marking all unselected successors of a
selected node have negated node id. We avoid pruning on such negative
ids but still can reconstruct the original id for pruning.
In-tree targets have been updated to replace DAG-level replacements
with ISel-level ones which enforce this property.
This preemptively fixes PR36312 before triggering commit r324359 relands
Reviewers: craig.topper, bogner, jyknight
Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D43198
llvm-svn: 327170
getCurrCycleIdx() returns the decoder cycle index which the next candidate SU
will be placed on.
This patch improves this method by passing the candidate SU to it so that if
SU will begin a new group, the index of that group is returned instead.
Review: Ulrich Weigand
llvm-svn: 326880
Handle the not-taken branch in emitInstruction() where the TakenBranch
argument is available. This is cleaner than relying on EmitInstruction().
Review: Ulrich Weigand
llvm-svn: 326879
The byte-swapping loads and stores do not actually perform multiple
accesses to their memory operand, so they are OK to use with volatile
memory operands as well. Remove overly cautious check.
llvm-svn: 326613
This adds back-end support for the anyregcc calling convention
for use with patchpoints.
Since all registers are considered call-saved with anyregcc
(except for 0 and 1 which may still be clobbered by PLT stubs
and the like), this required adding support for saving and
restoring vector registers in prologue/epilogue code for the
first time. This is not used by any other calling convention.
llvm-svn: 326612
On SystemZ we need to provide a register save area of 160 bytes to
any called function. This size needs to be added when allocating
stack in the function prologue. However, it was not accounted for
as part of MachineFrameInfo::getStackSize(); instead the back-end
used a private routine getAllocatedStackSize().
This is OK for code-gen purposes, but it breaks other users of
the getStackSize() routine, in particular it breaks the recently-
added -stack-size-section feature.
Fix this by updating the main stack size tracked by common code
(in emitPrologue) instead of using the private routine.
No change in code generation intended.
llvm-svn: 326610
This adds support for specifying vector registers for use with inline
asm statements, either via the 'v' constraint or by explicit register
names (v0 ... v31).
llvm-svn: 326609
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.
Differential Revision: https://reviews.llvm.org/D42999
llvm-svn: 326341
Since getNode() might not always return the requsted opcode, for instance if
called with (ISD::AND, -1) arguments, there should be a check so that
SelectCode() is only called when appropriate.
Review: Ulrich Weigand
llvm-svn: 326178
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers. This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.
Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).
Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.
Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.
Clear the IsRenamable bit when changing an operand's register value.
Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.
Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.
Reviewers: qcolombet, MatzeB
Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D43042
llvm-svn: 325931
Since these methods will assert if the integer does not fit into 64 bits,
it is necessary to do this check before calling them in
supportedAddressingMode().
Review: Ulrich Weigand.
llvm-svn: 323866
Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table.
It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search.
There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line.
llvm-svn: 323551
The change in r322988 caused a failure in the bootstrap build bot.
The problem was that directly gluing a BR_CCMASK node to a
compare-and-swap could lead to issues if other nodes were
chained in between. There is then no way to create a topological
sort that respects both the chain sequence and the glue property.
Fixed for now by rejecting the optimization in this case. As a
future enhancement, we may be able to handle additional cases
by swapping chain links around.
llvm-svn: 323129
On current machines we have load-on-condition instructions that can be
used to directly implement the SETCC semantics. If we have those, it is
always preferable to use them instead of generating the IPM sequence.
llvm-svn: 322989
In order to implement a test whether a compare-and-swap succeeded, the
SystemZ back-end currently emits a rather inefficient sequence of first
converting the CC result into an integer, and then testing that integer
against zero. This commit changes the back-end to simply directly test
the CC value set by the compare-and-swap instruction.
llvm-svn: 322988
The SystemZ back-end uses a sequence of IPM followed by arithmetic
operations to implement the SETCC primitive. This is currently done
early during SelectionDAG. This patch moves generating those sequences
to much later in SelectionDAG (during PreprocessISelDAG).
This doesn't change much in generated code by itself, but it allows
further enhancements that will be checked-in as follow-on commits.
llvm-svn: 322987
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.
Review: Ulrich Weigand
llvm-svn: 322688
Since a load and test instruction treat its operands as signed, it can only
replace a logical compare for EQ/NE uses.
Review: Ulrich Weigand
https://bugs.llvm.org/show_bug.cgi?id=35662
llvm-svn: 322488
Since a load and test instruction treat its operands as signed, it can only
replace a logical compare for EQ/NE uses.
Review: Ulrich Weigand
https://bugs.llvm.org/show_bug.cgi?id=35662
llvm-svn: 322161
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend.
D20830 threaded an MCSubtargetInfo reference through
MCAsmBackend::relaxInstruction, but this isn't the only function that would
benefit from access. This patch removes the Triple and CPUString arguments
from createMCAsmBackend and replaces them with MCSubtargetInfo.
This patch just changes the interface without making any intentional
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)
This change initially exposed PR35686, which has since been resolved in r321026.
Differential Revision: https://reviews.llvm.org/D41349
llvm-svn: 321692
Re-land r321234. It had to be reverted because it broke the shared
library build. The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).
Original commit message:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321375
Summary:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321234
Headers/Implementation files should be named after the class they
declare/define.
Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of `class LiveIntarvals;`
llvm-svn: 320546
Tabort (transaction abort) does not load from memory.
mayLoad flag removed from corresponding TABORT machine instruction.
Review: Ulrich Weigand
llvm-svn: 319905
Csmith discovered a program that caused wrong code generation with -O0:
When handling a SIGN_EXTEND in expandRxSBG(), RxSBG.BitSize may be less than
the Input width (if a truncate was previously traversed), so maskMatters()
should be called with a masked based on the width of the sign extend result
instead.
Review: Ulrich Weigand
llvm-svn: 319892
When folding a shift into a test-under-mask comparison, make sure that
there is no loss of precision when creating the shifted comparison
value. This usually never happens, except for certain always-true
comparisons in unoptimized code.
Fixes PR35529.
llvm-svn: 319818
This has proven a healthy exercise, as many cases of incorrect instruction
flags were corrected in the process. As part of this, IntrWriteMem was added
to several SystemZ instrinsics.
Furthermore, a bug was exposed in TwoAddress with this change (as incorrect
hasSideEffects flags were removed and instructions could now be sunk), and
the test case for that bugfix (r319646) is included here as
test/CodeGen/SystemZ/twoaddr-sink.ll.
One temporary test regression (one extra copy) which will hopefully go away
in upcoming patches for similar cases:
test/CodeGen/SystemZ/vec-trunc-to-i1.ll
Review: Ulrich Weigand.
https://reviews.llvm.org/D40437
llvm-svn: 319756
MachineRegisterInfo used to allow just one regalloc hint per virtual
register. This patch extends this to a vector of regalloc hints, which is
filled in by common code with sorted copy hints. Such hints will make for
more ID copies that can be removed.
NB! This improvement is currently (and hopefully temporarily) *disabled* by
default, except for SystemZ. The only reason for this is the big impact this
has on tests, which has unfortunately proven unmanageable. It was a long
while since all the tests were updated and just waiting for review (which
didn't happen), but now targets have to enable this themselves
instead. Several targets could get a head-start by downloading the tests
updates from the Phabricator review. Thanks to those who helped, and sorry
you now have to do this step yourselves.
This should be an improvement generally for any target!
The target may still create its own hint, in which case this has highest
priority and is stored first in the vector. If it has target-type, it will
not be recomputed, as per the previous behaviour.
The temporary hook enableMultipleCopyHints() will be removed as soon as all
targets return true.
Review: Quentin Colombet, Ulrich Weigand.
https://reviews.llvm.org/D38128
llvm-svn: 319754
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
Csmith generated a program where a store after load to the same address did
not get chained after the new load created during DAG legalizing, and so
performed an illegal overwrite of the expected value.
When the new zero-extending load is created, the chain users of the original
load must be updated, which was not done previously.
A similar case was also found and handled in lowerBITCAST.
Review: Ulrich Weigand
https://reviews.llvm.org/D40542
llvm-svn: 319409
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
Summary:
Make it possible to feed runtime information back to tablegen to enable
profile-guided tablegen-eration, detection of untested tablegen definitions, etc.
Being a cross-compiler by nature, LLVM will potentially collect data for multiple
architectures (e.g. when running 'ninja check'). We therefore need a way for
TableGen to figure out what data applies to the backend it is generating at the
time. This patch achieves that by including the name of the 'def X : Target ...'
for the backend in the TargetRegistry.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D39742
llvm-svn: 318352
In rare cases, common code will attempt to select an OR of two
constants. This confuses the logic in splitLargeImmediate,
causing an internal error during isel. Fixed by simply leaving
this case to common code to handle.
This fixes PR34859.
llvm-svn: 318187
Before using the 32-bit RISBMux set of instructions we need to
verify that the input bits are actually within range of the 32-bit
instruction. This fixer PR35289.
llvm-svn: 318177
* The method getRegAllocationHints() is now of bool type instead of void. If
true is returned, regalloc (AllocationOrder) will *only* try to allocate the
hints, as opposed to merely trying them before non-hinted registers.
* TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with
an increase in number of LOCRs.
In this case, it is desired to force the hints even though there is a slight
increase in spilling, because if a non-hinted register would be allocated,
the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR
(Load On Condition) SystemZ instruction must have both operands in either the
low or high part of the 64 bit register.
Reviewers: Quentin Colombet and Ulrich Weigand
https://reviews.llvm.org/D36795
llvm-svn: 317879
We don't really need any special handling of "offsettable"
memory addresses, but since some existing code uses inline
asm statements with the "o" constraint, add support for this
constraint for compatibility purposes.
llvm-svn: 317807
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
SystemZ can do division and remainder in a single instruction for scalar
integer types, which are now reflected by returning true in this hook for
those cases.
Review: Ulrich Weigand
llvm-svn: 317477
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.
This reverts commit r315633.
llvm-svn: 315637
Merge LLVMTargetMachine into TargetMachine.
- There is no in-tree target anymore that just implements TargetMachine
but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
interface.
Differential Revision: https://reviews.llvm.org/D38489
llvm-svn: 315633
This adds debug tracing to the table-generated assembly instruction matcher,
enabled by the -debug-only=asm-matcher option.
The changes in the target AsmParsers are to add an MCInstrInfo reference under
a consistent name, so that we can use it from table-generated code. This was
already being used this way for targets that use deprecation warnings, but 5
targets did not have it, and Hexagon had it under a different name to the other
backends.
llvm-svn: 315445
functions.
This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.
llvm-svn: 315327
ELFObjectWriter's constructor.
Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.
llvm-svn: 315254
The machine scheduler (before register allocation) is enabled by default for
SystemZ.
The SelectionDAG scheduling preference now becomes source order scheduling
(was regpressure).
Review: Ulrich Weigand
https://reviews.llvm.org/D37977
llvm-svn: 315063
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.
If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).
This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899https://bugs.llvm.org/show_bug.cgi?id=34610
llvm-svn: 314516
The expensive-checks build bot found a problem with the r314428 commit:
if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be
marked as live-in to the block after the loop the pseudo gets expanded
to. This actually fixes a code-gen bug as well, since if the CC isn't
live, the CR and JLH are merged to a CRJLH which doesn't actually set
the condition code any more.
llvm-svn: 314465
The SystemZ compare-and-swap instructions already provide the "success"
indication via a condition-code value, so the default expansion of those
operations generates an unnecessary extra comparsion.
llvm-svn: 314428
More conversions to load-and-test can be made with this patch by adding a
forward search in optimizeCompareZero().
Review: Ulrich Weigand
https://reviews.llvm.org/D38076
llvm-svn: 313877
SystemZTargetLowering::combineSTORE contains code to transform a
combination of STORE + BSWAP into a STRV type instruction.
This transformation is correct for regular stores, but not for
truncating stores. The routine neglected to check for that case.
Fixes a miscompilation of llvm-objcopy with clang, which caused
test suite failures in the SystemZ multistage build bot.
llvm-svn: 313669
This bit is needed in order for the CalleeSavedRegs list to automatically
include the super registers if all of their subregs are present.
Thanks to Wei Mi for initially indicating this deficiency in the SystemZ
backend.
Review: Ulrich Weigand.
https://bugs.llvm.org/show_bug.cgi?id=34550
llvm-svn: 313023
The idea of this patch is to continue the scheduler state over an MBB boundary
in the case where the successor block has only one predecessor. This means
that the scheduler will continue in the successor block (after emitting any
branch instructions) with e.g. maintained processor resource counters.
Benchmarks have been confirmed to benefit from this.
The algorithm in MachineScheduler.cpp that extracts scheduling regions of an
MBB has been extended so that the strategy may optionally reverse the order
of processing the regions themselves. This is controlled by a new method
doMBBSchedRegionsTopDown(), which defaults to false.
Handling the top-most region of an MBB first also means that a top-down
scheduler can continue the scheduler state across any scheduling boundary
between to regions inside MBB.
Review: Ulrich Weigand, Matthias Braun, Andy Trick.
https://reviews.llvm.org/D35053
llvm-svn: 311072
The liveness-tracking code assumes that the registers that were saved
in the function's prolog are live outside of the function. Specifically,
that registers that were saved are also live-on-exit from the function.
This isn't always the case as illustrated by the LR register on ARM.
Differential Revision: https://reviews.llvm.org/D36160
llvm-svn: 310619
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
This adds support for the main 128-bit atomic operations,
using the SystemZ instructions LPQ, STPQ, and CDSG.
Generating these instructions is a bit more complex than usual
since the i128 type is not legal for the back-end. Therefore,
we have to hook the LowerOperationWrapper and ReplaceNodeResults
TargetLowering callbacks.
llvm-svn: 310094
We currently emit a serialization operation (bcr 14, 0) before every
atomic load and after every atomic store. This is overly conservative.
The SystemZ architecture actually does not require any serialization
for atomic loads, and a serialization after an atomic store only if
we need to enforce sequential consistency. This is what other compilers
for the platform implement as well.
llvm-svn: 310093
IMHO it is an antipattern to have a enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
llvm-svn: 309911
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
This adds support for the new 128-bit vector float instructions of z14.
Note that these instructions actually only operate on the f128 type,
since only each 128-bit vector register can hold only one 128-bit
float value. However, this is still preferable to the legacy 128-bit
float instructions, since those operate on pairs of floating-point
registers (so we can hold at most 8 values in registers), while the
new instructions use single vector registers (so we hold up to 32
value in registers).
Adding support includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions. This includes allocating the f128
type now to the VR128BitRegClass instead of FP128BitRegClass.
- Scheduler description support for the instructions.
Note that for a small number of operations, we have no new vector
instructions (like integer <-> 128-bit float conversions), and so
we use the legacy instruction and then reformat the operand
(i.e. copy between a pair of floating-point registers and a
vector register).
llvm-svn: 308196
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.
In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.
llvm-svn: 308195
This patch series adds support for the IBM z14 processor. This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.
Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.
llvm-svn: 308194
The issue is not if the value is pcrel. It is whether we have a
relocation or not.
If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.
llvm-svn: 307730
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
Several integer multiply/divide instructions require use of a
register pair as input and output. This patch moves setting
up the input register pair from C++ code to TableGen, simplifying
the whole process and making it more easily extensible.
No functional change.
llvm-svn: 307155
Fixes a couple of whitespace errors, re-sorts the vector floating-point
instructions to make them more easily extensible, and adds a missing
pseudo instruction.
No functional change.
llvm-svn: 307154
This adds all remaining instructions that were still missing, mostly
privileged and semi-privileged system-level instructions. These are
provided for use with the assembler and disassembler only.
This brings the LLVM assembler / disassembler to parity with the
GNU binutils tools.
llvm-svn: 306876
There are a few instructions provided by the high-word facility (z196)
that we cannot easily exploit for code generation. This patch at least
adds those missing instructions for the assembler and disassembler.
This means that now all nonprivileged instructions up to z13 are
supported by the LLVM assembler / disassembler.
llvm-svn: 306821
We sometimes need emergency spill slots for the register scavenger.
This may be the case when code needs to access a stack slot that
has an offset of 4096 or more relative to the stack pointer.
To make that determination, processFunctionBeforeFrameFinalized
currently simply checks the total stack frame size of the current
function. But this is not enough, since code may need to access
stack slots in the caller's stack frame as well, in particular
incoming arguments stored on the stack.
This commit fixes the problem by taking argument slots into account.
llvm-svn: 306305
Csmith discovered that this function can be called with a zero argument,
in which case an assert for this triggered.
This patch also adds a guard before the other call to this function since
it was missing, although the test only covers the case where it was
discovered.
Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll.
Review: Ulrich Weigand
llvm-svn: 306287
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
llvm-svn: 306177
This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad
introduced by r196905. Nothing in the semantics of the "volatile"
keyword or the definition of the z/Architecture actually requires
that volatile loads are preceded by a serialization operation, and
no other compiler on the platform actually implements this.
Since we've now seen a use case where this additional serialization
causes noticable performance degradation, this patch removes it.
The patch still leaves in the serialization before atomic loads,
which is now implemented directly in lowerATOMIC_LOAD. (This also
seems overkill, but that can be addressed separately.)
llvm-svn: 306117
The isBarrier/isTerminator flags have been removed from the SystemZ trap
instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just
an issue at -O0 and did not affect code output on benchmarks.
(Like Eli pointed out: "targets are split over whether they consider their
"trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and
SystemZ do. We should probably try to be consistent here.". This is still the
case, although SystemZ has switched sides).
SystemZ now returns true in isMachineVerifierClean() :-)
These Generic tests have been modified so that they can be run with or without
EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and
CodeGen/Generic/print-machineinstrs.ll
Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman
https://bugs.llvm.org/show_bug.cgi?id=33047https://reviews.llvm.org/D34143
llvm-svn: 306106
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.
Reviewers: chandlerc, rnk, reames
Reviewed By: reames
Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D33903
llvm-svn: 305189
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.
Differential Revision: https://reviews.llvm.org/D33843
llvm-svn: 304864
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
This adds a callback to the LLVMTargetMachine that lets target indicate
that they do not pass the machine verifier checks in all cases yet.
This is intended to be a temporary measure while the targets are fixed
allowing us to enable the machine verifier by default with
EXPENSIVE_CHECKS enabled!
Differential Revision: https://reviews.llvm.org/D33696
llvm-svn: 304320
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.
While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.
llvm-svn: 304247
This adds assembler / disassembler support for the decimal
floating-point instructions. Since LLVM does not yet have
support for decimal float types, these cannot be used for
codegen at this point.
llvm-svn: 304203
This adds assembler / disassembler support for the hexadecimal
floating-point instructions. Since the Linux ABI does not use
any hex float data types, these are not useful for codegen.
llvm-svn: 304202