Commit 9965b12fd1 was supposed to change the offset constant when
lowering load/stores, but only introduced this change for loads. This
patch adds the same fix for stores.
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.
I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.
Vector support not yet implemented.
Differential Revision: https://reviews.llvm.org/D73659
Summary:
This is mostly NFC. computeForAddSub may give more precise results in
some cases, but that doesn't seem to affect any existing GlobalISel
tests.
Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73431
G_CTPOP is generated from llvm.ctpop.<type> intrinsics, clang generates
these intrinsics from __builtin_popcount and __builtin_popcountll.
Add lower and narrow scalar for G_CTPOP.
Lower G_CTPOP for MIPS32.
Differential Revision: https://reviews.llvm.org/D73216
llvm.cttz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
G_CTTZ is generated from llvm.cttz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_ctz and
__builtin_ctzll.
G_CTTZ_ZERO_UNDEF comes from llvm.cttz.<type> (<type> <src>, i1 true).
Clang generates such intrinsics as parts of expansion of builtin_ffs
and builtin_ffsll. It is also traditionally part of and many
algorithms that are now predicated on avoiding zero-value inputs.
Add narrow scalar (algorithm uses G_CTTZ_ZERO_UNDEF) for G_CTTZ.
Lower G_CTTZ and G_CTTZ_ZERO_UNDEF for MIPS32.
Differential Revision: https://reviews.llvm.org/D73215
llvm.ctlz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
MIPS clz instruction returns 32 for zero input.
G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_clz and
__builtin_clzll.
G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as
second argument. It is also traditionally part of and many algorithms
that are now predicated on avoiding zero-value inputs.
Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF).
Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32.
Differential Revision: https://reviews.llvm.org/D73214
Teach the GISelKnowBits analysis how to deal with PHI operations.
PHIs are essentially COPYs happening on edges, so we can just reuse
the code for COPY.
This is NFC COPY-wise has we leave Depth untouched when calling
computeKnownBitsImpl for COPYs, like it was before this patch.
Increasing Depth is however required for PHIs as they may loop back to
themselves and we would end up in an infinite loop if we were not
increasing Depth.
Differential Revision: https://reviews.llvm.org/D73317
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
This was assuming the narrow target was the source type. Respect the
requested type when these don't match by using intermediate
merges. This avoids producing very wide, illegal shift expansions.
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
This now develops the same problem G_ZEXT/G_ANYEXT have where the
requested type is assumed to be the source type. This will be fixed
separately by creating intermediate merges.
Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.
We're planning to remove the shufflemask operand from ShuffleVectorInst
(D72467); fix GlobalISel so it doesn't depend on that Constant.
The change to prelegalizercombiner-shuffle-vector.mir happens because
the input contains a literal "-1" in the mask (so the parser/verifier
weren't really handling it properly). We now treat it as equivalent to
"undef" in all contexts.
Differential Revision: https://reviews.llvm.org/D72663
As an intermediate step, some TLI functions can be converted to using
LLT instead of MVT. Move this somewhere out of GlobalISel so DAG
functions can use these.
AMDGPU can't unambiguously go back from the selected instruction
register class to the register bank without knowing if this was used
in a boolean context.
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Recommit notes:
Introduce temporary variables in order to make sure
instructions get inserted into MachineFunction in same order
regardless of compiler used to build llvm.
Differential Revision: https://reviews.llvm.org/D71363
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Differential Revision: https://reviews.llvm.org/D71363
G_BSWAP is generated from llvm.bswap.<type> intrinsics, clang genrates
these intrinsics from __builtin_bswap32 and __builtin_bswap64.
Add lower and narrowscalar for G_BSWAP.
Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later.
Differential Revision: https://reviews.llvm.org/D71362
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).
Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.
Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.
This matches what was done for X86 in 6bf108d77a.
Differential Revision: https://reviews.llvm.org/D71721
Legalization algorithm is complicated by two facts:
1) While regular instructions should be possible to legalize in
an isolated, per-instruction, context-free manner, legalization
artifacts can only be eliminated in pairs, which could be deeply, and
ultimately arbitrary nested: { [ () ] }, where which paranthesis kind
depicts an artifact kind, like extend, unmerge, etc. Such structure
can only be fully eliminated by simple local combines if they are
attempted in a particular order (inside out), or alternatively by
repeated scans each eliminating only one innermost pair, resulting in
O(n^2) complexity.
2) Some artifacts might in fact be regular instructions that could (and
sometimes should) be legalized by the target-specific rules. Which
means failure to eliminate all artifacts on the first iteration is
not a failure, they need to be tried as instructions, which may
produce more artifacts, including the ones that are in fact regular
instructions, resulting in a non-constant number of iterations
required to finish the process.
I trust the recently introduced termination condition (no new artifacts
were created during as-a-regular-instruction-retrial of artifacts not
eliminated on the previous iteration) to be efficient in providing
termination, but only performing the legalization in full if and only if
at each step such chains of artifacts are successfully eliminated in
full as well.
Which is currently not guaranteed, as the artifact combines are applied
only once and in an arbitrary order that has to do with the order of
creation or insertion of artifacts into their worklist, which is a no
particular order.
In this patch I make a small change to the artifact combiner, making it
to re-insert into the worklist immediate (modulo a look-through copies)
artifact users of each vreg that changes its definition due to an
artifact combine.
Here the first scan through the artifacts worklist, while not
being done in any guaranteed order, only needs to find the innermost
pair(s) of artifacts that could be immediately combined out. After that
the process follows def-use chains, making them shorter at each step, thus
combining everything that can be combined in O(n) time.
Reviewers: volkan, aditya_nandakumar, qcolombet, paquette, aemerson, dsanders
Reviewed By: aditya_nandakumar, paquette
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71448
and introducing new unittests/CodeGen/GlobalISel/LegalizerTest.cpp
relying on it to unit test the entire legalizer algorithm (including the
top-level main loop).
See also https://reviews.llvm.org/D71448
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.
Reviewers: sdesmalen, efriedma, fhahn, aemerson
Reviewed By: efriedma
Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70496
Summary:
Previously, it was not possible to skip running the localizer pass
conditionally. This patch adds an input function to the pass which
decides if the pass should run on the given MachineFunction or not.
No test case as there is no upstream target needs this functionality.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: rovka, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71038
https://reviews.llvm.org/D70922
This adds a hook to allow targets to define exactly what extension
operation should be performed for widening constants. This handles cases
like widening i1 true which would end up becoming -1 which affects code
quality during combines.
Additionally, in order to stay consistent with how DAG is promoting
constants, we now signextend for byte sized types and zero extend
otherwise (by default). Targets can of course override this if
necessary.
Summary:
When combining COPY instructions, we were replacing the destination registers
with the source register without checking register constraints. This patch adds
a simple logic to check if the constraints match before replacing registers.
Reviewers: qcolombet, aditya_nandakumar, aemerson, paquette, dsanders, Petar.Avramovic
Reviewed By: aditya_nandakumar
Subscribers: rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70616
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
LLVM IR of 1-element vectors get lower into scalar in GISel. As a
result, shuffle vector may also produce a scalar.
This patch teaches the shuffle combiner how to deal with scalars when
they are in the destination type of a shuffle vector.
For now, we just support the easy case where this can be lowered to
a plain copy. For other cases, we leave the shuffle vector as is.
This type of IR are seen in O0 pipelines. E.g., as produced with
SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c.
rdar://problem/57198904
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD
Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69734
Teach the combiner helper how to replace shuffle_vector of scalars
into build_vector.
I am not particularly happy about having to add this combine, but we
currently get those from <1 x iN> from the IR.
Bonus: This fixes an assert in the shuffle_vector combines since before
this patch, we were expecting vector types.
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
Teach the CombinerHelper how to turn shuffle_vectors, that
concatenate vectors, into concat_vectors and add this combine
to the AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69149
llvm-svn: 375452
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.
Noticed with -ftime-trace.
llvm-svn: 375311
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.
The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.
Depends on D68426
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68438
llvm-svn: 375067
Teach the combiner helper how to flatten concat_vectors of build_vectors
into a build_vector.
Add this combine as part of AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69071
llvm-svn: 375066
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
llvm-svn: 375057
This patch kills off a significant user of the "IsIndirect" field of
DBG_VALUE machine insts. Brought up in in PR41675, IsIndirect is
techncally redundant as it can be expressed by the DIExpression of a
DBG_VALUE inst, and it isn't helpful to have two ways of expressing
things.
Rather than setting IsIndirect, have DBG_VALUE creators add an extra deref
to the insts DIExpression. There should now be no appearences of
IsIndirect=True from isel down to LiveDebugVariables / VirtRegRewriter,
which is ensured by an assertion in LDVImpl::handleDebugValue. This means
we also get to delete the IsIndirect handling in LiveDebugVariables. Tests
can be upgraded by for example swapping the following IsIndirect=True
DBG_VALUE:
DBG_VALUE $somereg, 0, !123, !DIExpression(DW_OP_foo)
With one where the indirection is in the DIExpression, by _appending_
a deref:
DBG_VALUE $somereg, $noreg, !123, !DIExpression(DW_OP_foo, DW_OP_deref)
Which both mean the same thing.
Most of the test changes in this patch are updates of that form; also some
changes in how the textual assembly printer handles these insts.
Differential Revision: https://reviews.llvm.org/D68945
llvm-svn: 374877
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
The CmpInst::getType() calls can be replaced by just using User::getType() that it was dyn_cast from, and we then need to assert that any default predicate cases came from the CmpInst.
llvm-svn: 374716
The exciting code is actually already enough to handle the splitting
of vector arguments but we were lacking a test case.
This commit adds a test case for vector argument lowering involving
splitting and enable the related support in call lowering.
llvm-svn: 374589
Teach buildMerge how to deal with scalar to vector kind of requests.
Prior to this patch, buildMerge would issue either a G_MERGE_VALUES
when all the vregs are scalars or a G_CONCAT_VECTORS when the destination
vreg is a vector.
G_CONCAT_VECTORS was actually not the proper instruction when the source
vregs were scalars and the compiler would assert that the sources must
be vectors. Instead we want is to issue a G_BUILD_VECTOR when we are
in this situation.
This patch fixes that.
llvm-svn: 374588
In GISel we have both G_CONSTANT and G_FCONSTANT, but because
in GISel we don't really have a concept of Float vs Int value
the only difference between the two is where the data originates
from.
What both G_CONSTANT and G_FCONSTANT return is just a bag of bits
with the constant representation in it.
By making getConstantVRegVal() return G_FCONSTANTs bit representation
as well we allow ConstantFold and other things to operate with
G_FCONSTANT.
Adding tests that show ConstantFolding to work on mixed G_CONSTANT
and G_FCONSTANT sources.
Differential Revision: https://reviews.llvm.org/D68739
llvm-svn: 374458
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.
https://reviews.llvm.org/D68439
llvm-svn: 373935
SelectionDAG has a bunch of machinery to defer this to selection time
for some reason. Just directly emit a copy during IRTranslator. The
x86 usage does somewhat questionably check hasFP, which could depend
on the whole function being at minimum translated.
This does lose the convergent bit if the callsite had it, which may be
a problem. We also lose that in general for intrinsics, which may also
be a problem.
llvm-svn: 373294
Summary:
It seems we missed that the target hook can't query the known-bits for the
inputs to a target instruction. Fix that oversight
Reviewers: aditya_nandakumar
Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67380
llvm-svn: 373264
This adds support for lowering variadic musttail calls. To do this, we have
to...
- Detect a musttail call in a variadic function before attempting to lower the
call's formal arguments. This is done in the IRTranslator.
- Compute forwarded registers in `lowerFormalArguments`, and add copies for
those registers.
- Restore the forwarded registers in `lowerTailCall`.
Because there doesn't seem to be any nice way to wrap these up into the outgoing
argument handler, the restore code in `lowerTailCall` is done separately.
Also, irritatingly, you have to make sure that the registers don't overlap with
any passed parameters. Otherwise, the scheduler doesn't know what to do with the
extra copies and asserts.
Add call-translator-variadic-musttail.ll to test this. This is pretty much the
same as the X86 musttail-varargs.ll test. We didn't have as nice of a test to
base this off of, but the idea is the same.
Differential Revision: https://reviews.llvm.org/D68043
llvm-svn: 373226
We need to propagate this information from the IR in order to be able to safely
do tail call optimizations on the intrinsics during legalization. Assuming
it's safe to do tail call opt without checking for the marker isn't safe because
the mem libcall may use allocas from the caller.
This adds an extra immediate operand to the end of the intrinsics and fixes the
legalizer to handle it.
Differential Revision: https://reviews.llvm.org/D68151
llvm-svn: 373140
When checking for tail call eligibility, we should use the correct CCAssignFn
for each argument, rather than just checking if the caller/callee is varargs or
not.
This is important for tail call lowering with varargs. If we don't check it,
then basically any varargs callee with parameters cannot be tail called on
Darwin, for one thing. If the parameters are all guaranteed to be in registers,
this should be entirely safe.
On top of that, not checking for this could potentially make it so that we have
the wrong stack offsets when checking for tail call eligibility.
Also refactor some of the stuff for CCAssignFnForCall and pull it out into a
helper function.
Update call-translator-tail-call.ll to show that we can now correctly tail call
on Darwin. Also add two extra tail call checks. The first verifies that we still
respect the caller's stack size, and the second verifies that we still don't
tail call when a varargs function has a memory argument.
Differential Revision: https://reviews.llvm.org/D67939
llvm-svn: 372897
We were miscompiling switch value comparisons with the wrong signedness, which
shows up when we have things like switch case values with i1 types, which end up
being legalized incorrectly.
Fixes PR43383
llvm-svn: 372675
We currently always set the HasCalls on MFI during translation and legalization if
we're handling a call or legalizing to a libcall. However, if that call is later
optimized to a tail call then we don't need the flag. The flag being set to true
causes frame lowering to always save and restore FP/LR, which adds unnecessary code.
This change does the same thing as SelectionDAG and ports over some code that scans
instructions after selection, using TargetInstrInfo to determine if target opcodes
are known calls.
Code size geomean improvements on CTMark:
-O0 : 0.1%
-Os : 0.3%
Differential Revision: https://reviews.llvm.org/D67868
llvm-svn: 372443
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
r371901 was overeager and widenScalarDst() and the like in the legalizer
attempt to increment the insert point given in order to add new instructions
after the currently legalizing inst. In cases where the insertion point is not
exactly the current instruction, then callers need to de-compensate for the
behaviour by decrementing the insertion iterator before calling them. It's not
a nice state of affairs, for now just undo the problematic parts of the change.
llvm-svn: 372050
For some reason we sometimes insert new instructions one instruction before
the first non-PHI when legalizing. This can result in having non-PHI
instructions before PHIs, which mean that PHI elimination doesn't catch them.
Differential Revision: https://reviews.llvm.org/D67570
llvm-svn: 371901
Because memory intrinsics are handled differently than other calls, we need to
check them for tail call eligiblity in the legalizer. This allows us to still
inline them when it's beneficial to do so, but also tail call when possible.
This adds simple tail calling support for when the intrinsic is followed by a
return.
It ports the attribute checks from `TargetLowering::isInTailCallPosition` into
a similarly-named function in LegalizerHelper.cpp. The target-specific
`isUsedByReturnOnly` hook is not ported here.
Update tailcall-mem-intrinsics.ll to show that GlobalISel can now tail call
memory intrinsics.
Update legalize-memcpy-et-al.mir to have a case where we don't tail call.
Differential Revision: https://reviews.llvm.org/D67566
llvm-svn: 371893
Unlike SelectionDAG, treat this as a normally legalizable operation.
In SelectionDAG this is supposed to only ever formed if it's legal,
but I've found that to be restricting. For AMDGPU this is contextually
legal depending on whether denormal flushing is allowed in the use
function.
Technically we currently treat the denormal mode as a subtarget
feature, so custom lowering could be avoided. However I consider this
to be a defect, and this should be contextually dependent on the
controllable rounding mode of the parent function.
llvm-svn: 371800
This adds support for lowering sibling calls with outgoing arguments.
e.g
```
define void @foo(i32 %a)
```
Support is ported from AArch64ISelLowering's `isEligibleForTailCallOptimization`.
The only thing that is missing is a full port of
`TargetLowering::parametersInCSRMatch`. So, if we're using swiftself,
we'll never tail call.
- Rename `analyzeCallResult` to `analyzeArgInfo`, since the function is now used
for both outgoing and incoming arguments
- Teach `OutgoingArgHandler` about tail calls. Tail calls use frame indices for
stack arguments.
- Teach `lowerFormalArguments` to set the bytes in the caller's stack argument
area. This is used later to check if the tail call's parameters will fit on
the caller's stack.
- Add `areCalleeOutgoingArgsTailCallable` to perform the eligibility check on
the callee's outgoing arguments.
For testing:
- Update call-translator-tail-call to verify that we can now tail call with
outgoing arguments, use G_FRAME_INDEX for stack arguments, and respect the
size of the caller's stack
- Remove GISel-specific check lines from speculation-hardening.ll, since GISel
now tail calls like the other selectors
- Add a GISel test line to tailcall-string-rvo.ll since we can tail call in that
test now
- Add a GISel test line to tailcall_misched_graph.ll since we tail call there
now. Add specific check lines for GISel, since the debug output from the
machine-scheduler differs with GlobalISel. The dependency still holds, but
the output comes out in a different order.
Differential Revision: https://reviews.llvm.org/D67471
llvm-svn: 371780
First we were asserting that the ValNo of a VA was the wrong value. It doesn't actually
make a difference for us in CallLowering but fix that anyway to silence the assert.
The bigger issue was that after fixing the assert we were generating invalid MIR
because the merging/unmerging of values split across multiple registers wasn't
also implemented for memory locs. This happens when we run out of registers and
have to pass the split types like i128 -> i64 x 2 on the stack. This is do-able, but
for now just fall back.
llvm-svn: 371693
This fixes a crash in tail call translation caused by assume and lifetime_end
intrinsics.
It's possible to have instructions other than a return after a tail call which
will still have `Analysis::isInTailCallPosition` return true. (Namely,
lifetime_end and assume intrinsics.)
If we emit a tail call, we should stop translating instructions in the block.
Otherwise, we can end up emitting an extra return, or dead instructions in
general. This makes the verifier unhappy, and is generally unfortunate for
codegen.
This also removes the code from AArch64CallLowering that checks if we have a
tail call when lowering a return. This is covered by the new code now.
Also update call-translator-tail-call.ll to show that we now properly tail call
in the presence of lifetime_end and assume.
Differential Revision: https://reviews.llvm.org/D67415
llvm-svn: 371572
Add support for sibcalling calls whose calling convention differs from the
caller's.
- Port over `CCState::resultsCombatible` from CallingConvLower.cpp into
CallLowering. This is used to verify that the way the caller and callee CC
handle incoming arguments matches up.
- Add `CallLowering::analyzeCallResult`. This is basically a port of
`CCState::AnalyzeCallResult`, but using `ArgInfo` rather than `ISD::InputArg`.
- Add `AArch64CallLowering::doCallerAndCalleePassArgsTheSameWay`. This checks
that the calling conventions are compatible, and that the caller and callee
preserve the same registers.
For testing:
- Update call-translator-tail-call.ll to show that we can now handle this.
- Add a GISel line to tailcall-ccmismatch.ll to show that we will not tail call
when the regmasks don't line up.
Differential Revision: https://reviews.llvm.org/D67361
llvm-svn: 371570
Loosely based on DAGCombiner version, but this part is slightly simpler in
GlobalIsel because all address calculation is performed by G_GEP. That makes
the inc/dec distinction moot so there's just pre/post to think about.
No targets can handle it yet so testing is via a special flag that overrides
target hooks.
llvm-svn: 371384
Now that we look through copies, it's possible to visit registers that
have a register class constraint but not a type constraint. Avoid looking
through copies when this occurs as the SrcReg won't be able to determine
it's bit width or any known bits.
Along the same lines, if the initial query is on a register that doesn't
have a type constraint then the result is a default-constructed KnownBits,
that is, a 1-bit fully-unknown value.
llvm-svn: 371116
Recommit basic sibling call lowering (https://reviews.llvm.org/D67189)
The issue was that if you have a return type other than void, call lowering
will emit COPYs to get the return value after the call.
Disallow sibling calls other than ones that return void for now. Also
proactively disable swifterror tail calls for now, since there's a similar issue
with COPYs there.
Update call-translator-tail-call.ll to include test cases for each of these
things.
llvm-svn: 371114
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)
At this point, it is very restricted. It does not handle
- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.
This patch adds
- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.
Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.
For testing, this patch
- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
as expected.
Differential Revision: https://reviews.llvm.org/D67189
........
This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll
llvm-svn: 371051
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)
At this point, it is very restricted. It does not handle
- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.
This patch adds
- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.
Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.
For testing, this patch
- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
as expected.
Differential Revision: https://reviews.llvm.org/D67189
llvm-svn: 370996
Similar to the issue with G_ZEXT that was fixed earlier, this is a quick
to fall back if the source type is not exactly half of the dest type.
Fixes the clang-cmake-aarch64-lld bot build.
llvm-svn: 370847
Now that we have the infrastructure to support s128 types as parameters
we can expand these to libcalls.
Differential Revision: https://reviews.llvm.org/D66185
llvm-svn: 370823
On AArch64, s128 types have to be split into s64 GPRs when passed as arguments.
This change adds the generic support in call lowering for dealing with multiple
registers, for incoming and outgoing args.
Support for splitting for return types not yet implemented.
Differential Revision: https://reviews.llvm.org/D66180
llvm-svn: 370822
Add lower for G_FPTOUI. Algorithm is similar to the SDAG version
in TargetLowering::expandFP_TO_UINT.
Lower G_FPTOUI for MIPS32.
Differential Revision: https://reviews.llvm.org/D66929
llvm-svn: 370431
AMDGPU uses this for some addressing mode selection patterns. The
analysis run itself doesn't do anything so it seems easier to just
always require this than adding a way to opt in.
llvm-svn: 370388
These are currently translated as normal functions calls in AArch64.
Until we have proper tail call lowering, we shouldn't translate these.
Differential Revision: https://reviews.llvm.org/D66842
llvm-svn: 370225
This change moves the actual stack pointer manipulation into the legalizer,
available to targets via lower(). The codegen is slightly different because
we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than
adding a negative amount via G_GEP.
Differential Revision: https://reviews.llvm.org/D66678
llvm-svn: 370104
Main difference is in the way Hi for Long shift (HiL) is made.
G_LSHR fills HiL with zeros, while G_ASHR fills HiL with sign bit value.
Differential Revision: https://reviews.llvm.org/D66589
llvm-svn: 370064
Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS
to match names of surrounding variables.
Differential Revision: https://reviews.llvm.org/D66587
llvm-svn: 370062
Summary:
Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s
possible to combine them after processing the rest of the instruction because
the legalization is likely to generate more artifacts that allow ArtifactCombiner
to combine away them.
Instead, move illegal artifacts to another list called RetryList and wait until all of the
instruction in InstList are legalized. After that, check if there is any new artifacts and
try to combine them again if that’s the case. If not, abort. The idea is similar to D59339,
but the approach is a bit different.
This patch fixes the issue described above, but the legalizer still may be unable to handle
some cases depending on when to legalize artifacts. So, in the long run, we probably need
a different legalization strategy that handles this dependency in a better way.
Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette
Reviewed By: dsanders
Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65894
llvm-svn: 369805
The x86 tests are now broken (in paticular add-scalar.ll now hits the
DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a
custom lowering for this, so that will need to be implemented.
llvm-svn: 369673
This is necessary for handling <3 x s16> on AMDGPU, assuming this
should be handled as 2 separate legalization actions. The alternative
would be for fewerElementsVector to handle 3->2.
llvm-svn: 369547
Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source.
NarrowScalar G_TRUNC to s32 for MIPS32.
Differential Revision: https://reviews.llvm.org/D66202
llvm-svn: 369509
https://reviews.llvm.org/D66077
The value passed into dbg.value may relate to multiple registers,
each of which need a DBG_VALUE.
This fix calls MIRBuilder.buildDirectDbgValue for each register.
Without this, IR passed in from flang-compiler/flang may fail an
assertion in getOrCreateVReg.
Patch by : peterwaller-arm.
llvm-svn: 369403
Again, it's weird that these are allowed. Since lowering support was added in
r368709 we started crashing on compiling the neon intrinsics test in the test
suite. This fixes the lowering to fold the 1 elt src/mask case into copies.
llvm-svn: 369135
Summary:
This patch adds G_GEP to `shouldCSEOpc` so that it can be CSEd. It also refactors
`translateGetElementPtr` by replacing `createGenericVirtualRegister` calls with types.
Reviewers: aditya_nandakumar, arsenm, dsanders, paquette, aemerson
Reviewed By: aditya_nandakumar
Subscribers: wdng, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66316
llvm-svn: 369070
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Currently shufflemasks get emitted as any other constant, and you end
up with a bunch of virtual registers of G_CONSTANT with a
G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
doesn't fit this pattern. This isn't an ideal representation, and
should avoid legalization and have fewer opportunities for a
representational error.
Rather than invent a new shuffle mask operand type, similar to what
ShuffleVectorSDNode does, just track the original IR Constant mask
operand. I don't completely like the idea of adding another link to
the IR, but MIR is already quite dependent on IR constants already,
and this will allow sharing the shuffle mask utility functions with
the IR.
llvm-svn: 368704
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.
AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.
Differential Revision: https://reviews.llvm.org/D65984
llvm-svn: 368652
https://reviews.llvm.org/D66039
We were using getIndexSize instead of getIndexSizeInBits().
Added test case for G_PTRTOINT and G_INTTOPTR.
llvm-svn: 368618
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
%1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
%2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
%3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 23
All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.
To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
%1 = G_CONSTANT 16
%2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
%2 = G_SEXT_INREG %0, 16
or:
%1 = G_CONSTANT 16
%2:_(s32) = G_SHL %0, %1
%3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.
Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61289
llvm-svn: 368487
I've now needed to add an extra parameter to this call twice recently. Not only
is the signature getting extremely unwieldy, but just updating all of the
callsites and implementations is a pain. Putting the parameters in a struct
sidesteps both issues.
llvm-svn: 368408
https://reviews.llvm.org/D65698
This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries(within a pass and across passes) in the future.
This patch only adds the basic pass boiler plate, and implements a lazy
non caching knownbits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.
llvm-svn: 368065
FastISel already does this since the initial arm64 port was upstreamed, so
it seems there are no issues with doing this at -O0 for very small memcpys.
Gives a 0.2% geomean code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D65758
llvm-svn: 367919
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jfb, jakehehrlich
Reviewed By: jfb
Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65514
llvm-svn: 367828
This is an old commit that exposed a bug in the GISel importer, which caused
non-truncating stores to be selected for truncating store patterns. Now that's
been fixed in r367737 this can go back in.
llvm-svn: 367739
AMDGPU sometimes has legal s16 and <2 x s16> operations, but all
registers are really 32-bit. An unmerge destination really should ben
widened to a 32-bit register. If widening a scalarizing vector with a
target size that matches the vector size, bitcast to integer and
extract the relevant bits with shifts.
I'm not sure if this is the right place for this. This could arguably
be part of widenScalar for the result. I also have a growing feeling
that we're missing a bitcast legalize action.
llvm-svn: 367604
Summary:
This will make it possible to improve IPRA by taking into account
register usage in indirect calls.
NFC yet; this is just laying the groundwork to start building
up patches to take advantage of the information for improved register
allocation.
Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette
Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65488
llvm-svn: 367476
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.
The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.
Differential Revision: https://reviews.llvm.org/D65167
llvm-svn: 366951
r366317 added a legalization for s128 G_ICMP narrow scalar which tried to hard
code the result type of the new legalized G_SELECT. Change this to instead use
type of the original G_ICMP result and allow the target to legalize it if necessary
later.
llvm-svn: 366943
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then have each target specify that using the new custom legalizer for intrinsics
hook that they want it expanded it a libcall.
Differential Revision: https://reviews.llvm.org/D64895
llvm-svn: 366516
Extract the sources to the GCD of the original size and target size,
padding with implicit_def as necessary.
Also fix the case where the requested source type is wider than the
original result type. This was ignoring the type, and just using the
destination. Do the operation in the requested type and truncate back.
llvm-svn: 366367
Use an anyext to the requested type for the leftover operand to
produce a slightly wider type, and then truncate the final merge.
I have another implementation almost ready which handles arbitrary
widens, but I think it produces worse code in this example (which I
think is 90% due to not folding redundant copies or folding out
implicit_def users), so I wanted to add this as a baseline first.
llvm-svn: 366366
Add narrowScalar to half of original size for G_ICMP.
ClampScalar G_ICMP's operands 2 and 3 to to s32.
Select G_ICMP for pointers for MIPS32. Pointer compare is same
as for integers, it is enough to declare them as legal type.
Differential Revision: https://reviews.llvm.org/D64856
llvm-svn: 366317
AMDGPU needs to allocate special argument registers separately from
the user function argument list, so needs direct control over the
CCState.
The ArgLocs argument is only really necessary because CCState doesn't
allow access to it.
llvm-svn: 366279
Since we have distinct types for pointers and scalars, G_INTTOPTRs can sometimes
obstruct attempts to find constant source values. These usually come about when
try to do some kind of null pointer check. Teaching getConstantVRegValWithLookThrough
about this operation allows the CBZ/CBNZ optimization to catch more cases.
This change also improves the case where we can't find a constant source at all.
Previously we would emit a cmp, cset and tbnz for that. Now we try to just emit
a cmp and conditional branch, saving an instruction.
The cumulative code size improvement of this change plus D64354 is 5.5% geomean
on arm64 CTMark -O0.
Differential Revision: https://reviews.llvm.org/D64377
llvm-svn: 365690
In SelectionDAG AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. Theoretically
the bitpattern should eventually match to bfi, so don't bother trying
to get the patterns to import.
llvm-svn: 365583
If we have an icmp->brcond->br sequence where the brcond just branches to the
next block jumping over the br, while the br takes the false edge, then we can
modify the conditional branch to jump to the br's target while inverting the
condition of the incoming icmp. This means we can eliminate the br as an
unconditional branch to the fallthrough block.
Differential Revision: https://reviews.llvm.org/D64354
llvm-svn: 365510
Select gprb or fprb when def/use register operand of G_PHI is
used/defined by either:
copy to/from physical register or
instruction with only one mapping available for that use/def operand.
Integer s64 phi is handled with narrowScalar when mapping is applied,
produced artifacts are combined away. Manually set gprb to all register
operands of instructions created during narrowScalar.
Differential Revision: https://reviews.llvm.org/D64351
llvm-svn: 365494
If the requested source type an be used as a merge source type, create
a merge of merges. This avoids creating large, illegal extensions and
bit-ops directly to the result type.
llvm-svn: 364841
https://reviews.llvm.org/D31359
Add a hook "legalizeInstrinsic" to allow backends to override this
and custom lower/legalize intrinsics.
llvm-svn: 364821
Fix stack-use-after-scope errors from r364512. One instance was already
fixed in r364611 - this patch simplifies that fix and addresses one more
instance of similar code.
Discussed in: https://reviews.llvm.org/D63905
llvm-svn: 364778
The new switch lowering code that tries to generate jump tables and range checks
were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to
town on trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled because the CFG
looks even stranger after all of this is done.
llvm-svn: 364613
This patch intends to fix ASAN stack-use-after-scope error.
This is at least a short-term fix to unbreak LLVM's mainline.
Differential Revision: https://reviews.llvm.org/D63905
llvm-svn: 364611
Remove the last use of packRegs from IRTranslator and delete
pack/unpackRegs. This introduces a fallback to DAGISel for intrinsics
with aggregate arguments, since we don't have a testcase for them so
it's hard to tell how we'd want to handle them.
Discussed in https://reviews.llvm.org/D63551
llvm-svn: 364514
Change the interface of CallLowering::lowerCall to accept several
virtual registers for each argument, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63551
llvm-svn: 364512
Change the interface of CallLowering::lowerCall to accept several
virtual registers for the call result, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63550
llvm-svn: 364511
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
llvm-svn: 364510
Allow CallLowering::ArgInfo to contain more than one virtual register.
This is useful when passes split aggregates into several virtual
registers, but need to also provide information about the original type
to the call lowering. Used in follow-up patches.
Differential Revision: https://reviews.llvm.org/D63548
llvm-svn: 364509
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
GlobalISel/IRTranslator.cpp now references SelectionDAG/FunctionLoweringInfo.cpp.
This fixes a link error in -DBUILD_SHARED_LIBS=on builds:
ld.lld: error: undefined symbol: llvm::FunctionLoweringInfo::clear()
>>> referenced by IRTranslator.cpp:2198 (../lib/CodeGen/GlobalISel/IRTranslator.cpp:2198)
>>> lib/CodeGen/GlobalISel/CMakeFiles/LLVMGlobalISel.dir/IRTranslator.cpp.o:(llvm::IRTranslator::finalizeFunction())
llvm-svn: 364124
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.
Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.
For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.
There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.
Actual arm64 support for selection of jump tables is coming in a later patch.
Differential Revision: https://reviews.llvm.org/D63169
llvm-svn: 364085
G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's
essentially a side effect free cast instruction we can remat both instructions.
This patch changes the localizer to enable localization of the chains by
iterating over the entry block instructions in reverse order. That way, uses will
localized first, and then the defs are free to be localized as well.
This also changes the previous SmallPtrSet of localized instructions to use a
SetVector instead. We're dealing with pointers and need deterministic iteration
order.
Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean.
Differential Revision: https://reviews.llvm.org/D63630
llvm-svn: 364001
Summary:
All the GlobalISel passes are initialized when the target calls
initializeGlobalISel(), so we don't need to call the initializers
from the pass constructors.
Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar
Reviewed By: aemerson
Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63235
llvm-svn: 363642
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.
Note the expansion itself isn't correct without nnan, but that should
be fixed separately.
llvm-svn: 363637
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.
The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.
One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.
Overall, these changes improve CTMark code size on arm64 by 1.2%.
Full code size results:
Program baseline new diff
------------------------------------------------------------------------------
test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6%
test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6%
test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4%
test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2%
test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2%
test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9%
test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0%
test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0%
Geomean difference -1.2%
Differential Revision: https://reviews.llvm.org/D63303
llvm-svn: 363632
Summary: This case is related to D63405 in that we need to be propagating FMF on negates.
Reviewers: volkan, spatel, arsenm
Reviewed By: arsenm
Subscribers: wdng, javed.absar
Differential Revision: https://reviews.llvm.org/D63458
llvm-svn: 363631
Summary:
Change the way we deal with iterator invalidation in the extload combines as it
was still possible to neglect to visit a use. Even worse, it happened in the
in-tree test cases and the checks weren't good enough to detect it.
We now take a cheap copy of the use list before iterating over it. This
prevents iterator invalidation from occurring and has the nice side effect
of making the existing schedule-for-erase/schedule-for-insert mechanism
moot.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61813
llvm-svn: 363616
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.
I fixed the same bug in SelectionDAG in r287593.
llvm-svn: 363580
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.
We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.
Differential Revision: https://reviews.llvm.org/D63159
llvm-svn: 363434
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.
The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.
For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.
llvm-svn: 363410
Avoid a check for valid and a set of redundant asserts. The place
InstructionMapping is constructed asserts all of the default fields
are passed anyway for an invalid mapping, so don't overcomplicate
this.
llvm-svn: 363391
Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which
lets us use the vreg def assuming it dominates all other users. However, it can
cause jumpy debug behaviour since the DebugLoc attached to these MIs are from
a user instruction that could be in a different block.
Fixes PR40887.
Differential Revision: https://reviews.llvm.org/D63286
llvm-svn: 363331
This opcode generates a pointer to the address of the jump table
specified by the source operand, which is a jump table index.
It will be used in conjunction with an upcoming G_BRJT opcode to support
jump table codegen with GlobalISel.
Differential Revision: https://reviews.llvm.org/D63111
llvm-svn: 363096
If the source is undef, then just don't do anything.
This matches SelectionDAG's behaviour in SelectionDAG.cpp.
Also add a test showing that we do the right thing here.
(irtranslator-memfunc-undef.ll)
Differential Revision: https://reviews.llvm.org/D63095
llvm-svn: 362989
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362128
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362012
avoid static check fail
RegClassOrBank is an object of RegClassOrRegBank, which is defined as
using llvm::RegClassOrRegBank = typedef PointerUnion<const
TargetRegisterClass *, const RegisterBank *>
so control flow can not get here. Use ""llvm_unreachable" here to avoid
"null pointer" confusion.
Patch by Shengchen Kan (skan)
Differential Revision: https://reviews.llvm.org/D62006
Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 361912
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.
llvm-svn: 361608
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.
This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.
llvm-svn: 361081
When breaking up loads and stores of aggregates, the IRTranslator uses
LLT::scalar(64) for the index type of the G_GEP instructions that
compute the addresses. This is unnecessarily large for 32-bit targets.
Use the int ptr type provided by the DataLayout instead.
Note that we're already doing the right thing when translating
getelementptr instructions from the IR. This is just an oversight when
generating new ones while translating loads/stores.
Both x86 and AArch64 already have tests confirming that the old
behaviour is preserved for 64-bit targets.
Differential Revision: https://reviews.llvm.org/D61852
llvm-svn: 360656
We use to incorrectly use the store size instead of the alloc size when
creating the stack slot for allocas.
On aarch64 this can be demonstrated by allocating weirdly sized types.
For instance, in the added test case, we use an alloca for i19. We used
to allocate a slot of size 24-bit (19 rounded up to the next byte),
whereas we really want to use a full 32-bit slot for this type.
llvm-svn: 359856
* LegalizeAction should be printed by name rather than number
* Newly created instructions are incomplete at the point the observer first sees
them. They are therefore recorded in a small vector and printed just before
the legalizer moves on to another instruction. By this point, the instruction
must be complete.
llvm-svn: 359481
When constrainRegClass is called if the constraining happens on a use the COPY
needs to be inserted before the instruction that contains the MachineOperand,
but if we are constraining a definition it actually needs to be added
after the instruction. In addition, the COPY needs to have its operands
flipped (in the use case we are copying from the old unconstrained register
to the new constrained register, while in the definition case we are copying
from the new constrained register that the instruction defines to the old
unconstrained register).
llvm-svn: 359282
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.
Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.
llvm-svn: 359204
Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update
arm64-irtranslator.ll.
Differential Revision: https://reviews.llvm.org/D60922
llvm-svn: 359203
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
Same patch as G_FCEIL etc.
Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.
llvm-svn: 359021
Exactly the same as G_FCEIL, G_FABS, etc.
Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.
Differential Revision: https://reviews.llvm.org/D60895
llvm-svn: 358799
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.
Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.
Differential Revision: https://reviews.llvm.org/D60218
llvm-svn: 358764
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.
This matches how SelectionDAG handles these operations.
Differential Revision: https://reviews.llvm.org/D59971
llvm-svn: 358613
Since non-pow-2 types are going to get split up into multiple loads anyway,
don't do the [SZ]EXTLOAD combine for those and save us trouble later in
legalization.
llvm-svn: 358458
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.
This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.
Compile time:
Program base cse diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8%
test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7%
test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0%
Geomean difference -0.2%
Code size:
Program base cse diff
test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test 3870380 3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0%
Geomean difference -0.9%
Differential Revision: https://reviews.llvm.org/D60580
llvm-svn: 358369
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.
This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.
llvm-svn: 358368
This enables the simple copy combine that already exists in the CombinerHelper.
However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a
set of MIs to process, and so would end up causing a crash when deleted MIs were
being added to the combiner worklist again.
Differential Revision: https://reviews.llvm.org/D60579
llvm-svn: 358318
This crash was introduced in r358032 as we try to construct an EVT from an MVT
in order to find the register type for the calling conv. Fall back instead of
trying to do this with an invalid MVT coming from i256.
llvm-svn: 358314
Call lowering should use this directly instead of going through the
EVT version, but more work is needed to deal with this (mostly the
passing of the IR type pointer instead of the relevant properties in
ArgInfo).
llvm-svn: 358111
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.
Differential Revision: https://reviews.llvm.org/D60425
llvm-svn: 358032
Create method `optForNone()` testing for the function level equivalent of
`-O0` and refactor appropriately.
Differential revision: https://reviews.llvm.org/D59852
llvm-svn: 357638
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.
Differential Revision: https://reviews.llvm.org/D60165
llvm-svn: 357605
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).
This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.
As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.
Differential Revision: https://reviews.llvm.org/D59892
llvm-svn: 357101
The AArch64 test was broken since the result register already had a
set register class, so this test was a no-op. The mapping verify call
would fail because the result size is not the same as the inputs like
in a copy or phi.
The AMDGPU testcases are half broken and introduce illegal VGPR->SGPR
copies which need much more work to handle correctly (same for phis),
but add them as a baseline.
llvm-svn: 356713
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().
For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.
This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.
llvm-svn: 356396
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().
Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.
Differential Revision: https://reviews.llvm.org/D59434
llvm-svn: 356304
This is consistent with what SelectionDAG does and is much easier to
work with than the extract sequence with an artificial wide register.
For the AMDGPU control flow intrinsics, this was producing an s128 for
the i64, i1 tuple return. Any legalization that should apply to a real
s128 value would badly obscure the direct values that need to be seen.
llvm-svn: 356147
getConstantVRegVal used to only look for G_CONSTANT when looking at
unboxing the value of a vreg. However, constants are sometimes not
directly used and are hidden behind trunc, s|zext or copy chain of
computation.
In particular this may be introduced by the legalization process that
doesn't want to simplify these patterns because it can lead to infine
loop when legalizing a constant.
To circumvent that problem, add a new variant of getConstantVRegVal,
named getConstantVRegValWithLookThrough, that allow to look through
extensions.
Differential Revision: https://reviews.llvm.org/D59227
llvm-svn: 356116
Overloaded intrinsics aren't necessarily safe for instruction selection. One
such intrinsic is aarch64.neon.addp.*.
This is a temporary workaround to ensure that we always fall back on that
intrinsic. Eventually this will be replaced with a proper solution.
https://bugs.llvm.org/show_bug.cgi?id=40968
Differential Revision: https://reviews.llvm.org/D59062
llvm-svn: 355865
The control flow here cannot ever use the uninitialized value, but it's
too hard for the compiler to figure that out. Clang warns:
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2600:28: error: variable 'CarrySum' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized]
for (unsigned i = 2; i < Factors.size(); ++i)
^~~~~~~~~~~~~~~~~~
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2604:26: note: uninitialized use occurs here
CarrySumPrevDstIdx = CarrySum;
^~~~~~~~
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2600:28: note: remove the condition if it is always true
for (unsigned i = 2; i < Factors.size(); ++i)
^~~~~~~~~~~~~~~~~~
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2583:22: note: initialize the variable 'CarrySum' to silence this warning
unsigned CarrySum;
^
= 0
llvm-svn: 355818
Narrow Scalar G_MUL for MIPS32.
Revisit NarrowScalar implementation in LegalizerHelper.
Introduce new helper function multiplyRegisters.
It performs generic multiplication of values held in multiple registers.
Generated instructions use only types NarrowTy and i1.
Destination can be same or two times size of the source.
Differential Revision: https://reviews.llvm.org/D58824
llvm-svn: 355814
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.
llvm-svn: 354828
For AMDGPU, if an operand requires an SGPR but is only available as a
VGPR, a loop needs to be introduced to execute the instruction with
each unique combination of values across all lanes. The rest of the
instructions in the block will be moved to a new block following the
loop. Check if the next instruction's parent changed, and update the
iterators and insertion block if this happened.
Tests will be included in a future patch.
llvm-svn: 354591
Legalize/select llvm.ctlz.*
Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.
Differential Revision: https://reviews.llvm.org/D58155
llvm-svn: 354299
https://reviews.llvm.org/D58073
Speed up insertion during the initial populating phase into the
GISelWorkList by deferring repeatedly resizing the DenseMap.
This results in ~10% improvement in the combiner passes, and
~3% speedup in the Legalizer.
reviewed by: aemerson.
llvm-svn: 354093
Select G_BR and G_BRCOND for MIPS32.
Unconditional branch G_BR does not have register operand,
for that reason we only add tests.
Since conditional branch G_BRCOND compares register to zero on MIPS32,
explicit extension must be performed on i1 condition in order to set
high bits to appropriate value.
Differential Revision: https://reviews.llvm.org/D58182
llvm-svn: 354022
Summary:
The declarative tablegen definitions split rules into match and apply steps.
Prepare for that by doing the same in the C++ implementations. This aids
some of the migration effort while the tablegen version is incomplete.
Reviewers: bogner, volkan, aditya_nandakumar, paquette, aemerson
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, Petar.Avramovic, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58150
llvm-svn: 353996
Instead of only having this code work for unary intrinsics, have it work for
an arbitrary number of parameters.
Factor out the cases that fall under this (fma, pow).
This makes it a bit easier to add more intrinsics which don't require any
special work.
Differential Revision: https://reviews.llvm.org/D58079
llvm-svn: 353863
This teaches the IRTranslator to emit G_BSWAP when it runs into
Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in
AArch64.
Add a select-bswap.mir test, and add global isel checks to a couple existing
tests in test/CodeGen/AArch64.
This doesn't handle every bswap case, since some of these rely on known bits
stuff. This just lets us handle the naive case.
Differential Revision: https://reviews.llvm.org/D58081
llvm-svn: 353861
This teaches the legalizer about G_FFLOOR, and lets us select G_FFLOOR in
AArch64.
It updates the existing floating point tests, and adds a select-floor.mir test.
Differential Revision: https://reviews.llvm.org/D57486
llvm-svn: 353722
After the changes introduced in r353586, this instruction doesn't cause any
issues for any backend.
Original review: https://reviews.llvm.org/D57485
llvm-svn: 353720
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
Make behavior of G_LOAD in widenScalar same as for G_ZEXTLOAD and
G_SEXTLOAD. That is perform widenScalarDst to size given by the target
and avoid additional checks in common code. Targets can reorder or add
additional rules in LegalizeRuleSet for the opcode to achieve desired
behavior.
Select extending load that does not have specified type of extension
into zero extending load.
Select truncating store that stores number of bytes indicated by size
in MachineMemoperand.
Differential Revision: https://reviews.llvm.org/D57454
llvm-svn: 353520
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.
This shows a disadvantage of targets not specifically which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.
llvm-svn: 353455
Introduce a new function which handles instructions with multiple type
indices, but have the same number of vector elements.
Also legalize v2s16 shifts when applicable.
llvm-svn: 353432
There was a lot of repeated code wrt unary math intrinsics in
translateKnownIntrinsic. This factors out the repeated MIRBuilder code into
two functions: translateSimpleUnaryIntrinsic and getSimpleUnaryIntrinsicOpcode.
This simplifies adding simple unary intrinsics, since after this, all you have
to do is add the mapping to SimpleUnaryIntrinsicOpcodes.
Differential Revision: https://reviews.llvm.org/D57774
llvm-svn: 353316
The fewerElementsVectors implementation for load/stores
handles the scalar reduction case just as well, so drop
the redundant code in narrowScalar. This also introduces
support for narrowing irregular size breakdowns for
scalars.
llvm-svn: 353125
Don't handle vector conditions.
I think this can be merged in the future with
fewerElementsVectorSelect, although this becomes slightly tricky with
a vector condition.
llvm-svn: 353122
Try to use the underlying source registers.
This enables legalization in more cases where some irregular
operations are widened and others narrowed.
This seems to make the test_combines_2 AArch64 test worse, since the
MERGE_VALUES has multiple uses. Since this should be required for
legalization, a hasOneUse check is probably inappropriate (or maybe
should only be used if the merge is legal?).
llvm-svn: 353121
This fixes two problems with CSE done in buildConstant. First, this
would hit an assert when used with a vector result type. Solve this by
allowing CSE on the vector elements, but not on the result vector for
now.
Second, this was also performing the CSE based on the input
ConstantInt pointer. The underlying buildConstant could potentially
convert the constant depending on the result type, giving in a
different ConstantInt*. Stop allowing the APInt and ConstantInt forms
from automatically casting to the result type to avoid any similar
problems in the future.
llvm-svn: 353077
This reverts commit 8bbd570fd5205a04d88d2e5513a6e4adbd028039.
Apparently adding ffloor breaks AMDGPU somehow, so I need to back this out
while I look into it.
llvm-svn: 353064
For the scalar case only.
Also move the similar G_MERGE_VALUES handling to a separate function
and cleanup to make them look more similar.
llvm-svn: 352979
Also fix an alignment bug getMachineMemOperand. If the
tracked value is null, the offset isn't tracked so the
base alignment needs to be reduced.
llvm-svn: 352716
This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
allows us to select G_FEXP.
It...
- Updates the legalizer-info tests
- Adds a test for legalizing exp
- Updates the existing fp tests to show that we can now select G_FEXP
https://reviews.llvm.org/D57483
llvm-svn: 352692
This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it encounters
it.
It updates the existing floating point tests to show that we don't fall back on
the intrinsic, and select the correct instructions. It also adds a legalizer
test for G_FLOG2.
https://reviews.llvm.org/D57357
llvm-svn: 352673
This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer
test for G_FSQRT, a selection test for it, and updates existing floating point
tests.
https://reviews.llvm.org/D57361
llvm-svn: 352671
I've repeatedly encountered bugs resulting from custom legalize
mutations returning nonsense legalize results, such as increasing the
number of elements for FewerElements. Add an assert function to make
sure the type to mutate to is consistent with the legalize action.
llvm-svn: 352636
This was ignoring the memory size, and producing multiple loads/stores
if the operand size was different from the memory size.
I assume this is the intent of not having an explicit G_ANYEXTLOAD
(although I think that would probably be better).
llvm-svn: 352523
This adds support for legalizing G_FLOG into a RTLib call.
It adds a legalizer test, and updates the existing floating point tests.
https://reviews.llvm.org/D57347
llvm-svn: 352429
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.
https://reviews.llvm.org/D57341
llvm-svn: 352418
This adds ISel support for lifetime markers in opt levels above O0.
It also updates the arm64-irtranslator test, and updates some AArch64 tests that
use them for added coverage.
It also adds a testcase taken from the X86 codegen tests which verified a bug
caused by lifetime markers + stack colouring in the past. This is intended to
make sure that GISel doesn't re-introduce the bug.
(This is basically a straight copy from what SelectionDAG does in
SelectionDAGBuilder.cpp)
https://reviews.llvm.org/D57187
llvm-svn: 352410
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.
https://reviews.llvm.org/D57197
3/3
llvm-svn: 352402
Lower G_USUBO and G_USUBE. Add narrowScalar for G_SUB.
Legalize and select G_SUB for MIPS 32.
Differential Revision: https://reviews.llvm.org/D53416
llvm-svn: 352351
This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an
extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any
extension, by avoiding touching those < s8 size loads.
This bug was uncovered by a verifier update r351584, which I reverted it to keep
the bots green.
llvm-svn: 352311
https://reviews.llvm.org/D57178
Now add a hook in TargetPassConfig to query if CSE needs to be
enabled. By default this hook returns false only for O0 opt level but
this can be overridden by the target.
As a consequence of the default of enabled for non O0, a few tests
needed to be updated to not use CSE (by passing in -O0) to the run
line.
reviewed by: arsenm
llvm-svn: 352126
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.
To do this, this patch...
- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors
It also adds
- A legalizer test for the 16-bit vector ceil which verifies that we create a
G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
full fp16 is supported
- A test for selecting G_UNMERGE_VALUES
And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.
https://reviews.llvm.org/D56682
llvm-svn: 352113
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.
X86 uses i8, but seemed to be hacking around this before.
llvm-svn: 351882
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
https://reviews.llvm.org/D52803
This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.
llvm-svn: 351283
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.
I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).
Also legalize brcond for AMDGPU.
llvm-svn: 350595
Add widen scalar for type index 1 (i1 condition) for G_SELECT.
Select G_SELECT for pointer, s32(integer) and smaller low level
types on MIPS32.
Differential Revision: https://reviews.llvm.org/D56001
llvm-svn: 350063
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.
This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.
llvm-svn: 349927
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
llvm-svn: 349664
Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
Reviewers: aditya_nandakumar, bogner
Reviewed By: bogner
Subscribers: rovka, kristof.beyls, javed.absar
Differential Revision: https://reviews.llvm.org/D55668
llvm-svn: 349514
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.
Differential Revision: https://reviews.llvm.org/D55651
llvm-svn: 349499
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND G_OR and G_XOR for types other then s32
with clampScalar on MIPS32.
Differential Revision: https://reviews.llvm.org/D55362
llvm-svn: 349475
Summary:
This allows us to register it with the MachineFunction delegate and be
notified automatically about erasure and creation of instructions. However,
we still need explicit notification for modifications such as those caused
by setReg() or replaceRegWith().
There is a catch with this though. The notification for creation is
delivered before any operands can be added. While appropriate for
scheduling combiner work. This is unfortunate for debug output since an
opcode by itself doesn't provide sufficient information on what happened.
As a result, the work list remembers the instructions (when debug output is
requested) and emits a more complete dump later.
Another nit is that the MachineFunction::Delegate provides const pointers
which is inconvenient since we want to use it to schedule future
modification. To resolve this GISelWorkList now has an optional pointer to
the MachineFunction which describes the scope of the work it is permitted
to schedule. If a given MachineInstr* is in this function then it is
permitted to schedule work to be performed on the MachineInstr's. An
alternative to this would be to remove the const from the
MachineFunction::Delegate interface, however delegates are not permitted
to modify the MachineInstr's they receive.
In addition to this, the observer has three interface changes.
* erasedInstr() is now erasingInstr() to indicate it is about to be erased
but still exists at the moment.
* changingInstr() and changedInstr() have been added to report changes
before and after they are made. This allows us to trace the changes
in the debug output.
* As a convenience changingAllUsesOfReg() and
finishedChangingAllUsesOfReg() will report changingInstr() and
changedInstr() for each use of a given register. This is primarily useful
for changes caused by MachineRegisterInfo::replaceRegWith()
With this in place, both combine rules have been updated to report their
changes to the observer.
Finally, make some cosmetic changes to the debug output and make Combiner
and CombinerHelp
Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: mgorny, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D52947
llvm-svn: 349167
Summary:
In addition to knowing that an instruction is changed. It's also useful to
know when it's about to change. For example, it might print the instruction so
you can track the changes in a debug log, it might remove it from some queue
while it's being worked on, or it might want to change several instructions as
a single transaction and act on all the changes at once.
Added changingInstr() to all existing uses of changedInstr()
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55623
llvm-svn: 348992
Summary:
There's little of interest that can be done to an already-erased instruction.
You can't inspect it, write it to a debug log, etc. It ought to be notification
that we're about to erase it. Rename the function to clarify the timing of the
event and reflect current usage.
Also fixed one case where we were trying to print an erased instruction.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55611
llvm-svn: 348976
https://reviews.llvm.org/D55516
Add the ability to pass in flags to buildInstr calls. Currently no
validation is performed but that can be easily performed based on the
opcode (if necessary).
Reviewed by: paquette.
llvm-svn: 348893
https://reviews.llvm.org/D55294
Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as
B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to
buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like
B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual and other
builders now only have to override buildInstr (for say constant
folding/cseing) is straightforward.
Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.
llvm-svn: 348815
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
Record the stack protector index in MachineFrameInfo when translating
Intrinsic::stackprotector similarly as is done by SelectionDAG when
processing the same intrinsic.
Setting this index allows the Prologue/Epilogue Insertion to recognize
that the stack protection is enabled. The pass can then make sure that
the stack protector comes before local variables on the stack and
assigns potentially vulnerable objects first so they are close to the
stack protector slot.
Differential Revision: https://reviews.llvm.org/D55418
llvm-svn: 348761
These opcodes are intended to subsume some of the capability of G_MERGE_VALUES,
as it was too powerful and thus complex to add deal with throughout the GISel
pipeline.
G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed
scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar
operands which are larger than the destination vector element type, and
therefore does an implicit truncate.
G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed,
vectors together.
These will be used in a subsequent commit. This commit just adds the initial
infrastructure.
Differential Revision: https://reviews.llvm.org/D53594
llvm-svn: 348430
https://reviews.llvm.org/D54980
This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.
Reviewed by: vkeles.
llvm-svn: 348406
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.
Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.
In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).
We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).
Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.
Instruction select:
Nothing to do.
llvm-svn: 347545
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.
Differential Revision: https://reviews.llvm.org/D53877
llvm-svn: 346774
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.
Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.
Updated from patch initially by Janusz Sobczak.
Differential Revision: https://reviews.llvm.org/D4276
llvm-svn: 346322
https://reviews.llvm.org/D53304
Currently dead phis are not cleaned up during DCE. This patch allows
dead PHI and G_PHI insts to be deleted.
Reviewed by: dsanders
llvm-svn: 344811
Port over the implementation in SelectionDAGBuilder.cpp into the IRTranslator
and update the arm64-irtranslator test.
These were causing fallbacks in CTMark/Bullet (-Rpass-missed=gisel-select),
and this patch fixes that.
https://reviews.llvm.org/D52945
llvm-svn: 343885
The simplest instance of this is an intrinsic with no results which will have the
intrinsic ID as operand 0.
Also fix some benign incorrectness when op0 is a reg but isn't a def that was
guarded against by checking for the extension opcodes.
llvm-svn: 343821
This brings the extending loads patch back to the original intent but minus the
PHI bug and with another small improvement to de-dupe truncates that are
inserted into the same block.
The truncates are sunk to their uses unless this would require inserting before a
phi in which case it sinks to the _beginning_ of the predecessor block for that
path (but no earlier than the def).
The reason for choosing the beginning of the predecessor is that it makes de-duping
multiple truncates in the same block simple, and optimized code is going to run a
scheduler at some point which will likely change the position anyway.
llvm-svn: 343804
This fixes a problem where the register allocator fails to eliminate a PHI
because there's a non-PHI in the middle of the PHI instructions at the start
of a BB.
This G_TRUNC can be better placed but this at least fixes the correctness issue
quickly. I'll follow up with a patch to the verifier to catch this kind of bug
in future.
llvm-svn: 343693
Summary: Depends on D45541
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45543
The previous commit failed portions of the test-suite on GreenDragon due to
duplicate COPY instructions and iterator invalidation. Both issues have now
been fixed. To assist with this, a helper (cloneVirtualRegister) has been added
to MachineRegisterInfo that can be used to get another register that has the same
type and class/bank as an existing one.
llvm-svn: 343654
There's a strange assertion on two of the Green Dragon bots that goes away when
this is reverted. The assertion is in RegBankAlloc and if it is this commit then
-verify-machine-instrs should have caught it earlier in the pipeline.
llvm-svn: 343546
Clang-cl was complaining about some sort of constexpr narrowing bug:
C:\src\llvm-project\llvm\lib\CodeGen\GlobalISel\CombinerHelper.cpp(136,31): error: non-constant-expression cannot be narrowed from type 'llvm::TargetOpcode::(anonymous enum at C:\src\llvm-project\llvm\include\llvm/CodeGen/TargetOpcodes.h:22:1)' to 'unsigned int' in initializer list [-Wc++11-narrowing]
unsigned(MI.getOpcode()) == unsigned(TargetOpcode::G_LOAD)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\src\llvm-project\llvm\lib\CodeGen\GlobalISel\CombinerHelper.cpp(136,31): note: insert an explicit cast to silence this issue
unsigned(MI.getOpcode()) == unsigned(TargetOpcode::G_LOAD)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
static_cast<unsigned int>(
llvm-svn: 343541
https://reviews.llvm.org/D51147
Asserting if any extend of vectors should be up to the target's
legalizer/target specific code not in CallLowering.
reviewed by : dsanders.
llvm-svn: 343325
Summary:
We have `llvm::addLandingPadInfo` and `MachineFunction::addLandingPad`,
both of which add landing pad information to populate `LandingPadInfo`
but are called from different locations, which was confusing. This patch
unifies them with one `MachineFunction::addLandingPad` function, which
now has functionlities of both functions.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52428
llvm-svn: 343018
Summary: This patch adds a GlobalIsel copy utility into MI for flags and updates the instruction emitter for the SDAG path. Some tests show new behavior and I added one for GlobalIsel which mirrors an SDAG test for handling nsw/nuw.
Reviewers: spatel, wristow, arsenm
Reviewed By: arsenm
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D52006
llvm-svn: 342576
https://reviews.llvm.org/D51197
Currently, IRTranslator (and GISel) seems to be arbitrarily picking
which overflow intrinsics get mapped into opcodes which either have a
carry as an input or not.
For intrinsics such as Intrinsic::uadd_with_overflow, translate it to an
opcode (G_UADDO) which doesn't have any carry inputs (similar to LLVM
IR).
This patch adds 4 missing opcodes for completeness - G_UADDO, G_USUBO,
G_SSUBE and G_SADDE.
llvm-svn: 340865
The core get and set routines move to the `Instruction` class. These
routines are only valid to call on instructions which are terminators.
The iterator and *generic* range based access move to `CFG.h` where all
the other generic successor and predecessor access lives. While moving
the iterator here, simplify it using the iterator utilities LLVM
provides and updates coding style as much as reasonable. The APIs remain
pointer-heavy when they could better use references, and retain the odd
behavior of `operator*` and `operator->` that is common in LLVM
iterators. Adjusting this API, if desired, should be a follow-up step.
Non-generic range iteration is added for the two instructions where
there is an especially easy mechanism and where there was code
attempting to use the range accessor from a specific subclass:
`indirectbr` and `br`. In both cases, the successors are contiguous
operands and can be easily iterated via the operand list.
This is the first major patch in removing the `TerminatorInst` type from
the IR's instruction type hierarchy. This change was discussed in an RFC
here and was pretty clearly positive:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123407.html
There will be a series of much more mechanical changes following this
one to complete this move.
Differential Revision: https://reviews.llvm.org/D47467
llvm-svn: 340698
This reverts commit d1341152d91398e9a882ba2ee924147ea2f9b589.
This patch originally made use of Nested MachineIRBuilder buildInstr
calls, and since order of argument processing is not well defined, the
instructions were built slightly in a different order (still correct).
I've removed the nested buildInstr calls to have a defined order now.
Patch was tested by Mikael.
llvm-svn: 340309
This reverts commit 7debc334e6421bb5251ef8f18e97166dfc7dd787.
I missed updating legalizer-info-validation.mir as I had assertions
turned off in my build and that specific test requires asserts. Fixed it
now.
llvm-svn: 340197
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 340039
https://reviews.llvm.org/D50401
Add opcodes for llvm.intrinsic.trunc, round, and update the IRTranslator
for the same.
Reviewed by: dsanders.
llvm-svn: 339977
a generically extensible collection of extra info attached to
a `MachineInstr`.
The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.
Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.
I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).
Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.
This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.
The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.
Differential Revision: https://reviews.llvm.org/D50701
llvm-svn: 339940
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 339676
Previously we were just visiting the blocks in the function in IR order, which
is rather arbitrary. Therefore we wouldn't always visit defs before uses, but
the translation code relies on this assumption in some places.
Only codegen change seen in tests is an elision of a redundant copy.
Fixes PR38396
llvm-svn: 338476
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 338390
Summary:
This is a follow-up to r303043. In computeMapping(), we need to disqualify an
InstrMapping if it would be impossible to repair one of the registers in the
instruction to match the mapping.
This change is needed in order to be able to define an instruction
mapping for G_SELECT for the AMDGPU target and will be tested
by test/CodeGen/AMDGPU/GlobalISel/regbankselect-select.mir
Reviewers: ab, qcolombet, t.p.northover, dsanders
Reviewed By: qcolombet
Subscribers: tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D49735
llvm-svn: 337882
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
llvm-svn: 336964
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
llvm-svn: 336929
Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.
I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D40092
llvm-svn: 336589
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.
Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.
llvm-svn: 335767
Before we were relying on the any extend of the s1 to s32, but
for AAPCS we need to zero-extend it to at least s8.
Fixes PR36719
Differential Revision: https://reviews.llvm.org/D47425
llvm-svn: 333747
This commit adds a simple verifier that tracks type indices being
touched by legalization rules' builders.
Every target will now have an opportunity to call
LegalizerInfo::verify(...) at the end of its derived LegalizerInfo's
constructor and check there are no obvious mistakes like checking only
first type for an opcode that has more than one type index and therefore
implicitly declaring any type for the second (and higher) type index
legal.
The check is only ran in assert builds and should have very minor
performance impact in assert builds and none in release builds.
This commit does not add LegalizerInfo::verify(...) calls to
target-specific legalizers, look for separate commits for that.
This commit also doesn't make the verification errors fatal, only
produces an error message, look for a later commit that does.
Reviewers: aemerson, qcolombet
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46338
llvm-svn: 333576
by replacing DenseMap with IndexedMap for LLTs within MRI, as
benchmarked by cross-compiling sqlite3 amalgamation for AArch64
on x86 machine.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D46809
llvm-svn: 333125
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.
This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.
A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.
Patch is based on the original work of Tim Northover.
Differential Revision: https://reviews.llvm.org/D46018
llvm-svn: 332449
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, therefore its incorrect to widen it
in any way but by zero extending it.
G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their
destination and first source operands, but not the "number of bits to
shift" operand).
Generally, shifts aren't as similar to regular binary operations as it
might seem, for instance, they aren't commutative nor associative and
the second source operand usually requires a special treatment.
Reviewers: bogner, javed.absar, aivchenk, rovka
Reviewed By: bogner
Subscribers: igorb, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46413
llvm-svn: 331926
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.
llvm-svn: 331846
Fix a silly mistake in my pre-commit changes for r331816. It should check what
opcode the insn is before extracting the operands.
NFC at the moment since the caller already checked the opcode.
llvm-svn: 331820
Refactoring LegalizerHelper::widenScalar member function reducing its
size by approximately a factor of 2 and (hopefuly) making it more
straightforward and regular by introducing widenScalarSrc and
widenScalarDst helper methods.
The new widenScalar* methods mutate the instructions in place instead
of recreating them from scratch and removing the originals. The
compile time implications of this were measured on sqlite3
amalgamation, targeting AArch64 in -O0:
LegalizerHelper::widenScalar: > 25% faster
Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster
Also adding MachineOperand::setCImm and refactoring out
MachineIRBuilder::recordInsertion methods to make the change possible.
Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm
Reviewed By: aditya_nandakumar
Subscribers: wdng, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46414
llvm-svn: 331819
Summary:
constrainOperandRegClass() currently fails if it tries to constrain the
register class of an operand that is defeined with an unallocatable register
class. This patch resolves this by adding a target callback to compute
register constriants in this case.
This is required by the AMDGPU because many of its instructions have source opreands
defined with the unallocatable register classe VS_32 which is a union of two allocatable
register classes VGPR_32 and SReg_32.
Reviewers: dsanders, aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D45991
llvm-svn: 331485
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.
llvm-svn: 331201
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
improve optimization of extends and truncates, this legality requirement
would spread without considerable care w.r.t when certain combines were
permitted.
* The SelectionDAG importer required some ugly and fragile pattern
rewriting to translate patterns into this style.
This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.
This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.
Depends on D45466
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45540
llvm-svn: 331115
Some of the bots were failing in a different way to the others. These were
unable to compare tuples. Fix this by changing to a struct, thereby avoiding
the quirks of tuples.
llvm-svn: 331081
Summary:
Currently only the memory size is supported but others can be added as
needed.
narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45466
llvm-svn: 331071
The main goal of this change is to make it much easier to track which
rules are actually covered by Testgen'erated regression tests.
Reviewers: aemerson, dsanders
Differential Revision: https://reviews.llvm.org/D46095
llvm-svn: 330988
Lower is slightly odd. It often doesn't change the type but the lowerings
do use the new type to decide what code to create. Treat it like a mutation
but provide convenience functions that re-use the existing type.
Re-uses the existing tests:
test/CodeGen/AArch64/GlobalISel/legalize-rem.mir
test/CodeGen/AArch64/GlobalISel//legalize-mul.mir
test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir
llvm-svn: 329623
building.
https://reviews.llvm.org/D45067
This change attempts to do two things:
1) It separates out the state that is stored in the
MachineIRBuilder(InsertionPt, MF, MRI, InsertFunction etc) into a
separate object called MachineIRBuilderState.
2) Add the ability to constant fold operations while building instructions
(optionally). MachineIRBuilder is now refactored into a MachineIRBuilderBase
which contains lots of non foldable build methods and their implementation.
Instructions which can be constant folded/transformed are now in a class
called FoldableInstructionBuilder which uses CRTP to use the implementation
of the derived class for buildBinaryOps. Additionally buildInstr in the derived
class can be used to implement other kinds of transformations.
Also because of separation of state, given a MachineIRBuilder in an API,
if one wishes to use another MachineIRBuilder, a new one can be
constructed from the state locally. For eg,
void doFoo(MachineIRBuilder &B) {
MyCustomBuilder CustomB(B.getState());
// Use CustomB for building.
}
reviewed by : aemerson
llvm-svn: 329596
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: bogner, rnk, MatzeB, RKSimon
Reviewed By: rnk
Subscribers: JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45133
llvm-svn: 329435
Added helpers to build G_FCONSTANT, along with matching ConstantFP and
unit tests for the same.
Sample usage.
auto MIB = Builder.buildFConstant(s32, 0.5); // Build IEEESingle
For Matching the above
const ConstantFP* Tmp;
mi_match(DstReg, MRI, m_GFCst(Tmp));
https://reviews.llvm.org/D44128
reviewed by: volkan
llvm-svn: 327152
Summary:
Fabs is a common floating-point operation, especially for some expansions. This patch adds
a new generic opcode for llvm.fabs.* intrinsic in order to avoid building/matching this intrinsic.
Reviewers: qcolombet, aditya_nandakumar, dsanders, rovka
Reviewed By: aditya_nandakumar
Subscribers: kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D43864
llvm-svn: 326749
Currently it's impossible to test InstructionSelect pass with MIR which
is considered illegal by the Legalizer in Assert builds. In early stages
of porting an existing backend from SelectionDAG ISel to GlobalISel,
however, we would have very basic CallLowering, Legalizer, and
RegBankSelect implementations, but rather functional Instruction Select
with quite a few patterns selectable due to the semi-automatic porting
process borrowing them from SelectionDAG ISel.
As we are trying to define legality as a property of being selectable by
the instruction selector, it would be nice to be able to easily check
what the selector can do in its current state w/o the legality check
provided by the Legalizer getting in the way.
It also seems beneficial to have a regression testing set up that would
not allow the selector to silently regress in its support of the MIR not
supported yet by the previous passes in the GlobalISel pipeline.
This commit adds -disable-gisel-legality-check command line option to
llc that disables those legality checks in RegBankSelect and
InstructionSelect passes.
It also adds quite a few MIR test cases for AArch64's Instruction
Selector. Every one of them would fail on the legality check at the
moment, but will select just fine if the check is disabled. Every test
MachineFunction is intended to exercise a specific selection rule and
that rule only, encoded in the MachineFunction's name by the rule's
number, ID, and index of its GIM_Try opcode in TableGen'erated
MatchTable (-optimize-match-table=false).
Reviewers: ab, dsanders, qcolombet, rovka
Reviewed By: bogner
Subscribers: kristof.beyls, volkan, aditya_nandakumar, aemerson,
rengolin, t.p.northover, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42886
llvm-svn: 326396
FailedISel MachineFunction property is part of the CodeGen pipeline
state as much as every other property, notably, Legalized,
RegBankSelected, and Selected. Let's make that part of the state also
serializable / de-serializable, so if GlobalISel aborts on some of the
functions of a large module, but not the others, it could be easily seen
and the state of the pipeline could be maintained through llc's
invocations with -stop-after / -start-after.
To make MIR printable and generally to not to break it too much too
soon, this patch also defers cleaning up the vreg -> LLT map until
ResetMachineFunctionPass.
To make MIR with FailedISel: true also machine verifiable, machine
verifier is changed so it treats a MIR-module as non-regbankselected and
non-selected if there is FailedISel property set.
Reviewers: qcolombet, ab
Reviewed By: dsanders
Subscribers: javed.absar, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42877
llvm-svn: 326343
Currently when abort is enabled, we get a diagnostic saying "Fallback
path used .... " and the program terminates. To actually figure out what
the reason is, we need to run again with another verbose argument
"-pass-remarks-missed=gisel". Instead, when we are going to abort,
we might as well print expensive remarks.
https://reviews.llvm.org/D43796
llvm-svn: 326215
Currently we assert that only non target specific opcodes can have
missing RegisterClass constraints in the MCDesc. The backend can have
instructions with register operands but don't have RegisterClass
constraints (say using unknown_class) in which case the instruction
defining the register will constrain it.
Change the assert to only fire if a def has no regclass.
https://reviews.llvm.org/D43409
llvm-svn: 326142
This makes sure that alloca() function calls properly probe the
stack as needed.
Differential Revision: https://reviews.llvm.org/D42356
llvm-svn: 325433
Summary:
This patch adds templated functions to MachineIRBuilder for some opcodes
and adds pattern matcher support for G_AND and G_OR.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D43309
llvm-svn: 325162
* Document most API's
* Delete a useless function call
* Fix a discrepancy between the single and multi-opcode variants of
getActionDefinitions().
The multi-opcode variant now requires that more than one opcode is requested.
Previously it acted much like the single-opcode form but unnecessarily
enforced the requirements of the multi-opcode form.
llvm-svn: 325067
Until we support extending loads properly we're going to fall back for these.
We already handle stores in the same way, so this is just being consistent.
llvm-svn: 324001
Legal if we have hardware support for floating point, libcalls
otherwise.
Also add the necessary support for libcalls in the legalizer helper.
llvm-svn: 323726
Summary:
Apparently, we missed on constraining register classes of VReg-operands of all the instructions
built from a destination pattern but the root (top-level) one. The issue exposed itself
while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped
into COPY_TO_REGCLASS, so top-level COPY_TO_REGCLASS gets properly constrained,
while nested VTOSIZS (or rather its destination virtual register to be exact) does not.
Fixing this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI.
https://bugs.llvm.org/show_bug.cgi?id=35965
rdar://problem/36886530
Patch by Roman Tereshin
Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan
Reviewed By: dsanders, qcolombet, rovka
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42565
llvm-svn: 323692
Prior to committing r323681, we decided to change pick() to identity() since
it wasn't clear from the name what pick() did. However, identity() isn't a very
good name either since it implies that no changes are made. For some reason,
naming it changeTo() didn't occur to me until just after the commit. This
should resolve the lack of clarity that pick() had while still implying that
it changes the MIR.
llvm-svn: 323689
Summary:
As discussed in D42244, we have difficulty describing the legality of some
operations. We're not able to specify relationships between types.
For example, declaring the following
setAction({..., 0, s32}, Legal)
setAction({..., 0, s64}, Legal)
setAction({..., 1, s32}, Legal)
setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
{s32, s32}
{s64, s32}
{s32, s64}
{s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES have relationships between the types that are currently
described incorrectly.
Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).
It's also difficult to control the legalization strategy. We've added support
for legalizing non-power of 2 types but there's still some hardcoded assumptions
about the strategy. The main one I've noticed is that type0 is always legalized
before type1 which is not a good strategy for `type0 = G_EXTRACT type1, ...` if
you need to widen the container. It will converge on the same result eventually
but it will take a much longer route when legalizing type0 than if you legalize
type1 first.
Lastly, the definition of legality and the legalization strategy is kept
separate which is not ideal. It's helpful to be able to look at a one piece of
code and see both what is legal and the method the legalizer will use to make
illegal MIR more legal.
This patch adds a layer onto the LegalizerInfo (to be removed when all targets
have been migrated) which resolves all these issues.
Here are the rules for shift and division:
for (unsigned BinOp : {G_LSHR, G_ASHR, G_SDIV, G_UDIV})
getActionDefinitions(BinOp)
.legalFor({s32, s64}) // If type0 is s32/s64 then it's Legal
.clampScalar(0, s32, s64) // If type0 is <s32 then WidenScalar to s32
// If type0 is >s64 then NarrowScalar to s64
.widenScalarToPow2(0) // Round type0 scalars up to powers of 2
.unsupported(); // Otherwise, it's unsupported
This describes everything needed to both define legality and describe how to
make illegal things legal.
Here's an example of a complex rule:
getActionDefinitions(G_INSERT)
.unsupportedIf([=](const LegalityQuery &Query) {
// If type0 is smaller than type1 then it's unsupported
return Query.Types[0].getSizeInBits() <= Query.Types[1].getSizeInBits();
})
.legalIf([=](const LegalityQuery &Query) {
// If type0 is s32/s64/p0 and type1 is a power of 2 other than 2 or 4 then it's legal
// We don't need to worry about large type1's because unsupportedIf caught that.
const LLT &Ty0 = Query.Types[0];
const LLT &Ty1 = Query.Types[1];
if (Ty0 != s32 && Ty0 != s64 && Ty0 != p0)
return false;
return isPowerOf2_32(Ty1.getSizeInBits()) &&
(Ty1.getSizeInBits() == 1 || Ty1.getSizeInBits() >= 8);
})
.clampScalar(0, s32, s64)
.widenScalarToPow2(0)
.maxScalarIf(typeInSet(0, {s32}), 1, s16) // If type0 is s32 and type1 is bigger than s16 then NarrowScalar type1 to s16
.maxScalarIf(typeInSet(0, {s64}), 1, s32) // If type0 is s64 and type1 is bigger than s32 then NarrowScalar type1 to s32
.widenScalarToPow2(1) // Round type1 scalars up to powers of 2
.unsupported();
This uses a lambda to say that G_INSERT is unsupported when type0 is bigger than
type1 (in practice, this would be a default rule for G_INSERT). It also uses one
to describe the legal cases. This particular predicate is equivalent to:
.legalFor({{s32, s1}, {s32, s8}, {s32, s16}, {s64, s1}, {s64, s8}, {s64, s16}, {s64, s32}})
In terms of performance, I saw a slight (~6%) performance improvement when
AArch64 was around 30% ported but it's pretty much break even right now.
I'm going to take a look at constexpr as a means to reduce the initialization
cost.
Future work:
* Make it possible for opcodes to share rulesets. There's no need for
G_LSHR/G_ASHR/G_SDIV/G_UDIV to have separate rule and ruleset objects. There's
no technical barrier to this, it just hasn't been done yet.
* Replace the type-index numbers with an enum to get .clampScalar(Type0, s32, s64)
* Better names for things like .maxScalarIf() (clampMaxScalar?) and the vector rules.
* Improve initialization cost using constexpr
Possible future work:
* It's possible to make these rulesets change the MIR directly instead of
returning a description of how to change the MIR. This should remove a little
overhead caused by parsing the description and routing to the right code, but
the real motivation is that it removes the need for LegalizeAction::Custom.
With Custom removed, there's no longer a requirement that Custom legalization
change the opcode to something that's considered legal.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner
Reviewed By: bogner
Subscribers: hintonda, bogner, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42251
llvm-svn: 323681
Summary:
The improvements to the LegalizerInfo discussed in D42244 require that
LegalizerInfo::LegalizeAction be available for use in other classes. As such,
it needs to be moved out of LegalizerInfo. This has been done separately to the
next patch to minimize the noise in that patch.
llvm-svn: 323669
We weren't converting the immediate ConstantFP during legalization, which caused
the wrong bit patterns to be emitted for half type FP constants.
Fixes PR36106.
llvm-svn: 323582
https://reviews.llvm.org/D41373
The various components are
GICombinerHelper contains transformations that are common to all
targets. Targets can pick and choose which transformations (at
function/opcode granularity) each pass uses via configuring a
GICombinerInfo.
GICombiner contains some common code and it does the traversal,
driving of combines, worklist management and iterating until
convergence.
GICombinerInfo is an interface with a virtual method called combine.
The combiner info will allow targets to pick and choose (or
implement their own specific combines). CombineInfos can make
use of available combines in GICombineHelper to configure the
transformations for a particular pass. Currently this approach allows
cherry picking transformations from helpers (at function/opcode
granularity) and also allows early returning on specific
transformations. Targets also get to prioritize whether target specific
combines run before/after the opt-in generic combines. Ideally we would
like this part to be configured by both C++ and Tablegen. The
CombinerInfo also has a field which indicates how to deal with
IllegalOps (ie - should we allow to create them/or legalize them?).
A CombinerPass would configure a CombinerInfo, create the GICombiner
with the Info, and call
GICombiner::combineMachineInstrs(MachineFunction&).
This organization is very similar to the GISelLegalizer.
llvm-svn: 323392
Summary:
`getAction(const InstrAspect &) const` breaks encapsulation by exposing
the smaller components that are used to decide how to legalize an
instruction.
This is a problem because we need to change the implementation of
LegalizerInfo so that it's able to describe particular type combinations
rather than just cartesian products of types.
For example, declaring the following
setAction({..., 0, s32}, Legal)
setAction({..., 0, s64}, Legal)
setAction({..., 1, s32}, Legal)
setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
{s32, s32}
{s64, s32}
{s32, s64}
{s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES has relationships between the types that are currently
described incorrectly.
Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).
This patch introduces LegalityQuery which provides all the information
needed by the legalizer to make a decision on whether something is legal
and how to legalize it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner
Reviewed By: bogner
Subscribers: bogner, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D42244
llvm-svn: 323342
https://reviews.llvm.org/D42402
A lot of these copies are useless (copies b/w VRegs having the same
regclass) and should be cleaned up.
llvm-svn: 323291
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.
Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.
llvm-svn: 322651
For hard float with VFP4, it is legal. Otherwise, we use libcalls.
This needs a bit of support in the LegalizerHelper for soft float
because we didn't handle G_FMA libcalls yet. The support is trivial, as
the only difference between G_FMA and other libcalls that we already
handle is that it has 3 input operands rather than just 2.
llvm-svn: 322366
Previously the code for handling G_SMULO didn't properly check for the signed
multiply overflow, instead treating it the same as the unsigned G_UMULO.
Fixes PR35800.
llvm-svn: 321690
A call may have an intrinsic name but not have a valid intrinsic ID,
for example with llvm.invariant.group.barrier. If so, treat it as a
normal call like FastISel does.
llvm-svn: 321662
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.
On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.
llvm-svn: 320746
This is due to PR26161 needing to be resolved before we can fix
big endian bugs like PR35359. The work to split aggregates into smaller LLTs
instead of using one large scalar will take some time, so in the mean time
we'll fall back to SDAG.
Some ARM BE tests xfailed for now as a result.
Differential Revision: https://reviews.llvm.org/D40789
llvm-svn: 320388
This patch splits atomics out of the generic G_LOAD/G_STORE and into their own
G_ATOMIC_LOAD/G_ATOMIC_STORE. This is a pragmatic decision rather than a
necessary one. Atomic load/store has little in implementation in common with
non-atomic load/store. They tend to be handled very differently throughout the
backend. It also has the nice side-effect of slightly improving the common-case
performance at ISel since there's no longer a need for an atomicity check in the
matcher table.
All targets have been updated to remove the atomic load/store check from the
G_LOAD/G_STORE path. AArch64 has also been updated to mark
G_ATOMIC_LOAD/G_ATOMIC_STORE legal.
There is one issue with this patch though which also affects the extending loads
and truncating stores. The rules only match when an appropriate G_ANYEXT is
present in the MIR. For example,
(G_ATOMIC_STORE (G_TRUNC:s16 (G_ANYEXT:s32 (G_ATOMIC_LOAD:s16 X))))
will match but:
(G_ATOMIC_STORE (G_ATOMIC_LOAD:s16 X))
will not. This shouldn't be a problem at the moment, but as we get better at
eliminating extends/truncates we'll likely start failing to match in some
cases. The current plan is to fix this in a patch that changes the
representation of extending-load/truncating-store to allow the MMO to describe
a different type to the operation.
llvm-svn: 319691
Summary: LegalizerInfo assumes all G_MERGE_VALUES and G_UNMERGE_VALUES instructions are legal, so it is not possible to legalize vector operations on illegal vector types. This patch fixes the problem by removing the related check and adding default actions for G_MERGE_VALUES and G_UNMERGE_VALUES.
Reviewers: qcolombet, ab, dsanders, aditya_nandakumar, t.p.northover, kristof.beyls
Reviewed By: dsanders
Subscribers: rovka, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39823
llvm-svn: 319524
G_ATOMICRMW_* is generally legal on AArch64. The exception is G_ATOMICRMW_NAND.
G_ATOMIC_CMPXCHG_WITH_SUCCESS needs to be lowered to G_ATOMIC_CMPXCHG with an
external comparison.
Note that IRTranslator doesn't generate these instructions yet.
llvm-svn: 319466
This is needed for cases when the memory access is not as big as the width of
the data type. For instance, storing i1 (1 bit) would be done in a byte (8
bits).
Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a
size of 0, which for instance makes alias analysis return NoAlias even when
it shouldn't.
There are no tests as this was done as a follow-up to the bugfix for the case
where this was discovered (r318824). This handles more similar cases.
Review: Björn Petterson
https://reviews.llvm.org/D40339
llvm-svn: 319173
LLVM Coding Standards:
Function names should be verb phrases (as they represent actions), and
command-like function should be imperative. The name should be camel
case, and start with a lower case letter (e.g. openFile() or isFoo()).
Differential Revision: https://reviews.llvm.org/D40416
llvm-svn: 319168
TableGen already generates code for selecting a G_FDIV, so we only need
to add a test.
For the legalizer and reg bank select, we do the same thing as for the
other floating point binary operations: either mark as legal if we have
a FP unit or lower to a libcall, and map to the floating point
registers.
llvm-svn: 318915
TableGen already generates code for selecting a G_FMUL, so we only need
to add a test for that part.
For the legalizer and reg bank select, we do the same thing as the other
floating point binary operators: either mark as legal if we have a FP
unit or lower to a libcall, and map to the floating point registers.
llvm-svn: 318910
Instead of asserting that the type sizes are exactly equal, we check
that the new size is big enough to contain the original type.
We have to relax this constrain because, right now, we sometimes
specify that things that are smaller than a storage type are legal
instead of widening everything to the size of a storage type.
E.g., we say that G_AND s16 is legal and we map that on GPR32.
This is something we may revisit in the future (either by changing
the legalization process or keeping track separately of the storage
size and the size of the type), but let us reflect the reality of
the situation for now.
llvm-svn: 318587
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
Summary:
This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV,
causes TableGen to instrument the generated table to collect rule coverage
information. However, LLVM_ENABLE_GISEL_COV goes a bit further than
LLVM_ENABLE_DAGISEL_COV. The information is written to files
(${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be
concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will
read this information and use it to emit warnings about untested rules.
This technique could also be used by SelectionDAG and can be further
extended to detect hot rules and give them priority over colder rules.
Usage:
* Enable LLVM_ENABLE_GISEL_COV in CMake
* Build the compiler and run some tests
* cat gisel-coverage-[0-9]* > gisel-coverage-all
* Delete lib/Target/*/*GenGlobalISel.inc*
* Build the compiler
Known issues:
* ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual
step due to a lack of a portable 'cat' command. It should be the
concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files.
* There's no mechanism to discard coverage information when the ruleset
changes
Depends on D39742
Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka
Reviewed By: rovka
Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39747
llvm-svn: 318356
artifacts along with DCE
Legalization Artifacts are all those insts that are there to make the
type system happy. Currently, the target needs to say all combinations
of extends and truncs are legal and there's no way of verifying that
post legalization, we only have *truly* legal instructions. This patch
changes roughly the legalization algorithm to process all illegal insts
at one go, and then process all truncs/extends that were added to
satisfy the type constraints separately trying to combine trivial cases
until they converge. This has the added benefit that, the target
legalizerinfo can only say which truncs and extends are okay and the
artifact combiner would combine away other exts and truncs.
Updated legalization algorithm to roughly the following pseudo code.
WorkList Insts, Artifacts;
collect_all_insts_and_artifacts(Insts, Artifacts);
do {
for (Inst in Insts)
legalizeInstrStep(Inst, Insts, Artifacts);
for (Artifact in Artifacts)
tryCombineArtifact(Artifact, Insts, Artifacts);
} while(!Insts.empty());
Also, wrote a simple wrapper equivalent to SetVector, except for
erasing, it avoids moving all elements over by one and instead just
nulls them out.
llvm-svn: 318210
Allow a pattern rewriter to be installed in CodeGenDAGPatterns and use it to
correct situations where SelectionDAG and GlobalISel disagree on
representation. For example, it would rewrite:
(sextload:i32 $ptr)<<unindexedload>><<sextload>><<sextloadi16>
to:
(sext:i32 (load:i16 $ptr)<<unindexedload>>)
I'd have preferred to replace the fragments and have the expansion happen
naturally as part of PatFrag expansion but the type inferencing system can't
cope with loads of types narrower than those mentioned in register classes.
This is because the SDTCisInt's on the sext constrain both the result and
operand to the 'legal' integer types (where legal is defined as 'a register
class can contain the type') which immediately rules the narrower types out.
Several targets (those with only one legal integer type) would then go on to
crash on the SDTCisOpSmallerThanOp<> when it removes all the possible types
for the result of the extend.
Also, improve isObviouslySafeToFold() slightly to automatically return true for
neighbouring instructions. There can't be any re-ordering problems if
re-ordering isn't happenning. We'll need to improve it further to handle
sign/zero-extending loads when the extend and load aren't immediate neighbours
though.
llvm-svn: 317971
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
This changes the interface of how targets describe how to legalize, see
the below description.
1. Interface for targets to describe how to legalize.
In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.
For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:
for (auto Ty : {s32, s64})
setAction({G_ADD, 0, s32}, Legal);
or for a target that needs a library call for a 32 bit division:
setAction({G_SDIV, s32}, Libcall);
The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar). A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.
Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy". For example:
setLegalizeScalarToDifferentSizeStrategy(
G_ADD, 0, widenToLargerAndNarrowToLargest);
This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.
Another example, taken from the ARM backend is:
for (unsigned Op : {G_SDIV, G_UDIV}) {
setLegalizeScalarToDifferentSizeStrategy(Op, 0,
widenToLargerTypesUnsupportedOtherwise);
if (ST.hasDivideInARMMode())
setAction({Op, s32}, Legal);
else
setAction({Op, s32}, Libcall);
}
For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).
The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:
setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
setLegalizeVectorElementToDifferentSizeStrategy(
G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);
As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above. The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.
Therefore, for the above specification, some example legalizations are:
* getAction({G_ADD, LLT::vector(3, 3)})
returns {WidenScalar, LLT::vector(3, 8)}
* getAction({G_ADD, LLT::vector(3, 8)})
then returns {MoreElements, LLT::vector(8, 8)}
* getAction({G_ADD, LLT::vector(20, 8)})
returns {FewerElements, LLT::vector(16, 8)}
2. Key implementation aspects.
How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:
setScalarAction({G_ADD, LLT:scalar(1)},
{{1, WidenScalar}, // bit sizes [ 1, 31[
{32, Legal}, // bit sizes [32, 33[
{33, WidenScalar}, // bit sizes [33, 64[
{64, Legal}, // bit sizes [64, 65[
{65, NarrowScalar} // bit sizes [65, +inf[
});
Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized. Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.
I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.
This drops the need for LLT::{half,double}...Size().
Differential Revision: https://reviews.llvm.org/D30529
llvm-svn: 317560
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
Summary: Make sure shifts are legal/specified by the legalizerinfo before creating it
Reviewers: qcolombet, dsanders, rovka, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39264
llvm-svn: 316602
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.
The previous commit failed on MSVC due to a failure to convert an
initializer_list to a std::vector. Hopefully, MSVC will accept this version.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
llvm-svn: 315887
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
llvm-svn: 315885
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Hopefully fixed the ambiguous constructor that a large number of bots reported.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
llvm-svn: 315869
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
llvm-svn: 315863
TargetRegisterInfo::getMinimalPhysRegClass is actually pretty expensive
because it has to iterate over all the register classes.
Cache this information as we need and get it so that we limit its usage.
Right now, we heavily rely on it, because this is how we get the mapping
for vregs defined by copies from physreg (i.e., the one that are ABI
related).
Improve compile time by up to 10% for that pass.
NFC
llvm-svn: 315759
Prior to this patch we used to create SetVectors in temporaries that
were created and destroyed for each instruction. Now, instead we create
and destroyed them only once, but clear the content for each
instruction.
This speeds up the pass by ~25%.
NFC.
llvm-svn: 315756
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
We end up creating COPY's that are either truncating/extending and this
should be illegal.
https://reviews.llvm.org/D37640
Patch for X86 and ARM by igorb, rovka
llvm-svn: 315240
r313390 taught 'allowExtraAnalysis' to check whether remarks are
enabled at all. Use that to only do the expensive instruction printing
if they are.
llvm-svn: 313552
Added a combiner which can clean up truncs/extends that are created in
order to make the types work during legalization.
Also moved the combineMerges to the LegalizeCombiner.
https://reviews.llvm.org/D36880
llvm-svn: 312158
Since the lambda isn't escaped (via a std::function or similar) it's
fine/better to use default capture-by-ref to provide semantics similar
to language-level nested scopes (if/for/while/etc).
llvm-svn: 311782
G_PHI has the same semantics as PHI but also has types.
This lets us verify that the types in the G_PHI are consistent.
This also allows specifying legalization actions for G_PHIs.
https://reviews.llvm.org/D36990
llvm-svn: 311596
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.
llvm-svn: 309990
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
If the localizer pass puts one of its constants before the label that tells the
unwinder "jump here to handle your exception" then control-flow will skip it,
leaving uninitialized registers at runtime. That's bad.
llvm-svn: 308687
Treat widening G_SREM and G_UREM the same as G_SDIV and G_UDIV. This is
going to be used in the ARM backend (and that's when the test will come
too).
llvm-svn: 308278
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
This covers both hard and soft float.
Hard float is easy, since it's just Legal.
Soft float is more involved, because there are several different ways to
handle it based on the predicate: one and ueq need not only one, but two
libcalls to get a result. Furthermore, we have large differences between
the values returned by the AEABI and GNU functions.
AEABI functions return a nice 1 or 0 representing true and respectively
false. GNU functions generally return a value that needs to be compared
against 0 (e.g. for ogt, the value returned by the libcall is > 0 for
true). We could introduce redundant comparisons for AEABI as well, but
they don't seem easy to remove afterwards, so we do different processing
based on whether or not the result really needs to be compared against
something (and just truncate if it doesn't).
llvm-svn: 307243
Summary:
Also, made a few minor tweaks to shave off a little more cumulative memory consumption:
* All rules share a single NewMIs instead of constructing their own. Only one
will end up using it.
* Use MIs.resize(1) instead of MIs.clear();MIs.push_back(I) and prevent
GIM_RecordInsn from changing MIs[0].
Depends on D33764
Reviewers: rovka, vitalybuka, ab, t.p.northover, qcolombet, aditya_nandakumar
Reviewed By: ab
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D33766
llvm-svn: 307159
We used to have a helper that replaced an instruction with a libcall.
That turns out to be too aggressive, since sometimes we need to replace
the instruction with at least two libcalls. Therefore, change our
existing helper to only create the libcall and leave the instruction
removal as a separate step. Also rename the helper accordingly.
llvm-svn: 307149
Add a helper for building simple binary ops like add, mul, sub, and.
This can be used in the future for quickly adding support for or, xor.
llvm-svn: 307139
Summary:
This further improves the compile-time regressions that will be caused by a
re-commit of r303259.
Also added included preliminary work in preparation for the multi-insn emitter
since I needed to change the relevant part of the API for this patch anyway.
Depends on D33758
Reviewers: rovka, vitalybuka, ab, t.p.northover, qcolombet, aditya_nandakumar
Reviewed By: ab
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D33764
llvm-svn: 307133
Summary:
Replace the matcher if-statements for each rule with a state-machine. This
significantly reduces compile time, memory allocations, and cumulative memory
allocation when compiling AArch64InstructionSelector.cpp.o after r303259 is
recommitted.
The following patches will expand on this further to fully fix the regressions.
Reviewers: rovka, ab, t.p.northover, qcolombet, aditya_nandakumar
Reviewed By: ab
Subscribers: vitalybuka, aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33758
llvm-svn: 307079
It looks like there are two target-independent but not GISel instructions that
need legalization, IMPLICIT_DEF and PHI. These are already anomalies since
their operands have important LLTs attached, so to make things more uniform it
seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF.
llvm-svn: 306875
In r301116, a custom lowering needed to be introduced to be able to
legalize 8 and 16-bit divisions on ARM targets without a division
instruction, since 2-step legalization (WidenScalar from 8 bit to 32
bit, then Libcall the 32-bit division) doesn't work.
This fixes this and makes this kind of multi-step legalization, where
first the size of the type needs to be changed and then some action is
needed that doesn't require changing the size of the type,
straighforward to specify.
Differential Revision: https://reviews.llvm.org/D32529
llvm-svn: 306806
Without this check, COPY instructions can actually be one of the generic casts
in disguise. That's confusing and bad.
At some point during ISel this restriction has to be relaxed since the fully
selected instructions will usually use COPY for those purposes. Right now I
think it's possible that relaxation occurs during RegBankSelect (hence the
change there). I'm not convinced that's where it belongs long-term though.
llvm-svn: 306470
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.
llvm-svn: 306120
G_SEQUENCE is going away soon so as a first step the MachineIRBuilder needs to
be taught how to emulate it with alternatives. We use G_MERGE_VALUES where
possible, and a sequence of G_INSERTs if not.
llvm-svn: 306119
Summary:
As part of this
* Emitted instructions now have named MachineInstr variables associated
with them. This isn't particularly important yet but it's a small step
towards multiple-insn emission.
* constrainSelectedInstRegOperands() is no longer hardcoded. It's now added
as the ConstrainOperandsToDefinitionAction() action. COPY_TO_REGCLASS uses
an alternate constraint mechanism ConstrainOperandToRegClassAction() which
supports arbitrary constraints such as that defined by COPY_TO_REGCLASS.
Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls, aditya_nandakumar
Reviewed By: ab
Subscribers: javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D33590
llvm-svn: 305791
Summary:
In some cases legalization ends up with not symmetric merge/unmerge nodes.
Transform it to merge/unmerge nodes.
Reviewers: t.p.northover, qcolombet, zvi
Reviewed By: t.p.northover
Subscribers: rovka, kristof.beyls, guyblank, llvm-commits
Differential Revision: https://reviews.llvm.org/D33626
llvm-svn: 305783
Add support for modulo for targets that have hardware division and for
those that don't. When hardware division is not available, we have to
choose the correct libcall to use. This is generally straightforward,
except for AEABI.
The AEABI variant is trickier than the other libcalls because it
returns { quotient, remainder }, instead of just one value like the
other libcalls that we've seen so far. Therefore, we need to use custom
lowering for it. However, we don't want to have too much special code,
so we refactor the target-independent code in the legalizer by adding a
helper for replacing an instruction with a libcall. This helper is used
by the legalizer itself when dealing with simple calls, and also by the
custom ARM legalization for the more complicated AEABI divmod calls.
llvm-svn: 305459
Summary:
When legalizing G_LOAD/G_STORE using NarrowScalar, we should avoid emitting
%0 = G_CONSTANT ty 0
%1 = G_GEP %x, %0
since it's cheaper to not emit the redundant instructions than it is to fold them
away later.
Reviewers: qcolombet, t.p.northover, ab, rovka, aditya_nandakumar, kristof.beyls
Reviewed By: qcolombet
Subscribers: javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D32746
llvm-svn: 305340
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
There is no guarantee that the first use of a constant that is traversed
is actually the first in the related basic block. Thus, if we use that
as the insertion point we may end up with definitions that don't
dominate there use.
llvm-svn: 304244