Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
This reverts commit b3297ef051.
This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
Reviewers: aemerson, dsanders, paquette, arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76600
https://reviews.llvm.org/D67133
While investigating some non determinism (CSE doesn't produce wrong
code, it just doesn't CSE some times) in GISel CSE on an out of tree
target, I realized that the core issue was that there were lots of code
that mutates (setReg, setRegClass etc), but doesn't notify observers
(CSE in this case but this could be any other observer). In order to
make the Observer be available in various parts of code and to avoid
having to thread it through various API, the MachineFunction now has the
observer as field. This allows it to be easily used in helper functions
such as constrainOperandRegClass.
Also added some invariant verification method in CSEInfo which can
catch these issues (when CSE is enabled).
This is a one off special case, since actually implementing full inline asm
support will be much more involved. This lets us compile a lot more code as a
common simple case.
Differential Revision: https://reviews.llvm.org/D74201
This reverts commit ed29dbaafa.
I'm backing out D68945, which as the discussion for D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch when a fix is worked out, apologies for all the churn.
The two parent commits are part of this revert too.
Conflicts:
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/test/DebugInfo/X86/dbg-addr-dse.ll
SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.
There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
We can have geps that have a scalar base pointer, and a vector index value, which
means that the base pointer must be splatted into a vector of pointers.
This fixes crashes on arm64 GlobalISel with optimizations enabled.
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
We're planning to remove the shufflemask operand from ShuffleVectorInst
(D72467); fix GlobalISel so it doesn't depend on that Constant.
The change to prelegalizercombiner-shuffle-vector.mir happens because
the input contains a literal "-1" in the mask (so the parser/verifier
weren't really handling it properly). We now treat it as equivalent to
"undef" in all contexts.
Differential Revision: https://reviews.llvm.org/D72663
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).
Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.
Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.
This matches what was done for X86 in 6bf108d77a.
Differential Revision: https://reviews.llvm.org/D71721
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.
Reviewers: sdesmalen, efriedma, fhahn, aemerson
Reviewed By: efriedma
Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70496
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD
Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69734
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
This patch kills off a significant user of the "IsIndirect" field of
DBG_VALUE machine insts. Brought up in in PR41675, IsIndirect is
techncally redundant as it can be expressed by the DIExpression of a
DBG_VALUE inst, and it isn't helpful to have two ways of expressing
things.
Rather than setting IsIndirect, have DBG_VALUE creators add an extra deref
to the insts DIExpression. There should now be no appearences of
IsIndirect=True from isel down to LiveDebugVariables / VirtRegRewriter,
which is ensured by an assertion in LDVImpl::handleDebugValue. This means
we also get to delete the IsIndirect handling in LiveDebugVariables. Tests
can be upgraded by for example swapping the following IsIndirect=True
DBG_VALUE:
DBG_VALUE $somereg, 0, !123, !DIExpression(DW_OP_foo)
With one where the indirection is in the DIExpression, by _appending_
a deref:
DBG_VALUE $somereg, $noreg, !123, !DIExpression(DW_OP_foo, DW_OP_deref)
Which both mean the same thing.
Most of the test changes in this patch are updates of that form; also some
changes in how the textual assembly printer handles these insts.
Differential Revision: https://reviews.llvm.org/D68945
llvm-svn: 374877
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
The CmpInst::getType() calls can be replaced by just using User::getType() that it was dyn_cast from, and we then need to assert that any default predicate cases came from the CmpInst.
llvm-svn: 374716
SelectionDAG has a bunch of machinery to defer this to selection time
for some reason. Just directly emit a copy during IRTranslator. The
x86 usage does somewhat questionably check hasFP, which could depend
on the whole function being at minimum translated.
This does lose the convergent bit if the callsite had it, which may be
a problem. We also lose that in general for intrinsics, which may also
be a problem.
llvm-svn: 373294
This adds support for lowering variadic musttail calls. To do this, we have
to...
- Detect a musttail call in a variadic function before attempting to lower the
call's formal arguments. This is done in the IRTranslator.
- Compute forwarded registers in `lowerFormalArguments`, and add copies for
those registers.
- Restore the forwarded registers in `lowerTailCall`.
Because there doesn't seem to be any nice way to wrap these up into the outgoing
argument handler, the restore code in `lowerTailCall` is done separately.
Also, irritatingly, you have to make sure that the registers don't overlap with
any passed parameters. Otherwise, the scheduler doesn't know what to do with the
extra copies and asserts.
Add call-translator-variadic-musttail.ll to test this. This is pretty much the
same as the X86 musttail-varargs.ll test. We didn't have as nice of a test to
base this off of, but the idea is the same.
Differential Revision: https://reviews.llvm.org/D68043
llvm-svn: 373226
We need to propagate this information from the IR in order to be able to safely
do tail call optimizations on the intrinsics during legalization. Assuming
it's safe to do tail call opt without checking for the marker isn't safe because
the mem libcall may use allocas from the caller.
This adds an extra immediate operand to the end of the intrinsics and fixes the
legalizer to handle it.
Differential Revision: https://reviews.llvm.org/D68151
llvm-svn: 373140
We were miscompiling switch value comparisons with the wrong signedness, which
shows up when we have things like switch case values with i1 types, which end up
being legalized incorrectly.
Fixes PR43383
llvm-svn: 372675
We currently always set the HasCalls on MFI during translation and legalization if
we're handling a call or legalizing to a libcall. However, if that call is later
optimized to a tail call then we don't need the flag. The flag being set to true
causes frame lowering to always save and restore FP/LR, which adds unnecessary code.
This change does the same thing as SelectionDAG and ports over some code that scans
instructions after selection, using TargetInstrInfo to determine if target opcodes
are known calls.
Code size geomean improvements on CTMark:
-O0 : 0.1%
-Os : 0.3%
Differential Revision: https://reviews.llvm.org/D67868
llvm-svn: 372443
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
This fixes a crash in tail call translation caused by assume and lifetime_end
intrinsics.
It's possible to have instructions other than a return after a tail call which
will still have `Analysis::isInTailCallPosition` return true. (Namely,
lifetime_end and assume intrinsics.)
If we emit a tail call, we should stop translating instructions in the block.
Otherwise, we can end up emitting an extra return, or dead instructions in
general. This makes the verifier unhappy, and is generally unfortunate for
codegen.
This also removes the code from AArch64CallLowering that checks if we have a
tail call when lowering a return. This is covered by the new code now.
Also update call-translator-tail-call.ll to show that we now properly tail call
in the presence of lifetime_end and assume.
Differential Revision: https://reviews.llvm.org/D67415
llvm-svn: 371572
This change moves the actual stack pointer manipulation into the legalizer,
available to targets via lower(). The codegen is slightly different because
we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than
adding a negative amount via G_GEP.
Differential Revision: https://reviews.llvm.org/D66678
llvm-svn: 370104
https://reviews.llvm.org/D66077
The value passed into dbg.value may relate to multiple registers,
each of which need a DBG_VALUE.
This fix calls MIRBuilder.buildDirectDbgValue for each register.
Without this, IR passed in from flang-compiler/flang may fail an
assertion in getOrCreateVReg.
Patch by : peterwaller-arm.
llvm-svn: 369403
Summary:
This patch adds G_GEP to `shouldCSEOpc` so that it can be CSEd. It also refactors
`translateGetElementPtr` by replacing `createGenericVirtualRegister` calls with types.
Reviewers: aditya_nandakumar, arsenm, dsanders, paquette, aemerson
Reviewed By: aditya_nandakumar
Subscribers: wdng, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66316
llvm-svn: 369070
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Currently shufflemasks get emitted as any other constant, and you end
up with a bunch of virtual registers of G_CONSTANT with a
G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
doesn't fit this pattern. This isn't an ideal representation, and
should avoid legalization and have fewer opportunities for a
representational error.
Rather than invent a new shuffle mask operand type, similar to what
ShuffleVectorSDNode does, just track the original IR Constant mask
operand. I don't completely like the idea of adding another link to
the IR, but MIR is already quite dependent on IR constants already,
and this will allow sharing the shuffle mask utility functions with
the IR.
llvm-svn: 368704
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jfb, jakehehrlich
Reviewed By: jfb
Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65514
llvm-svn: 367828
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then have each target specify that using the new custom legalizer for intrinsics
hook that they want it expanded it a libcall.
Differential Revision: https://reviews.llvm.org/D64895
llvm-svn: 366516
Fix stack-use-after-scope errors from r364512. One instance was already
fixed in r364611 - this patch simplifies that fix and addresses one more
instance of similar code.
Discussed in: https://reviews.llvm.org/D63905
llvm-svn: 364778
The new switch lowering code that tries to generate jump tables and range checks
were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to
town on trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled because the CFG
looks even stranger after all of this is done.
llvm-svn: 364613
This patch intends to fix ASAN stack-use-after-scope error.
This is at least a short-term fix to unbreak LLVM's mainline.
Differential Revision: https://reviews.llvm.org/D63905
llvm-svn: 364611
Remove the last use of packRegs from IRTranslator and delete
pack/unpackRegs. This introduces a fallback to DAGISel for intrinsics
with aggregate arguments, since we don't have a testcase for them so
it's hard to tell how we'd want to handle them.
Discussed in https://reviews.llvm.org/D63551
llvm-svn: 364514
Change the interface of CallLowering::lowerCall to accept several
virtual registers for each argument, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63551
llvm-svn: 364512
Change the interface of CallLowering::lowerCall to accept several
virtual registers for the call result, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63550
llvm-svn: 364511
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
llvm-svn: 364510
Allow CallLowering::ArgInfo to contain more than one virtual register.
This is useful when passes split aggregates into several virtual
registers, but need to also provide information about the original type
to the call lowering. Used in follow-up patches.
Differential Revision: https://reviews.llvm.org/D63548
llvm-svn: 364509
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.
Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.
For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.
There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.
Actual arm64 support for selection of jump tables is coming in a later patch.
Differential Revision: https://reviews.llvm.org/D63169
llvm-svn: 364085
Summary:
All the GlobalISel passes are initialized when the target calls
initializeGlobalISel(), so we don't need to call the initializers
from the pass constructors.
Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar
Reviewed By: aemerson
Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63235
llvm-svn: 363642
Summary: This case is related to D63405 in that we need to be propagating FMF on negates.
Reviewers: volkan, spatel, arsenm
Reviewed By: arsenm
Subscribers: wdng, javed.absar
Differential Revision: https://reviews.llvm.org/D63458
llvm-svn: 363631
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.
I fixed the same bug in SelectionDAG in r287593.
llvm-svn: 363580
Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which
lets us use the vreg def assuming it dominates all other users. However, it can
cause jumpy debug behaviour since the DebugLoc attached to these MIs are from
a user instruction that could be in a different block.
Fixes PR40887.
Differential Revision: https://reviews.llvm.org/D63286
llvm-svn: 363331
If the source is undef, then just don't do anything.
This matches SelectionDAG's behaviour in SelectionDAG.cpp.
Also add a test showing that we do the right thing here.
(irtranslator-memfunc-undef.ll)
Differential Revision: https://reviews.llvm.org/D63095
llvm-svn: 362989
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.
llvm-svn: 361608
When breaking up loads and stores of aggregates, the IRTranslator uses
LLT::scalar(64) for the index type of the G_GEP instructions that
compute the addresses. This is unnecessarily large for 32-bit targets.
Use the int ptr type provided by the DataLayout instead.
Note that we're already doing the right thing when translating
getelementptr instructions from the IR. This is just an oversight when
generating new ones while translating loads/stores.
Both x86 and AArch64 already have tests confirming that the old
behaviour is preserved for 64-bit targets.
Differential Revision: https://reviews.llvm.org/D61852
llvm-svn: 360656
We use to incorrectly use the store size instead of the alloc size when
creating the stack slot for allocas.
On aarch64 this can be demonstrated by allocating weirdly sized types.
For instance, in the added test case, we use an alloca for i19. We used
to allocate a slot of size 24-bit (19 rounded up to the next byte),
whereas we really want to use a full 32-bit slot for this type.
llvm-svn: 359856
Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update
arm64-irtranslator.ll.
Differential Revision: https://reviews.llvm.org/D60922
llvm-svn: 359203
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.
This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.
Compile time:
Program base cse diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8%
test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7%
test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0%
Geomean difference -0.2%
Code size:
Program base cse diff
test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test 3870380 3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0%
Geomean difference -0.9%
Differential Revision: https://reviews.llvm.org/D60580
llvm-svn: 358369
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.
This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.
llvm-svn: 358368
Call lowering should use this directly instead of going through the
EVT version, but more work is needed to deal with this (mostly the
passing of the IR type pointer instead of the relevant properties in
ArgInfo).
llvm-svn: 358111
This is consistent with what SelectionDAG does and is much easier to
work with than the extract sequence with an artificial wide register.
For the AMDGPU control flow intrinsics, this was producing an s128 for
the i64, i1 tuple return. Any legalization that should apply to a real
s128 value would badly obscure the direct values that need to be seen.
llvm-svn: 356147
Instead of only having this code work for unary intrinsics, have it work for
an arbitrary number of parameters.
Factor out the cases that fall under this (fma, pow).
This makes it a bit easier to add more intrinsics which don't require any
special work.
Differential Revision: https://reviews.llvm.org/D58079
llvm-svn: 353863
This teaches the IRTranslator to emit G_BSWAP when it runs into
Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in
AArch64.
Add a select-bswap.mir test, and add global isel checks to a couple existing
tests in test/CodeGen/AArch64.
This doesn't handle every bswap case, since some of these rely on known bits
stuff. This just lets us handle the naive case.
Differential Revision: https://reviews.llvm.org/D58081
llvm-svn: 353861
After the changes introduced in r353586, this instruction doesn't cause any
issues for any backend.
Original review: https://reviews.llvm.org/D57485
llvm-svn: 353720
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563