The post-index matcher, before it queries the target legality, walks uses
of some instructions which in pathological cases can be massive. Since
no targets actually support indexed loads yet, disable this to stop wasting
compile time on something which is going to fail anyway.
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.
For e.g.
svint32_t bar(svint32_t);
svint32_t foo(svint32_t x, bool *err) {
try { bar(x); } catch (...) { *err = true; }
return x;
}
`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84737
This is needed for an upcoming change to how we translate conditional branches
which might generate these.
Differential Revision: https://reviews.llvm.org/D86383
With gcc 6.3.0, I hit the following compilation error:
../lib/CodeGen/GlobalISel/Combiner.cpp: In member function
‘bool llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&,
llvm::GISelCSEInfo*)’:
../lib/CodeGen/GlobalISel/Combiner.cpp:156:54: error: suggest parentheses
around ‘&&’ within ‘||’ [-Werror=parentheses]
assert(!CSEInfo || !errorToBool(CSEInfo->verify()) &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
"CSEInfo is not consistent. Likely missing calls to "
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"observer on mutations");
Fix the code as suggested by the compiler.
https://reviews.llvm.org/D83833
Patch adds two new GICombinerRules for G_SELECT. The rules include:
combining selects with undef comparisons into their first selectee value,
and to combine away selects with constant comparisons. Patch additionally
adds a new combiner test for the AArch64 target to test these new G_SELECT
combiner rules and the existing select_same_val combiner rule.
Patch by mkitzan
https://reviews.llvm.org/D86676
Sometimes we can have the following code
x:gpr(s32) = G_OP
Say we build G_OP2 to the same x and then delete the previous instruction. Using something like
Register X = ...;
auto NewMIB = CSEBuilder.buildOp2(X, ... args);
Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (ie it doesn't consider register bank/register class along with type).Unify the profiling by refactoring and calling the common method.
This was found by turning on the CSEInfo::verify in at the end of each of our GISel passes which turns inconsistent state/non determinism in CSEing into crashes which likely usually indicates missing calls to Observer on mutations (the most common case). Here non determinism usually means not cseing sometimes, but almost never about producing incorrect code.
Also this patch adds this verification at the end of the combiners as well.
AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
This produces less work for addressing mode matching. I think this is
safe since I don't think machine IR is supposed to give the same
aliasing properties as getelementptr in the IR.
shl ([sza]ext x, y) => zext (shl x, y).
Turns expensive 64 bit shifts into 32 bit if it does not overflow the
source type:
This is a port of an AMDGPU DAG combine added in
5fa289f0d8. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.
TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.
Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to do try to repeatedly
query the legalizer info and guess at what type to use for the shift.
Known bits for G_ANYEXT was incorrectly using KnownBits::zext, causing
us to treat the high bits as zero even though they're (by definition)
unknown.
Differential Revision: https://reviews.llvm.org/D86323
The check for the landingpad instructions was overly restrictive. In optimimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.
This change relaxes the check to allow PHI nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86141
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.
This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.
In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.
e.g.
```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```
Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
Differential Revision: https://reviews.llvm.org/D85463
It's annoying to have to maintain multiple, nearly identical chains of if
statements which all set the same attributes.
Add a helper function, `addFlagsUsingAttrFn` which performs the attribute
setting.
Then, use wrappers for that function in `lowerCall` and `setArgFlags`.
(Note that the flag-setting code in `setArgFlags` was missing the returned
attribute. There's no selection for this yet, so no test. It's an example of
the kind of thing this lets us avoid, though.)
Differential Revision: https://reviews.llvm.org/D86159
Similar to this commit:
faf8065a99
Testcase is pretty much the same as
test/CodeGen/AArch64/tailcall-explicit-sret.ll
Except it uses i64 (since we don't handle the i1024 return values yet), and
doesn't have indirect tail call testcases (because we can't translate those
yet).
Differential Revision: https://reviews.llvm.org/D86148
This is restricted to single use loads, which if we fold to sextloads we can
find more optimal addressing modes on AArch64.
This also fixes an overload the MachineFunction::getMachineMemOperand() method
which was incorrectly using the MF alignment instead of the MMO alignment.
Differential Revision: https://reviews.llvm.org/D85966
By detecting this sign extend pattern early, we can uncover opportunities for
more optimizations.
Differential Revision: https://reviews.llvm.org/D85965
We weren't looking through the parameters on calls at all.
E.g., say you had
```
declare i32 @zext(i32 zeroext %x)
...
%y = call i32 @zext(i32 %something)
...
```
At the point of the call, we wouldn't know that the %something should have the
zeroext attribute.
This sets flags in about the same way as
TargetLoweringBase::ArgListEntry::setAttributes.
Differential Revision: https://reviews.llvm.org/D86125
Some of the lower implementations were relying on this, however the
type was not set depending on which form .lower* helper form you were
using. For instance, if you used an unconditonal lower(), the type was
never set. Most of the lower actions do not benefit from a type
parameter, and just expand in terms of the original operation's types.
However, some lowerings could benefit from an additional type hint to
combine a promotion and an expansion. An example of this is for
add/sub sat. The DAG integer legalization tries to use smarter
expansions directly when promoting the integer type, and doesn't
always produce the same instruction with a wider type.
Treat this as an optional hint argument, that only means something for
specific lower actions. It may be useful to generalize this mechanism
to pass a full list of type indexes and desired types, but I haven't
run into a case like that yet.
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's oly only used for matcher optimization.
I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.
Also doesn't try to handle undef elements like the DAG version.
This is mostly a straight port from SelectionDAG. We re-use the actual bit-test
analysis part from SwitchLoweringUtils, which was factored out earlier to
support jump-tables.
Differential Revision: https://reviews.llvm.org/D85233
This implements
```
(logic_op (op x...), (op y...)) -> (op (logic_op x, y))
```
when `op` is an extend, a shift, or an and.
This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands`
(with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.)
This is implemented so it works both pre and post-legalization.
This also adds a general way to add a series of instructions in a combine.
(`applyBuildInstructionSteps`).
Differential Revision: https://reviews.llvm.org/D85050
This mirrors the support for the equivalent extracts. This also
creates a huge mess that would be greatly improved if we had any bit
operation combines.
Use the same basic strategy as LegalizeVectorTypes. Try to index into
smaller pieces if there's a constant index, and otherwise fall back to
a stack temporary.
Add given input and mark it as tied.
Doesn't create additional copy compared to
matching input constraint to virtual register.
Differential Revision: https://reviews.llvm.org/D85122
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.
This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.
I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
This one is pretty easy and shrinks the list of unhandled
intrinsics. I'm not sure how relevant the insert point is. Using the
insert position of EntryBuilder will place this after
constants. SelectionDAG seems to end up emitting these after argument
copies and before anything else, but I don't think it really
matters. This also ends up emitting these in the opposite order from
SelectionDAG, but I don't think that matters either.
This also needs a fix to stop the later passes dropping this as a dead
instruction. DeadMachineInstructionElim's version of isDead special
cases LOCAL_ESCAPE for some reason, and I'm not sure why it's excluded
from MachineInstr::isLabel (or why isDead doesn't check it).
I also noticed DeadMachineInstructionElim never considers inline asm
as dead, but GlobalISel will drop asm with no constraints.
This patch stops unconditionally transforming FSUB(-0, X) into an FNEG(X) while building the MIR.
This corresponds with the SelectionDAGISel change in D84056.
Differential Revision: https://reviews.llvm.org/D85139
Use pad with undef and unmerge with unused results. This is annoyingly
similar to several other places in LegalizerHelper, but they're all
slightly different.
For AMDGPU, vectors with elements < 32 bits should be indexed in
32-bit elements and the desired bits extracted from there. For
elements > 64-bits, these should be reduce to 64/32 elements to enable
the normal dynamic indexing paths.
In the dynamic index cases, this produces shorter code most of the
time. This does immediately regress the constant index cases, but this
should be fixed once we have the most basic of shift combines.
The element size > 64 case is pretty much ported from the exisiting
DAG implementation for extract element promote. The increasing element
size case is new.
https://reviews.llvm.org/D84909
Patch adds two new GICombinerRules, one for G_INTTOPTR and one for
G_PTRTOINT. The G_INTTOPTR elides ptr2int(int2ptr(x)) to a copy of x, if
the cast is within the same address space. The G_PTRTOINT elides
int2ptr(ptr2int(x)) to a copy of x. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
Patch by mkitzan
I still think it's highly questionable that we have two intrinsics
with identical behavior and only vary by the name of the libcall used
if it happens to be lowered that way, but try to reduce the feature
delta between SDAG and GlobalISel for recently added intrinsics. I'm
not sure which opcode should be considered the canonical one, but
lower roundeven back to round.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.
Differential Revision: https://reviews.llvm.org/D84866
I think these were added as a workaround for SelectionDAG lacking half
legalization support in the past. I think they should probably be
removed from the IR, but clang does still have a target control to
emit these instead of the native half fpext/fptrunc.
Widen or narrow a type to a type with the same scalar size as
another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar
operand to match the bitwidth of the pointer type. Use this to
disallow narrower types for G_PTRMASK.
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.
Differential Revision: https://reviews.llvm.org/D84125
Add support in LegalizerHelper for lowering G_SADDSAT etc. either
using add/subtract-with-overflow or using max/min instructions.
Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skips out on using the clamp bit to
treat these as legal, which has never been used before. This also
doesn't yet try to deal with expanding SALU cases.
Summary: We do this already for output operands, but missed it for (non-tied) input operands.
Reviewers: arsenm, Petar.Avramovic
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83763
This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.
This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.
I'm also not sure what the correct behavior for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
There are a few questionable things about this intrinsic and existing
DAG implementation. For some reason the intrinsic hardcodes the second
operand to be scalar-only i32, and SelectionDAG builder makes a
legalization decision based on whether the operand is constant.
Try to make the behavior more consistent with getGCDType, and bias
towards returning something closer to the source type whenever there's
an ambiguity.
Try harder to find a canonical unmerge type when trying to cover the
desired target type. Handle finding a compatible unmerge type for two
vectors with different element types. This will return the largest
multiple of the source vector element that will evenly divide the
target vector type.
Also make the handling mixing scalars and vectors, and prefer the
source element type as the unmerge target type.
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.
Differential Revision: https://reviews.llvm.org/D84010
Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP.
Legailize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP
followed by s32->s16 G_FPTRUNC.
Differential Revision: https://reviews.llvm.org/D83880
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
Check that input size matches size of destination reg class.
Attempt to extend input size when needed.
Differential Revision: https://reviews.llvm.org/D83384
handleAssignments was assuming every argument type is an MVT, and
assignArg would always fail. This fixes one of the hacks in the
current AMDGPU calling convention code that pre-processes the
arguments.
In an earlier commit 584d0d5c17 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.
I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.
Differential Revision: https://reviews.llvm.org/D82524
This matches the DAG behavior where this is called after the loop
checking for calls. The AMDGPU implementation depends on knowing if
there are calls in the function or not, so move this later.
Another problem is finalizeLowering is actually called twice; I was
seeing weird inconsistencies since the first call would produce
unexpected results and the second run would correct them in some
contexts. Since this requires disabling the verifier, and it's useful
to serialize the MIR immediately after selection, FinalizeISel should
probably not be a real pass.
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).
When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:
extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
This patch fixes both issues.
Reviewers: david-arm, efriedma, spatel
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82910
The translation of cmpxchg added by
9481399c0f specifically skipped weak
cmpxchg due to not understanding the meaning. Weak cmpxchg was added
in 420a216817. As explained in the
commit message, the weak mode is implicit in how
ATOMIC_CMP_SWAP_WITH_SUCCESS is lowered. If it's expanded to a regular
ATOMIC_CMP_SWAP, it's replaced with a strong cmpxchg.
This handling seems weird to me, but this was already following the
DAG behavior. I would expect the strong IR instruction to not have the
boolean output. Failing that, I might expect the IRTranslator to emit
ATOMIC_CMP_SWAP and a constant for the boolean.
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:
1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
vectors.
For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.
I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to
CodeGen/AArch64/GlobalISel/arm64-fallback.ll
that demonstrate we correctly fallback to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.
Differential Revision: https://reviews.llvm.org/D81557
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.
This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add back the common
parameters back in addition to the helper.
I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.
This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not really capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.
Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.
This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.
Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer
Reviewed By: ostannard
Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75169
It's possible to end up with a zext or something in the way of a G_CONSTANT,
even pre-legalization. This can happen with memsets.
e.g.
https://godbolt.org/z/Bjc8cw
To make sure we can catch these cases, use `getConstantVRegValWithLookThrough`
instead of `mi_match`.
Differential Revision: https://reviews.llvm.org/D81875
This implements the following combines:
((0-A) + B) -> B-A
(A + (0-B)) -> A-B
Porting over the basic algebraic combines from the DAGCombiner. There are
several combines which fold adds away into subtracts. This is just the simplest
one.
I noticed that add combines are some of the most commonly hit across CTMark,
(via print statements when they fire), so I'm porting over some of the obvious
ones.
This gives some minor code size improvements on CTMark at -O3 on AArch64.
Differential Revision: https://reviews.llvm.org/D77453
Put AND before ADD in LegalizerHelper::lowerFPTRUNC_F64_TO_F16
in order to match algorithm from AMDGPUTargetLowering::LowerFP_TO_FP16.
Differential Revision: https://reviews.llvm.org/D81666
Summary:
Fix crash when using -debug caused by the GlobalISel observer trying to print
an incomplete DBG_VALUE instruction. This was caused by the MachineIRBuilder
using buildInstr, which immediately inserts the instruction causing print,
instead of using BuildMI to first build up the instruction and using
insertInstr when finished.
Add RUN-line to existing debug-insts.ll test with -debug flag set to make sure
no crash is happening.
Also fixed a missing %s in the 2nd RUN-line of the same test.
Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76934
If the target explicitly requested custom legalization, it should be
required to implement this. Also move default legalizeIntrinsic
implementation into the header so it's next to the related
legalizeCustom.
It was annoying enough that every custom lowering needed to set the
insert point, but this was made worse since now these all needed to be
updated to setInstrAndDebugLoc. Consolidate these so every
legalization action has the right insert position by default.
This should fix dropping debug info in every custom AMDGPU
legalization.
The current relationship between LegalizerHelper and MachineIRBuilder
confuses me, because the LegalizerHelper modifies the MachineIRBuilder
which it does not own. Constructing a LegalizerHelper destroys the
insert point, since the constructor calls setMF, which clears all the
fields. Try to separate these functions, so it's possible to construct
a LegalizerHelper from an existing MachineIRBuilder without losing the
insert point/debug loc.
The construction APIs for MachineIRBuilder don't make much sense, and
it's been annoying to sort through it with these trivial functions
separate from the declaration.
New instructions were getting printed both in createdInstr, and in the
final printNewInstrs, so it made it look like the same instructions
were created twice. This overall made reading the debug output
harder. Stop printing the initial construction and only print new
instructions in the summary at the end. This avoids printing the less
useful case where instructions are sometimes initially created with no
operands.
I'm not sure this is the correct instance to remove; now the visible
ordering is different. Now you will typically see the one erased
instruction message before all the new instructions in order. I think
this is the more logical view of typical legalization changes,
although it's mechanically backwards from the normal
insert-new-erase-old pattern.
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override.
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81374
Just computing the alignment makes sense without caring about the
general known bits, such as for non-integral pointers. Separate the
two and start calling into the TargetLowering hooks for frame indexes.
Start calling the TargetLowering implementation for FrameIndexes,
which improves the AMDGPU matching for stack addressing modes. Also
introduce a new hook for returning known alignment of target
instructions. For AMDGPU, it would be useful to report the known
alignment implied by certain intrinsic calls.
Also stop using MaybeAlign.
The AMDGPU lowering for unconstrained G_FDIV sometimes needs to
introduce a mode switch in the middle, so it's helpful to have
constrained instructions available to legalize this. Right now nothing
is preventing reordering of the mode switch with the other
instructions in the expansion.
I inverted the mask when I ported to the new form of G_PTRMASK in
8bc03d2168.
I don't think this really broke anything, since G_VASTART isn't
handled for types with an alignment higher than the stack alignment.
During legalization we can end up with extends of loads, which in the case of
zexts causes us to not hit tablegen imported patterns.
The caveat here is that we don't want anyext load forming, since some variants
are illegal. This change also prevents the combine from creating any illegal
loads.
Differential Revision: https://reviews.llvm.org/D80458
I get confused by a lot of the predicate names here, since I would
assume they apply to vectors as well. Rename to reflect they only
apply to scalars.
Also add a few predicates AMDGPU uses that should be generally useful.
Also add any() to complement all. I've wanted to use this a few times
but then worked around it not being there.
If we have a memory instruction (e.g. a load), we shouldn't combine it away in
some trivial combine.
It's possible that, say, a call lives between the instructions. This could
modify the value loaded, making the load instructions not safe to fold.
Differential Revision: https://reviews.llvm.org/D80053
Summary:
Previously, we only added early-clobber flags to the 'group' immediate flag operand
of an inline asm operand.
However, we also have to add the EarlyClobber flag to the MachineOperand itself.
This fixes PR46028
Reviewers: arsenm, leonardchan
Reviewed By: arsenm, leonardchan
Subscribers: phosek, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80467
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic.
Only selects the cases that look like the old instruction. More work
is needed to select the general case. Also new legalization code is
still needed to deal with the case where the incoming mask size does
not match the pointer size, which has a specified behavior in the
langref.
Replace with forward declaration and move dependency down to source files that actually need it.
Both TargetLowering.h and TargetMachine.h are 2 of the most expensive headers (top 10) in the ClangBuildAnalyzer report when building llc.
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.
I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.
I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.
Differential Revision: https://reviews.llvm.org/D80455
(This patch is by Jessica, I'm just committing it on her behalf because I need
a post-legalizer combiner for something else).
This supersedes D77250, which did equivalent work in the selector. This can be
done pre-legalization or post-legalization. Post-legalization is more likely to
hit, since G_IMPLICIT_DEFs tend to appear during legalization. There's no reason
to not do it pre-legalization though-- if it can be caught earlier, great.
(I also think that it might be worth reimplementing D78769 using a
target-specific post-legalization combine too after thinking about it for a
while.)
Differential Revision: https://reviews.llvm.org/D78852
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
This was looking for a compare condition, and copying the compare
flags. I don't think this was ever correct outside of certain min/max
patterns which aren't checked, but this probably predates select
instructions having fast math flags.
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
Summary:
D78319 introduced basic support for inline asm input operands in GlobalISel.
However, that patch did not handle the case where a memory input operand still needs to
be indirectified. Later code asserts that the memory operand is already indirect.
This patch adds an early return false to trigger the SelectionDAG fallback for now.
Reviewers: arsenm, paquette
Reviewed By: arsenm
Subscribers: thakis, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79955
Summary:
D78319 introduced basic support for inline asm input operands in GlobalISel.
However, that patch did not handle the case where a memory input operand still needs to
be indirectified. Later code asserts that the memory operand is already indirect.
This patch adds an early return false to trigger the SelectionDAG fallback for now.
Reviewers: arsenm, paquette
Reviewed By: arsenm
Subscribers: wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79955
Summary:
ConstantExprs involving operations on <1 x Ty> could translate into MIR
that failed to verify with:
*** Bad machine code: Reading virtual register without a def ***
The problem was that translate(const Constant &C, Register Reg) had
recursive calls that passed the same Reg in for the translation of a
subexpression, but without updating VMap for the subexpression first as
translate(const Constant &C, Register Reg) expects.
Fix this by using the same translateCopy helper function that we use for
translating Instructions. In some cases this causes extra G_COPY
MIR instructions to be generated.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45576
Reviewers: arsenm, volkan, t.p.northover, aditya_nandakumar
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78378
Currently this code exists in widenScalar for G_MERGE_VALUE
sources. I'm not sure if the existing expansion in widenScalar should
be removed or not. The widenScalar variant tries to extend to the
requested size, but this just uses the original bitwidth.
This fixes a verifier failure on a bot:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O0-g/
```
*** Bad machine code: MBB has duplicate entries in its successor list. ***
- function: foo
- basic block: %bb.5 indirectgoto (0x7fe3d687ca08)
```
One of the GCC torture suite tests (pr70460.c) has an indirectbr instruction
which has duplicate blocks in its destination list.
According to the langref this is allowed:
> Blocks are allowed to occur multiple times in the destination list, though
> this isn’t particularly useful.
(https://www.llvm.org/docs/LangRef.html#indirectbr-instruction)
We don't allow this in MIR. So, when we translate such an instruction, the
verifier screams.
This patch makes `translateIndirectBr` check if a successor has already been
added to a block. If the successor is present, it is skipped rather than added
twice.
Differential Revision: https://reviews.llvm.org/D79609
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
The tl;dr story is that this causes jumps in the emitted line
tables, even at `-O0`. We could at some point consider more fancy
solutions to preserve locations, but it doesn't seem to be worth
the effort for now.
<rdar://problem/62460788>
Differential Revision: https://reviews.llvm.org/D78947
A previous bug fix for varargs introduced a regression where we would
incorrectly widen some stores to memory when passing i8/i16 parameters on the
stack. This didn't show up seemingly because it only happens when there is
no signext/zeroext parameter attribute, which I think for Darwin clang adds.
Swift however seems to be a different story, and a plain anyext on the parameter
triggered the bug.
To fix this, I've added a new ValueHandler::assignValueToAddress type override
which lets us distiguish between varargs and fixed args (we still need this
widening behaviour for varargs to fix the original bug in 2018).
rdar://61353552
Loosen the restriction on what kinds of opcodes can be CSEd as
targets may want to CSE some generic target specific pseudos.
NFC as far as this change is concerned as CSEConfig still pretty much is
a subset of this check.
Differential Revision: https://reviews.llvm.org/D78684
Summary:
While lowering memory intrinsics, GIsel attempts to form a tail call to
a library routine.
There might be a DBG_LABEL or something after the intrinsic call,
though: in that case, GIsel should still be able to form the tail call,
and should also delete the debug insts after the tail call as the
transform makes them invalid.
Reviewers: dsanders, aemerson
Subscribers: hiraditya, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78335
Summary:
Fix an issue which could result in ElideBrByInvertingCond or
CombineIndexedLoadStore being missed when debug info is present. In both
cases the fix is s/hasOneUse/hasOneNonDbgUse/.
Reviewers: aemerson, dsanders
Subscribers: hiraditya, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78254
Summary:
This fixes several issues where the presence of debug instructions could
disable certain combines, due to dominance queries finding uses/defs that
don't actually exist.
Reviewers: dsanders, fhahn, paquette, aemerson
Subscribers: hiraditya, arphaman, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78253
Summary:
It looks like RegBankSelect can try to assign a bank based on a
DBG_VALUE instead of ignoring it. This eventually leads to an assert
in AArch64RegisterBankInfo::getInstrMapping because there is some info
missing from the DBG_VALUE MachineOperand (I see: `Assertion failed:
(RawData != 0 && "Invalid Type"), function getScalarSizeInBits`).
I'm not 100% sure it's safe to insert DBG_VALUE instructions right
before RegBankSelect (that's what -debugify-and-strip-all-safe is
doing). Any advice appreciated.
Depends on D78135.
Reviewers: ab, qcolombet, dsanders, aprantl
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78137
Summary:
Similar to the CallLowering class used for lowering LLVM IR calls to MIR calls,
we introduce a separate class for lowering LLVM IR inline asm to MIR INLINEASM.
There is no functional change yet, all existing tests should pass.
Reviewers: arsenm, dsanders, aemerson, volkan, t.p.northover, paquette
Reviewed By: aemerson
Subscribers: gargaroff, wdng, mgorny, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78316
Summary:
This verifier tries to ensure that DebugLoc's don't just disappear as
we transform the MIR. It observes the instructions created, erased, and
changed and at checkpoints chosen by the client algorithm verifies the
locations affected by those changes.
In particular, it verifies that:
* Every DebugLoc for an erased/changing instruction is still present on
at least one new/changed instruction
* Failing that, that there is a line-0 location in the new/changed
instructions. It's not possible to confirm which locations were merged so
it conservatively assumes all unaccounted for locations are accounted
for by any line-0 location to avoid false positives.
If that fails, it prints the lost locations in the debug output along with
the instructions that should have accounted for them.
In theory, this is usable by the legalizer, combiner, selector and any other
pass that performs incremental changes to the MIR. However, it has so far
only really been tested on the legalizer (not including the artifact
combiner) where it has caught lots of lost locations, particularly in Custom
legalizations. There's only one example here as my initial testing was on an
out-of-tree target and I haven't done a pass over the in-tree targets yet.
Depends on D77575, D77446
Reviewers: bogner, aprantl, vsk
Subscribers: jvesely, nhaehnle, mgorny, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77576
Summary:
This will allow us to fix the issue where the lost locations
verifier causes CodeGen changes on lost locations because it
falls back on DAGISel
Reviewers: qcolombet, bogner, aprantl, vsk, paquette
Subscribers: rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78261
Summary:
As a follow up to https://reviews.llvm.org/D29014, add translation
support for freeze.
Introduce a new generic instruction G_FREEZE and translate freeze to it.
Reviewers: dsanders, aqjune, arsenm, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson
Reviewed By: aqjune, arsenm
Subscribers: fhahn, lebedev.ri, wdng, rovka, hiraditya, jfb, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77795
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: stoklund, sdesmalen, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77272
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.
This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74060
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.
To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.
Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.
Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar
Reviewed By: arsenm
Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76598
Summary: I think it would be better to require the alignment to be >= 1. It is currently confusing to allow both values.
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77372
The legalizer has a tendency to lose DebugLoc's when expanding or
combining instructions. The verifier that detected these isn't ready
for upstreaming yet but this patch fixes the cases that came up when
applying it to our out-of-tree backend's CodeGen tests.
This pattern comes up a few more times in this file and probably in
the backends too but I'd prefer to fix the others separately (and
preferably when the lost-location verifier detects them).
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
When we have
```
a = G_OR x, x
```
or
```
b = G_AND y, y
```
We can drop the G_OR/G_AND and just use x/y respectively.
Also update arm64-fallback.ll because there was an or in there which hits this
transformation.
Differential Revision: https://reviews.llvm.org/D77105
Implement identity combines for operations like the following:
```
%a = G_SUB %b, 0
```
This can just be replaced with %b.
Over CTMark, this gives some minor size improvements at -O3.
Differential Revision: https://reviews.llvm.org/D76640
This reverts commit b3297ef051.
This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
When we see this:
```
%a = COPY $physreg
...
SOMETHING implicit-def $physreg
...
%b = COPY $physreg
```
The two copies are not equivalent, and so we shouldn't perform any folding
on them.
When we have two instructions which use a physical register check that they
define the same virtual register(s) as well.
e.g., if we run into this case
```
%a = COPY $physreg
...
%b = COPY %a
```
we can say that the two copies are the same, and can be folded.
Differential Revision: https://reviews.llvm.org/D76890
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
Summary:
The existing helper function can only create a libcall to functions available in
RTLIB. Add a helper function that can create a libcall to a given function name
using the provided calling convention.
Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76845
For some operations, the type is unimportant and only the number of
bits matters. For example I don't want to treat <4 x s8> as a legal
type, but I also don't want to decompose loads of this into smaller
pieces to get legal register types.
On AMDGPU in SelectionDAG, we legalize a number of operations (most
notably load and store) by coercing all types to vectors of i32. For
GlobalISel, I'm trying very hard to avoid doing this for every type,
but I don't think this strategy can be completely avoided. I'm trying
to avoid bitcasts for any legitimately legal type we can operate on,
since the intervening bitcasts have proven to be a hassle.
For loads, I think I can get away without ever casting the result
type, and handling any arbitrary bitwidth during selection (I will
eventually want new tablegen support to help with this, rather than
having to add every possible type as legal). The unmerge required to
do anything with the value should expand to the expected shifts. This
is trickier for stores, since it would now require handling a wide
array of truncates during selection which I don't want.
Future potentially interesting case are for vector indexing, where
sub-dword type should be indexed in s32 pieces.
When we find something like this:
```
%a:_(s32) = G_SOMETHING ...
...
%select:_(s32) = G_SELECT %cond(s1), %a, %a
```
We can remove the select and just replace it entirely with `%a` because it's
always going to result in `%a`.
Same if we have
```
%select:_(s32) = G_SELECT %cond(s1), %a, %b
```
where we can deduce that `%a == %b`.
This implements the following cases:
- `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b`
are defined by identical instructions
This gives a few minor code size improvements on CTMark at -O3 for AArch64.
Differential Revision: https://reviews.llvm.org/D76523
I think we can save the MRI argument from these since it's in
GISelKnownBits already, but currently not accessible.
Implementation deferred to avoid dependency on other patches.
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
Reviewers: aemerson, dsanders, paquette, arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76600
Summary:
Widening G_UNMERGE_VALUES to a type which is larger than the
original source type is the same as widening it to the same
type as the source type: in both cases, G_UNMERGE_VALUES has
to be replaced with bit arithmetic which. Although the arithmetic
itself is independent of whether the source type is smaller
or equal to the widen type, widening the source type to the
widen type should result in less artifacts being emitted,
since this is the type that the user explicitly requested.
Reviewers: arsenm, dsanders, aemerson, aditya_nandakumar
Reviewed By: arsenm, dsanders
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76494
This reverts commit e9f22fd429.
When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
Port over the following:
- shuffle undef, undef, any_mask -> undef
- shuffle anything, anything, undef_mask -> undef
This sort of thing shows up a lot when you try to bugpoint code containing
shufflevector.
Differential Revision: https://reviews.llvm.org/D76382
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.
The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).
Differential Revision: https://reviews.llvm.org/D76007
This ports some combines from DAGCombiner.cpp which perform some trivial
transformations on instructions with undef operands.
Not having these can make it extremely annoying to find out where we differ
from SelectionDAG by looking at existing lit tests. Without them, we tend to
produce pretty bad code generation when we run into instructions which use
undef operands.
Also remove the nonpow2_store_narrowing testcase from arm64-fallback.ll, since
we no longer fall back on the add.
Differential Revision: https://reviews.llvm.org/D76339
I used the implementation for floor instead of round. It also turns
out the OpenCL builtin library wasn't using the round builtin, but
implemented the expanded form.
Summary: When narrowing a scalar G_EXTRACT where the destination lines up perfectly with a single result of the emitted G_UNMERGE_VALUES a COPY should be emitted instead of unconditionally trying to emit a G_MERGE_VALUES.
Reviewers: arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75743
If the loaded memory size was smaller than the result size, this would
produce out of bounds memory accesses. I'm wondering if we need a
distinct narrow memory legalize action type, since a case I care about
is decomposing a 4-byte unaligned access into 4 extending loads, which
would leave the original result register type. I'm currently awkwardly
using narrowScalar to handle unaligned accesses that need to be split.
This reverts commit 5583c2f2fb.
The lldb bot failure was a test that was fragile and sensitive to irrelevant
changes in instruction ordering. Re-committing this as the test should have
been skipped for AArch64 now.
Differential Revision: https://reviews.llvm.org/D75555
This changes the localizer to attempt intra-block localizer of instructions
that have local uses. This is useful because sometimes the entry block itself
has many uses of constant-like instructions, which would benefit from shortening
live ranges. Previously if an inst had no non-local uses, we wouldn't add it to
the list of instructions to attempt further intra-block localization.
This gives a 0.7% geomean code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D75555
Unlike what I claimed in my previous commit. The caching is
actually not NFC on PHIs.
When we put a big enough max depth, we end up simulating loops.
The cache is effectively cutting the simulation short and we
get less information as a result.
E.g.,
```
v0 = G_CONSTANT i8 0xC0
jump
v1 = G_PHI i8 v0, v2
v2 = G_LSHR i8 v1, 1
```
Let say we want the known bits of v1.
- With cache:
Set v1 cache to we know nothing
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is in the cache
=> v1 is 0, thus v2 is 0x80
Finally v1 is v0 & v2 => 0x80
- Without cache and enough depth to do two iteration of the loop:
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is v0 & v2
v0 is 0xC0
v2 is v1 >> 1
Reach the max depth for v1...
unwinding
v1 is know nothing
v2 is 0x80
v0 is 0xC0
v1 is 0x80
v2 is 0xC0
v0 is 0xC0
v1 is 0xC0
Thus now v1 is 0xC0 instead of 0x80.
I've added a unittest demonstrating that.
NFC
Changes the handling of odd breakdowns, and avoids using
G_EXTRACT/G_INSERT. Pad with undef to a wider size, and unmerge. Also
avoid introducing instructions for the fully undef components.
This patch adds a cache that is valid only for the duration of a call
to getKnownBits. With such short lived cache we avoid all the problems
of cache invalidation while still getting the benefits of reusing
the information we already computed.
This cache is useful whenever an instruction occurs more than once
in a chain of computation.
E.g.,
v0 = G_ADD v1, v2
v3 = G_ADD v0, v1
Previously we would compute the known bits for:
v1, v2, v0, then v1 again and finally v3.
With the patch, now we won't have to recompute v1 again.
NFC
When analyzing PHIs, we gather the known bits for every operand and
merge them together to get the known bits of the result of the PHI.
It is not unusual that merging the information leads to know nothing
on the result (e.g., phi a: i8 3, b: i8 unknown, ..., after looking at the
second argument we know we will know nothing on the result), thus, as
soon as we reach that state, stop analyzing the following operand (i.e.,
on the previous example, we won't process anything after looking at `b`).
This improves compile time in particular with PHIs with a large number
of operands.
NFC.
https://reviews.llvm.org/D67133
While investigating some non determinism (CSE doesn't produce wrong
code, it just doesn't CSE some times) in GISel CSE on an out of tree
target, I realized that the core issue was that there were lots of code
that mutates (setReg, setRegClass etc), but doesn't notify observers
(CSE in this case but this could be any other observer). In order to
make the Observer be available in various parts of code and to avoid
having to thread it through various API, the MachineFunction now has the
observer as field. This allows it to be easily used in helper functions
such as constrainOperandRegClass.
Also added some invariant verification method in CSEInfo which can
catch these issues (when CSE is enabled).
Produce an unmerge to a narrower type and introduce a narrower shift
if needed. I wasn't sure if there was a better way to parameterize the
target's preferred shift type for the GICombineRule, so manually call
the combine helper.
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).
This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.
https://reviews.llvm.org/D70564
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
A downstream test exposed a simple logic bug with the manual pointer
stripping code, fix that by just using stripPointerCasts() on the value.
I don't think there's a way to expose this issue upstream.
Calls to ObjC's objc_msgSend function are done by bitcasting the function global
to the required function type signature. This patch looks through this bitcast
so that we can do a direct call with bl on arm64 instead of using an indirect blr.
Differential Revision: https://reviews.llvm.org/D74241
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.
Differential Revision: https://reviews.llvm.org/D74223
This is a one off special case, since actually implementing full inline asm
support will be much more involved. This lets us compile a lot more code as a
common simple case.
Differential Revision: https://reviews.llvm.org/D74201
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
This reverts commit ed29dbaafa.
I'm backing out D68945, which as the discussion for D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch when a fix is worked out, apologies for all the churn.
The two parent commits are part of this revert too.
Conflicts:
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/test/DebugInfo/X86/dbg-addr-dse.ll
SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.
There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit def, which is dumb.
contractCrossBankCopyIntoStore() finds the instruction defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.
This change adds another function to directly return the register that
is the source of the register and use that for folding.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44783
Differential Revision: https://reviews.llvm.org/D74005
Start using a new strategy with a combination of merge and unmerges.
This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
Summary:
Applying this cleanup:
- MIRBuilder.buildInstr(TargetOpcode::G_ASHR)
- .addDef(Shifted)
- .addUse(Res)
- .addUse(ShiftAmt);
+ MIRBuilder.buildAShr(Shifted, Res, ShiftAmt);
caused an assertion failure here:
llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr *llvm::MachineRegisterInfo::getVRegDef(unsigned int) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.
#4 0x00000000050a6d96 in llvm::MachineRegisterInfo::getVRegDef (this=0x74606a0, Reg=2147483650) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403
#5 0x00000000066148f6 in llvm::getConstantVRegValWithLookThrough (VReg=2147483650, MRI=..., LookThroughInstrs=false, HandleFConstant=true) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:244
#6 0x00000000066147da in llvm::getConstantVRegVal (VReg=2147483650, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:210
#7 0x0000000006615367 in llvm::ConstantFoldBinOp (Opcode=101, Op1=2147483650, Op2=2147483656, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:341
#8 0x000000000657eee0 in llvm::CSEMIRBuilder::buildInstr (this=0x7465010, Opc=101, DstOps=..., SrcOps=..., Flag=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:160
#9 0x0000000003645958 in llvm::MachineIRBuilder::buildAShr (this=0x7465010, Dst=..., Src0=..., Src1=..., Flags=...) at /home/jayfoad2/git/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1298
#10 0x00000000065c35b1 in llvm::LegalizerHelper::lower (this=0x7fffffffb5f8, MI=..., TypeIdx=0, Ty=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2020
because at this point there are two instructions defining Res: the
original G_SMULO/G_UMULO and the new G_MUL that we built. The fix is
to modify the original mul in place, so that there is only ever one
definition of Res.
Reviewers: arsenm, aditya_nandakumar
Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72842
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
One of the exit criteria of computeKnownBits is whether we reach the max
recursive call depth. Before this patch we would check that the
depth is exactly equal to max depth to exit.
Depth may get bigger than max depth if it gets passed to a different
GISelKnownBits object.
This may happen when say a generic part uses a GISelKnownBits object
with some max depth, but then we hit TL.computeKnownBitsForTargetInstr
which creates a new GISelKnownBits object with a different and smaller
depth. In that situation, when we hit the max depth check for the first
time in the target specific GISelKnownBits object, depth may already
be bigger than the current max depth. Hence we would continue to compute
the known bits, until we ran through the full depth of the chain of
computation or ran out of stack space.
For instance, let say we have
GISelKnownBits Info(/*MaxDepth*/ = 10);
Info.getKnownBits(Foo)
// 9 recursive calls to computeKnownBitsImpl.
// Then we hit a target specific instruction.
// The target specific GISelKnownBits does this:
GISelKnownBits TargetSpecificInfo(/*MaxDepth*/ = 6)
TargetSpecificInfo.computeKnownBitsImpl() // <-- next max depth checks would
// always return false.
This commit does not have any test case, none of the in-tree targets
use computeKnownBitsForTargetInstr.
We can have geps that have a scalar base pointer, and a vector index value, which
means that the base pointer must be splatted into a vector of pointers.
This fixes crashes on arm64 GlobalISel with optimizations enabled.
Commit 9965b12fd1 was supposed to change the offset constant when
lowering load/stores, but only introduced this change for loads. This
patch adds the same fix for stores.
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.
I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.
Vector support not yet implemented.
Differential Revision: https://reviews.llvm.org/D73659
Summary:
This is mostly NFC. computeForAddSub may give more precise results in
some cases, but that doesn't seem to affect any existing GlobalISel
tests.
Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73431
G_CTPOP is generated from llvm.ctpop.<type> intrinsics, clang generates
these intrinsics from __builtin_popcount and __builtin_popcountll.
Add lower and narrow scalar for G_CTPOP.
Lower G_CTPOP for MIPS32.
Differential Revision: https://reviews.llvm.org/D73216
llvm.cttz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
G_CTTZ is generated from llvm.cttz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_ctz and
__builtin_ctzll.
G_CTTZ_ZERO_UNDEF comes from llvm.cttz.<type> (<type> <src>, i1 true).
Clang generates such intrinsics as parts of expansion of builtin_ffs
and builtin_ffsll. It is also traditionally part of and many
algorithms that are now predicated on avoiding zero-value inputs.
Add narrow scalar (algorithm uses G_CTTZ_ZERO_UNDEF) for G_CTTZ.
Lower G_CTTZ and G_CTTZ_ZERO_UNDEF for MIPS32.
Differential Revision: https://reviews.llvm.org/D73215
llvm.ctlz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
MIPS clz instruction returns 32 for zero input.
G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_clz and
__builtin_clzll.
G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as
second argument. It is also traditionally part of and many algorithms
that are now predicated on avoiding zero-value inputs.
Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF).
Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32.
Differential Revision: https://reviews.llvm.org/D73214
Teach the GISelKnowBits analysis how to deal with PHI operations.
PHIs are essentially COPYs happening on edges, so we can just reuse
the code for COPY.
This is NFC COPY-wise has we leave Depth untouched when calling
computeKnownBitsImpl for COPYs, like it was before this patch.
Increasing Depth is however required for PHIs as they may loop back to
themselves and we would end up in an infinite loop if we were not
increasing Depth.
Differential Revision: https://reviews.llvm.org/D73317
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
This was assuming the narrow target was the source type. Respect the
requested type when these don't match by using intermediate
merges. This avoids producing very wide, illegal shift expansions.
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
This now develops the same problem G_ZEXT/G_ANYEXT have where the
requested type is assumed to be the source type. This will be fixed
separately by creating intermediate merges.
Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.
We're planning to remove the shufflemask operand from ShuffleVectorInst
(D72467); fix GlobalISel so it doesn't depend on that Constant.
The change to prelegalizercombiner-shuffle-vector.mir happens because
the input contains a literal "-1" in the mask (so the parser/verifier
weren't really handling it properly). We now treat it as equivalent to
"undef" in all contexts.
Differential Revision: https://reviews.llvm.org/D72663
As an intermediate step, some TLI functions can be converted to using
LLT instead of MVT. Move this somewhere out of GlobalISel so DAG
functions can use these.
AMDGPU can't unambiguously go back from the selected instruction
register class to the register bank without knowing if this was used
in a boolean context.
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Recommit notes:
Introduce temporary variables in order to make sure
instructions get inserted into MachineFunction in same order
regardless of compiler used to build llvm.
Differential Revision: https://reviews.llvm.org/D71363
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Differential Revision: https://reviews.llvm.org/D71363
G_BSWAP is generated from llvm.bswap.<type> intrinsics, clang genrates
these intrinsics from __builtin_bswap32 and __builtin_bswap64.
Add lower and narrowscalar for G_BSWAP.
Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later.
Differential Revision: https://reviews.llvm.org/D71362
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).
Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.
Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.
This matches what was done for X86 in 6bf108d77a.
Differential Revision: https://reviews.llvm.org/D71721