Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of
that header and include it when needed.
No functional change intended, but there's no longer a transitive include
fromMathExtras.h to cmath.
The following changes are necessasy to get the generated tree
matcher to compile:
- In CodeExpansions::declare(), the assert() prevents connecting
two instructions. E.g. the match code
(match (MUL $t, $s1, $s2),
(SUB $d, $t, $s3)),
results in two declarations of $t, one for the def and one for
the use. Removing the assertion allows this construct.
If $t is later used, it is one of the operands, which should be
perfectly fine.
- The code emitted in GIMatchTreeVRegDefPartitioner::generatePartitionSelectorCode()
is not compilable:
- The value of NewInstrID should be emitted, not the name
- Both calls involving getOperand() end with one parenthesis too many
- Swaps generated condition for the partition code in the latter function
It also changes the rules i2p_to_p2i, fabs_fabs_fold, and fneg_fneg_fold
to use the tree matcher for a linear match. These rules are tested by:
CodeGen/AArch64/GlobalISel/combine-fabs.mir
CodeGen/AArch64/GlobalISel/combine-fneg.mir
CodeGen/AArch64/GlobalISel/combine-ptrtoint.mir
CodeGen/AMDGPU/GlobalISel/combine-add-nullptr.mir
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D133257
This is a partial port of the code used by the SelectionDAGBuilder to
translate selects.
In particular, see matchSelectPattern in ValueTracking.cpp. This is a
GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum.
I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler.
On the AArch64-end, it seems like the FP cases are more important for perf
right now, so I bit the bullet and went at the more complicated problem. :)
I elected to do this as a post-legalize combine rather than in the
IRTranslator because
Deciding which fmax/fmin to use can depend on legalization rules
Philosophically-speaking (TM), putting it in a combine just feels cleaner
Being able to enable/disable the combine is handy
Another option would be to use the ValueTracking code in the IRTranslator and
match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat
annoying since we'd need to write lowerings back into the selects in the
legalizer. I'm not strongly opposed to the approach.
We'd also want to be careful with vector selects once that's implemented,
which explicitly check if a vector select is legal on the target. That'd
probably need a hook.
From what I can tell, doing this as a combine is probably a cleaner option
long-term.
Differential Revision: https://reviews.llvm.org/D116702
Simplify extended add/sub (with carry-in and carry-out) to add/sub with
carry (with carry-out only) if carry-in is known to be zero.
Differential Revision: https://reviews.llvm.org/D133702
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
We shouldn't use getOpcodeDef() if we need to guarantee the def has only one
user since under the hood it may look through copies and optimization hints,
which themselves may have multiple users.
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults in to the zext behavior.
The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.
I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
Patch adds new GICombineRules for G_ADD:
G_ADD(x, G_SUB(y, x)) -> y
G_ADD(G_SUB(y, x), x) -> y
Patch additionally adds new combine tests for AArch64 target for
these new rules.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D87936
I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold
FMIN/MAX instrs with NaNs.
The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes:
G_FMINNUM(X, NaN) -> X
G_FMAXNUM(X, NaN) -> X
G_FMINIMUM(X, NaN) -> NaN
G_FMAXIMUM(X, NaN) -> NaN
The patch adds AArch64 tests for these combines as well.
Reviewed by: arsenm
Differential revision: https://reviews.llvm.org/D125819
This change adds the constant splat versions of m_ICst() (by using
getBuildVectorConstantSplat()) and uses it in
matchOrShiftToFunnelShift(). The getBuildVectorConstantSplat() name is
shortened to getIConstantSplatVal() so that the *SExtVal() version would
have a more compact name.
Differential Revision: https://reviews.llvm.org/D125516
This wraps up from D119053. The 2 headers are moved as described,
fixed file headers and include guards, updated all files where the old
paths were detected (simple grep through the repo), and `clang-format`-ed it all.
Differential Revision: https://reviews.llvm.org/D119876
This will do the combine in cases that should fold, but don't
now. e.g. we're relying on the CSEMIRBuilder's incomplete constant
folding. For instance it doesn't handle FP operations or vectors (and
we don't have separate constant folding combines either to catch
them).
Similar to the G_*MULO change.
The code for checking if a constant is legal/pre-legalize is shared between
these, and is kind of hairy. So, factor it out into a new function:
`isConstantLegalOrBeforeLegalizer`.
To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into
a wrapper for two functions:
- `isPreLegalize`
- `isLegal`
This is a bit easier to read in general.
https://godbolt.org/z/KW7oszP1o
Differential Revision: https://reviews.llvm.org/D118655
Similar to the following combine in `DAGCombiner::visitMULO`:
```
// fold (mulo x, 0) -> 0 + no carry out
if (isNullOrNullSplat(N1))
return CombineTo(N, DAG.getConstant(0, DL, VT),
DAG.getConstant(0, DL, CarryVT));
```
This fixes some generally poor codegen for `*mulo`:
https://godbolt.org/z/eTxYsvz8f
Differential Revision: https://reviews.llvm.org/D118635
This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0
and C1 are constants where C0 + C1 is the bit-width of the shift
instructions.
Differential Revision: https://reviews.llvm.org/D116529
The GlobalISel combiner currently uses sign extension when manipulating
the LHS constant when combining a sequence of the following sequence of
machine instructions into a single constant:
```
%0:_(s32) = G_CONSTANT i32 <CONSTANT>
%1:_(p0) = G_INTTOPTR %0:_(s32)
%2:_(s64) = G_CONSTANT i64 <CONSTANT>
%3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
```
This causes an issue when the bit width of the first contant and the
target pointer size are different, as G_INTTOPTR has no sign extension
semantics.
This patch fixes this by capture an arbitrary precision in when matching
the constant, allowing the matching function to correctly zero extend
it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D116941
Fma combine assumes that MRI.getVRegDef(Reg)->getOperand(0).getReg() = Reg
which is not true when Reg is defined by instruction with multiple defs
e.g. G_UNMERGE_VALUES.
Fix is to keep register and the instruction that defines register in
DefinitionAndSourceRegister and use when needed.
Differential Revision: https://reviews.llvm.org/D117032
Change CombinerHelper::matchBitfieldExtractFromShrAnd to use
getPreferredShiftAmountTy for the shift-amount-like operands of G_UBFX
just like all the other G_[SU]BFX combines do. This better matches the
AMDGPU legality rules for these instructions.
Differential Revision: https://reviews.llvm.org/D116803
1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality
with the correct types for the G_UBFX that it builds.
2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to
match the legality rules: result and first operand can be s32 or s64
but the "shift amount" operands are always s32.
3. Add AMDGPU tests where the post-legalizer combiner would create
illegal MIR without the above fixes.
Differential Revision: https://reviews.llvm.org/D116802
Expanding on D109750.
Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a
substantial performance improvement.
The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vregs uses as valid.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D112852
This change exposes isBuildVectorConstantSplat() to the llvm namespace
and uses it to implement the constant splat versions of
m_SpecificICst().
CombinerHelper::matchOrShiftToFunnelShift() can now work with vector
types and CombinerHelper::matchMulOBy2()'s match for a constant splat is
simplified.
Differential Revision: https://reviews.llvm.org/D114625