We were missing some instruction costs when converting vectors of
floating point half types into integers, so I've added those here.
I also manually generated assembly code for each FP->int case and
looked at the number of instructions generated, which meant
adjusting some of the existing costs too.
I've updated an existing test to reflect the new costs:
Analysis/CostModel/AArch64/sve-fptoi.ll
Differential Revision: https://reviews.llvm.org/D99935
This reverts commit ea1a0d7c9a.
While this is strictly more powerful, it is also strictly slower.
InstSimplify intentionally does not perform many folds that it
is allowed to perform, if doing so requires a KnownBits calculation
that will be repeated in InstCombine.
Maybe it's worthwhile to do this here, but that needs a more
explicitly stated motivation, evaluated in a review.
New registers FRM, FFLAGS and FCSR was defined. They represent
corresponding system registers. The new registers are necessary to
properly order floating point instructions in non-default modes.
Differential Revision: https://reviews.llvm.org/D99083
PHIElimination may insert copy instructions in multiple basic
blocks. Moving debug locations across basic block boundaries would be
misleading as illustrated by the test case.
rdar://75463656
Differential Revision: https://reviews.llvm.org/D100886
This is just a cleanup of the very high level stuff. I'm sure there is
more to update here but I'll leave that to others and/or a followup.
Differential Revision: https://reviews.llvm.org/D100888
FunctionAnalysisManagerCGSCCProxy should not be preserved if any of its
keys may be invalid. Since we are not removing/adding functions in
FuncAttrs, it's fine to preserve it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D100893
This reverts commit 13ec913bdf.
This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst a IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISEL.
Differential Revision: https://reviews.llvm.org/D99976
This is a more convoluted form of the same pattern "sext of NSW trunc",
but in this case the operand of trunc was a right-shift,
and the truncation chops off just the zero bits that were shifted-in.
We already special-cased a few interesting patterns,
but that is strictly less powerful than using KnownBits.
So instead get the known bits for the operand of `and`,
and iff all the unset bits of the `and`-mask are known to be zeros
in the operand, we can omit said `and`.
Introduced the cost of thre reverse shuffles for AArch64, currently just
copied the costs for PermuteSingleSrc.
Differential Revision: https://reviews.llvm.org/D100871
I'd reverted this in commit 3b6acb1797 due to buildbot failures. This patch contains the fix for said issue. I'd forgotten to handle the case where two phis in the same block have different operand order. We canonicalize away from this, but it's still valid IR. The tests included in this change (as opposed to simply having test output changed), crashed without the fix.
Original commit message follows...
This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.
(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)
Differential Revision: https://reviews.llvm.org/D99912
af7925b4dd added a custom DAG combine for recognizing fp-to-ints of
extract_subvectors that could be lowered to f64x2.convert_low_i32x4_{s,u}
instructions. This commit extends the combines to recognize equivalent
extract_subvectors of fp-to-ints as well.
Differential Revision: https://reviews.llvm.org/D100790
Adding the switches to reduce diffs. I'm about to split that into an lshr part and an ashr part, doing the NFC part first makes it easier to maintain both diffs.
This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.
(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)
Differential Revision: https://reviews.llvm.org/D99912
Summary:
This patch registers OpenMPOpt as a Module pass in addition to a CGSCC
pass. This is so certain optimzations that are sensitive to intact
call-sites can happen before inlining. The old `openmpopt` pass name is
changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass.
The current module pass only runs a single check but will be expanded in
the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D99202
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D100495
This fixes a subtle and nasty bug in my 86664638. The problem is that free(nullptr) is well defined (and common).
The specification for the nofree attributes talks about memory objects, and doesn't explicitly address null, but I think it's reasonable to assume that nofree doesn't disallow a call to free(nullptr). If it did, we'd have to prove nonnull on an argument to ever infer nofree which doesn't seem to be the intent.
This was found by Nuno and Alive2 over in https://reviews.llvm.org/D100141#2697374.
Differential Revision: https://reviews.llvm.org/D100779
This is currently built on top of the SelectionDAG call lowering, but
does not use it the same way. SelectionDAG passes legalized types to
the assignment functions, and the tablegenerated assignment functions
may change the value types expected for registers. This does not
change the types used, just moves the register creation to help fix
this in the future.
Defer the register creation until after all of the assignment
decisions have been made. This will also help have correct tail call
compatibility checking in a future change. Currently it does not work
as expected for any arguments split across multiple registers.
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.
AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.
Differential Revision: https://reviews.llvm.org/D100770
apple-m1 has the same level of ISA support as apple-a14,
so this is a straightforward mechanical change. However, that
also means this inherits apple-a14's v8.5a+nobti quirkiness.
rdar://68287159
The generic SoftFloatVectorExtract.ll test was failing when run on arm
machines, as it tries to create a f64 under soft float. Limit the
transform to when f64 is legal.
Also add a missing override, as reported in D100244.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D100495
Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types,
and lower them to the appropriate SVE instructions.
Additionally now that the MULH nodes are legal, integer divides can be
expanded into a more performant code sequence.
Differential Revision: https://reviews.llvm.org/D100487
This adds a combine for extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.
This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly. Otherwise the machine verifier is unhappy.
Differential Revision: https://reviews.llvm.org/D100244
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D100495