As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:
llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul
and adds functionality to auto-upgrade existing LLVM IR and bitcode.
Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D60261
llvm-svn: 363035
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.
This re-enables it for FastISel with the corresponding fix.
This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.
FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.
This fix moves the replacement of registers before the emission of
copies for the live-ins.
The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).
A few tests are affected by this:
* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed
Differential Revision: https://reviews.llvm.org/D62361
llvm-svn: 362963
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
llvm-svn: 362921
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.
No test changes as it's NFC.
Differential Revision: https://reviews.llvm.org/D62745
llvm-svn: 362857
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:
1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.
2 - The merged load\store node must retain the non-temporal flag.
Differential Revision: https://reviews.llvm.org/D62910
llvm-svn: 362723
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.
The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.
llvm-svn: 362642
Summary:
An argument that is return by a function but bit-casted before can still
be annotated as "returned". Make sure we do not crash for this case.
Reviewers: sunfish, stephenwlin, niravd, arsenm
Subscribers: wdng, hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59917
llvm-svn: 362546
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.
Fixes PR42118
Differential Revision: https://reviews.llvm.org/D62828
llvm-svn: 362533
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.
As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.
This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/vMd3
Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62774
llvm-svn: 362488
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.
The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.
It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
Differential Revision: https://reviews.llvm.org/D62777
llvm-svn: 362486
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
llvm-svn: 362472
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
llvm-svn: 362439
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction
Differential Revision: https://reviews.llvm.org/D62807
llvm-svn: 362397
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.
Reviewers: niravd, spatel, craig.topper, rupprecht
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D62633
llvm-svn: 362350
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.
Differential Revision: https://reviews.llvm.org/D59188
llvm-svn: 362324
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.
PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.
Differential Revision: https://reviews.llvm.org/D62783
llvm-svn: 362323
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.
I don't think there's any way for these dyn_casts to fail, so use a cast
of adding null check.
llvm-svn: 362315
Just copy all of the operands except the chain and call MorphNode on that.
This removes the IsUnary and IsTernary flags.
Also always get the result type from the result type of the original
nodes. Previously we got it from the operand except for two nodes
where that didn't work.
llvm-svn: 362269
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: https://reviews.llvm.org/D62546
llvm-svn: 362241
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with endless combine loop. Which makes sense, since we don't constant
fold for opaque constants.
llvm-svn: 362156
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 362146
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 362145
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 362144
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 362143
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 362142
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62664
llvm-svn: 362133
Summary:
No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62663
llvm-svn: 362132
Summary:
This was the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, spatel, craig.topper, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62662
llvm-svn: 362131
Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62661
llvm-svn: 362130
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362128
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: http://reviews.llvm.org/D62546
llvm-svn: 362112
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.
Some cases may highlight missing constant-folds.
Reverts r361871,r361872,r361873,r361874.
llvm-svn: 362109
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362012
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D62017
llvm-svn: 361875
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361874
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361873
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361872
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361871
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361856
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361855
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 361854
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361853
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361852
Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings.
llvm-svn: 361821
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.
Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.
llvm-svn: 361780
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.
llvm-svn: 361778
Summary:
- The current implementation simplifies the case where the source of
`copyto` is `implicit-def`ed. However, it only works when that
`implicit-def` is single-used since it detects that from
`implicit-def` and cannot determine which destination vreg should be
used if there are multiple uses.
- This patch changes that detection when `copyto` is being emitted. If
that `copyto`'s source is defined from `implicit-def`, it simplifies
it. Hence, it works even that `implicit-def` is multi-used.
- Except it simplifies the internal IR, it won't improve the quality of
code generation. However, it helps to detect 'implicit-def` in a
straight-forward manner in some passes, such as `si-i1-copies`. A test
case is added.
Reviewers: sunfish, nhaehnle
Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62342
llvm-svn: 361777
The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to.
llvm-svn: 361773
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was reverted because of the build failure.
The reason was mlformed patch.
Build failure fixed.
llvm-svn: 361741
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.
llvm-svn: 361696
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.
llvm-svn: 361547
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D55720
llvm-svn: 361289
DAGCombiner simplifies this more liberally as:
// If inserting an UNDEF, just return the original vector.
if (N1.isUndef())
return N0;
So there's no way to make this visible in output AFAIK, but
doing this at node creation time should be slightly more efficient.
llvm-svn: 361287
getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS():
// Concat of UNDEFs is UNDEF.
if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); }))
return DAG.getUNDEF(VT);
llvm-svn: 361284
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.
This was noted as a potential risk (and that risk might increase) with:
D62191
llvm-svn: 361268
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.
When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.
This approach was recommended by Eli Friedman.
Originally reported in https://github.com/avr-rust/rust/issues/129.
Patch by Carl Peto.
Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma
Reviewed By: efriedma
Subscribers: JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62003
llvm-svn: 361222
Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.
It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.
Differential Revision: https://reviews.llvm.org/D59981
llvm-svn: 361177
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.
Differential Revision: https://reviews.llvm.org/D62026
llvm-svn: 361169
This changes the isShift variable to include the constant operand
check that was previously in the if statement.
While there fix an 80 column violation and an unnecessary use of
getNode. Also fix variable name capitalization.
llvm-svn: 361168
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.
Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.
There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.
Differential Revision: https://reviews.llvm.org/D61692
llvm-svn: 361166