As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR
Differential Revision: https://reviews.llvm.org/D61463
llvm-svn: 359836
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG
builder code so that pointers are allowed to have a larger type when "live" in
the DAG compared to memory.
Pointers get zero-extended whenever they are loaded, and truncated prior to
stores. In addition, a few not quite so obvious locations need updating:
* A GEP that has not been marked inbounds needs to enforce the IR-documented
2s-complement wrapping at the memory pointer size. Inbounds GEPs are
undefined if they overflow the address space, so no additional operations
are needed.
* Signed comparisons would give incorrect results if performed on the
zero-extended values.
This shouldn't affect CodeGen for now, but will become active when the AArch64
ILP32 support is committed.
llvm-svn: 359676
We don't have this restriction in IR, so it should not be here
either simply out of consistency. Code that wants to handle FP
exceptions is expected to use the 'strict' variants of these
nodes.
We don't get the frem case because frem by 0.0 produces NaN (invalid),
and that's the remaining check here (so the removed check for frem
was dead code AFAIK).
This is the only place in SDAG that uses "HasFPExceptions", so I
think we should remove that entirely as a follow-up patch.
llvm-svn: 359566
This was a local static funtion in SelectionDAG, which I've promoted to
TargetLowering so that I can reuse it to estimate the cost of a memory
operation in D59787.
Differential Revision: https://reviews.llvm.org/D59766
llvm-svn: 359543
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.
This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.
Differential Revision: https://reviews.llvm.org/D59785
llvm-svn: 359537
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.
There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().
llvm-svn: 358930
These are general queries, so they should not die when given
a degenerate input like an all undef mask. Callers should be
able to deal with an op that will eventually be simplified away.
llvm-svn: 358761
The arm64_32 ABI specifies that pointers (despite being 32-bits) should be
zero-extended to 64-bits when passed in registers for efficiency reasons. This
means that the SelectionDAG needs to be able to tell the backend that an
argument was originally a pointer, which is implmented here.
Additionally, some memory intrinsics need to be declared as taking an i8*
instead of an iPTR.
There should be no CodeGen change yet, but it will be triggered when AArch64
backend support for ILP32 is added.
llvm-svn: 358398
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.
This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.
Reviewers: spatel, RKSimon, nikic, lebedev.ri
Reviewed By: nikic
Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60460
llvm-svn: 358372
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.
I will try to follow this up with some better tests.
llvm-svn: 358113
Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).
This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........
Differential Revision: https://reviews.llvm.org/D60006
llvm-svn: 357765
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.
This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.
Also add a missing truncation on X86, found by testing of this patch.
Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa
Reviewers: bogner, craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59535
llvm-svn: 357745
Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.
Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.
Reviewers: efriedma, RKSimon, craig.topper, jyknight
Reviewed By: jyknight
Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58068
llvm-svn: 357279
We have the folds for fadd/fsub/fmul already in DAGCombiner,
so it may be possible to remove that code if we can guarantee that
these ops are zapped before they can exist.
llvm-svn: 357029
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.
Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.
Many minor changes, mostly positive.
llvm-svn: 356996
First half of PR40800, this patch adds DAG undef handling to icmp instructions to match the behaviour in llvm::ConstantFoldCompareInstruction and SimplifyICmpInst, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).
This involved a lot of tweaking to reduced tests as bugpoint loves to reduce icmp arguments to undef........
Differential Revision: https://reviews.llvm.org/D59363
llvm-svn: 356938
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Exclude non-legal non-power-of-2 vector types from ComputeRegisterProp
mechanism in TargetLoweringBase::getTypeConversion.
* Cope with SETCC and VSELECT for odd-width i1 vector when the other
vectors are legal type.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58899
Change-Id: Ib5f23377dbef511be3a936211a0b9f94e46331f8
llvm-svn: 356350
First step towards PR40800 - I intend to move the float case in a separate future patch.
I had to tweak the (overly reduced) thumb2 test and the x86 widening test change is annoying (no longer rematerializable) but we should address this separately.
Differential Revision: https://reviews.llvm.org/D59244
llvm-svn: 356040
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well.
We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that.
This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D58503
llvm-svn: 354797
Summary:
When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.
We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.
Reviewers: spatel, RKSimon, nikic
Reviewed By: nikic
Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58567
llvm-svn: 354753
If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits.
This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask.
Differential Revision: https://reviews.llvm.org/D58464
llvm-svn: 354514
Second part of https://bugs.llvm.org/show_bug.cgi?id=40442.
This adds an extra UnrollVectorOverflowOp() method to SDAG, because
the general UnrollOverflowOp() method can't deal with multiple results.
Additionally we need to expand UMULO/SMULO during vector op
legalization, as it may result in unrolling, which may need additional
type legalization.
Differential Revision: https://reviews.llvm.org/D57997
llvm-svn: 354513
If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits.
This can prevent GetDemandedBits from recognizing that the AND is unnecessary.
llvm-svn: 354498
Summary:
A store to an object whose lifetime is about to end can be removed.
See PR40550 for motivation.
Reviewers: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57541
llvm-svn: 354244
For D57601, we need to know whether the instruction is volatile. We'd either have to pass yet another parameter, or just standardize on the MMO interface. I chose the second.
llvm-svn: 353989
The helper function was used by only two callers, and largely ended up providing distinct functionality based on optional arguments and opcode. Inline and simply to make the functionality much more clear.
llvm-svn: 353977
The version of FoldConstantArithmetic() that takes arbitrary nodes
was confusingly naming those nodes as constants when they might
not be; also "Cst" reads like "Cast".
llvm-svn: 352884
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This functionality is required at multiple places which potentially
create large operand lists, like SelectionDAGBuilder or DAGCombiner.
Differential Revision: https://reviews.llvm.org/D56739
llvm-svn: 351552
Summary:
Use this helper to make sure we use the same value at various places.
This will likely be needed at more places were we currently crash
because we use more operands than possible.
Also makes it easier to change in the future.
Reviewers: RKSimon, craig.topper, efriedma, aemerson
Reviewed By: RKSimon
Subscribers: hiraditya, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D56859
llvm-svn: 351537
The value returned by max() is the last valid value, adjust the
comparison accordingly.
The code added in D55073 creates TokenFactors with max() operands.
Reviewers: aemerson, efriedma, RKSimon, craig.topper
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D56738
llvm-svn: 351318
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.
My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.
Fixes PR40142.
llvm-svn: 350059
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349628
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them.
Similarly for the assert above it.
I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868
llvm-svn: 349390
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).
llvm-svn: 349298
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 349016
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.
A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.
Differential Revision: https://reviews.llvm.org/D55426
llvm-svn: 348953
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 348843
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.
The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
* The corresponding SDNode is now invalid
* This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
* The SDNode has been invalidated and we should emit "DBG_VALUE undef"
* The SDNode has been invalidated but the debug data was salvaged, don't
emit anything for this SDDbgValue
* This SDDbgValue has been emitted
This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.
Differential Revision: https://reviews.llvm.org/D55372
llvm-svn: 348751
These nodes should have two results. A real VT and a Glue. But this code would have returned Undef which would only be a single result. But we're in the single result version of getNode so these opcodes should never be seen by this function anyway.
llvm-svn: 348670
There's a 64k limit on the number of SDNode operands, and some very large
functions with 64k or more loads can cause crashes due to this limit being hit
when a TokenFactor with this many operands is created. To fix this, create
sub-tokenfactors if we've exceeded the limit.
No test case as it requires a very large function.
rdar://45196621
Differential Revision: https://reviews.llvm.org/D55073
llvm-svn: 348324
This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C' --> undef.
But the most practical improvement is likely as shown in the tests here -
for FP, we were overconstraining undef lanes to NaN, and that can prevent
vector simplifications/narrowing (see D51553).
llvm-svn: 348090
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.
llvm-svn: 347523
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.
The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.
llvm-svn: 347502
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code,
but copying the folds seems to be the standard method to ensure
that we don't miss these folds.
Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no
way to ensure that we do these kinds of simplifications unless the
code is repeated at node creation time and during combines.
There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167
I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.
This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023
llvm-svn: 347170
Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.
This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.
X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.
I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements.
The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.
Differential Revision: https://reviews.llvm.org/D54346
llvm-svn: 346784
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.
The rest of the patch is just changing all callers to use getNode directly.
llvm-svn: 346087
Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy.
This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle.
Differential Revision: https://reviews.llvm.org/D53760
llvm-svn: 345578
The DAGTypeLegalizer::getSETCCWidenedResultTy was widening the MaskVT, but the code in convertMask called after getSETCCWidenedResultTy had no idea this widening had occurred. So none of the operands were widened when convertMask created new setccs with the widened VT.
This patch removes the widening and adds some asserts to getNode to validate the types of setccs to prevent issues like this in the future.
Differential Revision: https://reviews.llvm.org/D53743
llvm-svn: 345428
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...
We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.
- Ideally this would be handled by the ConstantHoist pass but it runs
too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
however SelectionDAG constantfolding would constantfold the
"trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
stop DAGCombiner from constant folding in this situation. (Similar to
how ConstantHoisting marks things as Opaque to avoid folding
ADD/SUB/etc.)
Differential Revision: https://reviews.llvm.org/D53181
llvm-svn: 345102
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.
There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.
llvm-svn: 344914
And use that to transform fsub with zero constant operands.
The integer part isn't used yet, but it is proposed for use in
D44548, so adding both enhancements here makes that
patch simpler.
llvm-svn: 343865
VerifyDAGDiverence costs compilation time, avoid running it in non-debug
builds.
Differential Revision: https://reviews.llvm.org/D52454
llvm-svn: 343086
This is the final (I hope!) problem pattern mentioned in PR37749:
https://bugs.llvm.org/show_bug.cgi?id=37749
We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops.
We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like
extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches
that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op.
The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test,
we have this vector-type-legalized sequence:
t29: v8i32 = concat_vectors t27, t28
t30: v4i64 = bitcast t29
t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
t31: v4i64 = bitcast t18
t32: v4i64 = xor t30, t31
t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
t34: v4i64 = bitcast t9
t35: v4i64 = and t32, t34
t36: v8i32 = bitcast t35
t37: v4i32 = extract_subvector t36, Constant:i64<0>
t38: v4i32 = extract_subvector t36, Constant:i64<4>
Differential Revision: https://reviews.llvm.org/D52318
llvm-svn: 343008
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code.
No functional change intended.
I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.
Differential Revision: https://reviews.llvm.org/D52285
llvm-svn: 342669
The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits,
and the test diff in add.ll is from a DAGCombiner transform.
llvm-svn: 342594
This patch fixes the debug info handling for SelectionDAG legalization
of DAG nodes with two results. When an replaced SDNode has more than
one result, transferDbgValues was always copying the SDDbgValue from
the first result and attaching them to all members. In reality
SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes
(though the type signature doesn't make this obvious (cf. the call
site code in ReplaceNode()).
rdar://problem/44162227
Differential Revision: https://reviews.llvm.org/D52112
llvm-svn: 342264
Summary:
This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).
The purpose of this patch is to free up the name DivergenceAnalysis for the new generic
implementation. The generic implementation class will be shared by specialized
divergence analysis classes.
Patch by: Simon Moll
Reviewed By: nhaehnle
Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50434
Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
llvm-svn: 341071
Summary: This is split out from D41062 to cover the code in LegalVectorTypes.cpp
Reviewers: RKSimon, spatel, efriedma
Reviewed By: efriedma
Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D51337
llvm-svn: 340891
Summary:
Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions.
This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly.
X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree.
Reviewers: RKSimon, delena, hfinkel, eli.friedman
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50402
llvm-svn: 340689
Having the KnownBits as an output parameter is kind of awkward to use
and a holdover from when it was two separate APInts. Instead, just
return a KnownBits object.
I'm leaving the existing interface in place for now, since updating
the callers all at once would be thousands of lines of diff.
llvm-svn: 340594
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.
Handle the case where the sign bit extends to only part of the small elements.
llvm-svn: 340169
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.
The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements.
llvm-svn: 340143
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.
Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.
This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.
This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.
As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]
Differential Revision: https://reviews.llvm.org/D50680
llvm-svn: 339740
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR
when zero extending the demanded elements mask if it is already as long as the
source vector.
Differential Revision: https://reviews.llvm.org/D49574
llvm-svn: 339600
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.
Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.
llvm-svn: 338910
There is nothing x86-specific about this code, so it'd be nice to make this available for other targets to use in the future (and get it out of X86ISelLowering!).
Differential Revision: https://reviews.llvm.org/D50083
llvm-svn: 338586
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.
The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).
This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.
One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.
Testing:
- check-llvm, and an end-to-end test using lldb to debug an optimized
program.
- Existing unit tests for DIExpression::appendToStack fully cover the
new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)
Differential Revision: https://reviews.llvm.org/D49454
llvm-svn: 338069
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
llvm-svn: 336779
This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.
llvm-svn: 336576
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.
This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.
Differential Revision: https://reviews.llvm.org/D48969
llvm-svn: 336492
This removes debug locations from ConstantSDNode and ConstantSDFPNode.
When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html
I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.
Differential Revision: https://reviews.llvm.org/D48468
llvm-svn: 335497
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47909
llvm-svn: 334996
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.
Reviewers: spatel, arsenm, hfinkel, javed.absar
Reviewed By: spatel
Subscribers: nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47388
llvm-svn: 334242
Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
Reviewers: spatel, hfinkel
Reviewed By: spatel
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D47389
llvm-svn: 334037
This is the FP sibling of D43141 with the corresponding IR change in rL327212.
We can't propagate undef here because if a variable operand is a NaN, these
binops must propagate NaN. Neither global nor node-level fast-math makes a
difference. If we have 'nnan', I think later folds can turn the NaN into undef.
The tests in X86/fp-undef.ll are meant to be the definitive verification for
these folds - everything reduces identically now.
The other test changes are collateral damage. They may need to be altered to
preserve their intent.
Differential Revision: https://reviews.llvm.org/D47026
llvm-svn: 332920
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.
Differential Revision https://reviews.llvm.org/D46477
llvm-svn: 332482
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
In order to convert LLVM IR to MachineInstr, we need a new TargetOpcode,
DBG_LABEL, to ‘lower’ intrinsic llvm.dbg.label. The patch
creates this new TargetOpcode and convert intrinsic llvm.dbg.label to
MachineInstr through SelectionDAG.
In SelectionDAG, debug information is stored in SDDbgInfo. We create a
new data member of SDDbgInfo for labels and use the new data member,
SDDbgLabel, to create DBG_LABEL MachineInstr.
The new DBG_LABEL MachineInstr uses label metadata from LLVM IR as its
parameter. So, the backend could get metadata information of labels from
DBG_LABEL MachineInstr.
Differential Revision: https://reviews.llvm.org/D45341
Patch by Hsiangkai Wang.
llvm-svn: 331842
Summary:
getNode optimizes (ext (trunc x)) to x and the dbgvalue node on trunc is lost. The fix calls transferDbgValues to add the dbgvalue to x.
Add DebugInfo/AArch64/dbg-value-i16.ll
Patch by Sejong Oh!
Reviewers: aprantl, javed.absar, llvm-commits, vsk
Reviewed By: aprantl, vsk
Subscribers: kristof.beyls, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46348
llvm-svn: 331665
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
Summary:
When building the selection DAG at ISel all PHI nodes are
selected and lowered to Machine Instruction PHI nodes before
we start to create any SDNodes. So there are no SDNodes for
values produced by the PHI nodes.
In the past when selecting a dbg.value intrinsic that uses
the value produced by a PHI node we have been handling such
dbg.value intrinsics as "dangling debug info". I.e. we have
not created a SDDbgValue node directly, because there is
no existing SDNode for the PHI result, instead we deferred
the creationg of a SDDbgValue until we found the first use
of the PHI result.
The old solution had a couple of flaws. The position of the
selected DBG_VALUE instruction would end up quite late in a
basic block, and for example not directly after the PHI node
as in the LLVM IR input. And in case there were no use at all
in the basic block the dbg.value could be dropped completely.
This patch introduces a new VREG kind of SDDbgValue nodes.
It is similar to a SDNODE kind of node, but it refers directly
to a virtual register and not a SDNode. When we do selection
for a dbg.value that is using the result of a PHI node we
can do a lookup of the virtual register directly (as it already
is determined for the PHI node) and create a SDDbgValue node
immediately instead of delaying the selection until we find a
use.
This should fix a problem with losing debug info at ISel
as seen in PR37234 (https://bugs.llvm.org/show_bug.cgi?id=37234).
It does not resolve PR37234 completely, because the debug info
is dropped later on in the BranchFolder (see D46184).
Reviewers: #debug-info, aprantl
Reviewed By: #debug-info, aprantl
Subscribers: rnk, gbedwell, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46129
llvm-svn: 331182
Summary:
This just refactors the lowering of the atomic memory intrinsics to more
closely match the code patterns used in the lowering of the non-atomic
memory intrinsics. Specifically, we encapsulate the lowering in
SelectionDAG::getAtomicMem*() functions rather than embedding
the code directly in the SelectionDAGBuilder code.
llvm-svn: 330603
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: bogner, rnk, MatzeB, RKSimon
Reviewed By: rnk
Subscribers: JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45133
llvm-svn: 329435
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.
The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.
Differential Revision: https://reviews.llvm.org/D45017
llvm-svn: 328806
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)
llvm-svn: 328395
The BITCAST handling in computeKnownBits() previously only worked for little
endian.
This patch reverses the iteration over elements for a big endian target which
allows this to work in this case also.
SystemZ test case.
Review: Eli Friedman
https://reviews.llvm.org/D44249
llvm-svn: 327764
The code to match and produce more x86 vector blends was enabled for all
architectures even though the transform may pessimize the code for other
architectures that do not provide a vector blend instruction.
Added an aarch64 testcase to check that a VZIP instruction is generated instead
of byte movs.
Differential Revision: https://reviews.llvm.org/D44118
llvm-svn: 327132
This allows us to improve vector constant matching in more DAG code (backends, TargetLowering etc.).
Differential Revision: https://reviews.llvm.org/D43466
llvm-svn: 325815
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits (zeros/ones) will be at least the minimum of the LO and HI constants.
ComputeKnownBits equivalent of D43338.
Differential Revision: https://reviews.llvm.org/D43463
llvm-svn: 325521
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits will be at least the minimum of the LO and HI constants.
I haven't bothered with the UMIN/UMAX equivalent as (1) we don't have any current use cases and (2) I wonder if we'd be better off immediately falling back for ComputeKnownBits for UMIN/UMAX which already has optimization patterns useful for unsigned cases.
Differential Revision: https://reviews.llvm.org/D43338
llvm-svn: 325450
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
The bug has been lying dormant, but apparently was never exposed, until
after rL324941 because we didn't return the correct result
for shifts with undef operands.
llvm-svn: 325010
This started by noticing that scalar and vector types were producing different results with div ops in PR36305:
https://bugs.llvm.org/show_bug.cgi?id=36305
...but the problem is bigger. I couldn't keep it straight without a table, so I'm attaching that as a PDF to
the review. The x86 tests in undef-ops.ll correspond to that table.
Green means that instsimplify and the DAG agree on the result for all types.
Red means the DAG was returning undef when IR was not.
Yellow means the DAG was returning a non-undef result when IR returned undef.
This patch assumes that we're currently doing the right thing in IR.
Note: I couldn't find any problems with lowering vector constants as the code comments were warning,
but those comments were written long ago in rL36413 .
Differential Revision: https://reviews.llvm.org/D43141
llvm-svn: 324941
Armv8.1-A added an atomic load-clear instruction (which performs bitwise
and with the complement of it's operand), but not a load-and
instruction. Our current code-generation for atomic load-and always
inserts an MVN instruction to invert its argument, even if it could be
folded into a constant or another instruction.
This adds lowering early in selection DAG to convert a load-and
operation into an xor with -1 and a load-clear, allowing the normal DAG
optimisations to work on it.
To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't
see any easy way to do this with an AArch64-specific ISD node, because
the code-generation for atomic operations assumes the SDNodes are of
type AtomicSDNode.
I've left the old tablegen patterns in because they are still needed for
global isel.
Differential revision: https://reviews.llvm.org/D42478
llvm-svn: 324908
Many in SimplifySetCC and FoldSetCC try to create true or false constants. Some of them query getBooleanContents to figure out whether to use all ones or just 1 for true. But many places do not check and just use 1 without ensuring the VT has an i1 scalar type. Note sure if those places only trigger before type legalization so they only see an i1
type?
To cleanup the inconsistency and reduce some duplicated code, this patch adds a getBoolConstant method to SelectionDAG that takes are of querying getBooleanContents and doing the right thing.
Differential Revision: https://reviews.llvm.org/D43037
llvm-svn: 324634
Better to assume that any value type may be commuted, not just MVTs.
No test case right now, but discovered while investigating possible shuffle combines.
llvm-svn: 324179
When getNode() is called to create an EXTRACT_VECTOR_ELT, assert that
the result VT is at least as wide as the vector element type.
Review: Eli Friedman
llvm-svn: 324061
The second return value of ATOMIC_CMP_SWAP_WITH_SUCCESS is known to be a
boolean, and should therefore be treated by computeKnownBits just like
the second return values of SMULO / UMULO.
Differential Revision: https://reviews.llvm.org/D42067
llvm-svn: 322985
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend.
Most of this patch is just making sure we copy the scale around everywhere.
Differential Revision: https://reviews.llvm.org/D40055
llvm-svn: 322210
Ingredients in this patch:
1. Add HANDLE_LIBCALL defs for finite mathlib functions that correspond to LLVM intrinsics.
2. Plumbing to send TargetLibraryInfo down to SelectionDAGLegalize.
3. Relaxed math and library checking in SelectionDAGLegalize::ConvertNodeToLibcall() to choose finite libcalls.
There was a bug about determining the availability of the finite calls that should be fixed with:
rL322010
Not in this patch:
This doesn't resolve the question/bug of clang creating the intrinsic IR in the first place.
There's likely follow-up work needed to support the long double variants better.
There's room for improvement to reduce the code duplication.
Create finite calls that don't originate from a corresponding intrinsic or DAG node?
Differential Revision: https://reviews.llvm.org/D41338
llvm-svn: 322087
Handle this in DAGCombiner::visitEXTRACT_VECTOR_ELT the same as we already do in SelectionDAG::getNode and use APInt instead of getZExtValue.
This should also fix oss-fuzz #4910
llvm-svn: 321767
Summary:
I have been getting rather difficult to reproduce SIGBUS crashes when
compiling certain FreeBSD sources, and their stack traces pointed
squarely at `SelectionDAG::salvageDebugInfo()`:
```
Core was generated by `/usr/obj/share/dim/src/freebsd/clang600-import/amd64.amd64/tmp/usr/bin/cc -cc1 -'.
Program terminated with signal SIGBUS, Bus error.
#0 isInvalidated () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SDNodeDbgValue.h:115
115 bool isInvalidated() const { return Invalid; }
(gdb) bt
#0 isInvalidated () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SDNodeDbgValue.h:115
#1 salvageDebugInfo () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7116
#2 0x00000000033b2516 in operator() () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3595
#3 __invoke<(lambda at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3593:59) &, llvm::SDNode *, llvm::SDNode *> () at /usr/include/c++/v1/type_traits:4323
#4 __call<(lambda at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3593:59) &, llvm::SDNode *, llvm::SDNode *> () at /usr/include/c++/v1/__functional_base:349
#5 operator() () at /usr/include/c++/v1/functional:1562
#6 0x00000000033b0817 in operator() () at /usr/include/c++/v1/functional:1916
#7 NodeDeleted () at /share/dim/src/freebsd/clang600-import/contrib/llvm/include/llvm/CodeGen/SelectionDAG.h:293
#8 0x0000000003529dde in RemoveDeadNodes () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:610
#9 0x00000000035556df in MorphNodeTo () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:6794
#10 0x00000000033a9acc in MorphNode () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:2594
#11 0x00000000033ac80b in SelectCodeCommon () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3601
#12 0x00000000023d464b in SelectCode () at /usr/obj/share/dim/src/freebsd/clang600-import/amd64.amd64/tmp/obj-tools/lib/clang/libllvm/X86GenDAGISel.inc:282902
#13 Select () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:3072
#14 0x00000000033a5afa in DoInstructionSelection () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:988
#15 0x00000000033a4e1a in CodeGenAndEmitDAG () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:868
#16 0x00000000033a2643 in SelectAllBasicBlocks () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1624
#17 0x000000000339f158 in runOnMachineFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:466
#18 0x00000000023d03c4 in runOnMachineFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:175
#19 0x00000000035cc8c2 in runOnFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/MachineFunctionPass.cpp:62
#20 0x00000000030dca9a in runOnFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1520
#21 0x00000000030dccf3 in runOnModule () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1541
#22 0x00000000030dd228 in runOnModule () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1597
#23 run () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1700
#24 0x00000000014db578 in EmitAssembly () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:815
#25 EmitBackendOutput () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:1181
#26 0x00000000014d5b26 in HandleTranslationUnit () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:292
#27 0x0000000001c4c332 in ParseAST () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/Parse/ParseAST.cpp:159
#28 0x00000000015d546c in Execute () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:897
#29 0x0000000001cec311 in ExecuteAction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/Frontend/CompilerInstance.cpp:991
#30 0x00000000014b4f81 in ExecuteCompilerInvocation () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:252
#31 0x00000000014aa73f in cc1_main () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/tools/driver/cc1_main.cpp:221
#32 0x00000000014b2928 in ExecuteCC1Tool () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/tools/driver/driver.cpp:309
#33 main () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/tools/driver/driver.cpp:388
(gdb) frame 1
#1 salvageDebugInfo () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7116
7116 if (DV->isInvalidated())
(gdb) disassemble
Dump of assembler code for function salvageDebugInfo():
[...]
0x0000000003557348 <+744>: nopl 0x0(%rax,%rax,1)
0x0000000003557350 <+752>: mov (%r12),%r13
=> 0x0000000003557354 <+756>: cmpb $0x0,0x31(%r13)
0x0000000003557359 <+761>: jne 0x35573b0 <salvageDebugInfo()+848>
(gdb) info registers
[...]
r13 0x5a5a5a5a5a5a5a5a 6510615555426900570
```
The `0x5a5a5a5a5a5a5a5a` value in `r13` indicates the memory was either
uninitialized, or already freed.
Unfortunately I do not have a simple self-contained test case for this.
However, it seems pretty clear that the call to `AddDbgValue()` in
`salvageDebugInfo()` causes the problems, since it modifies
`SelectionDag::DbgInfo` while looping through one of its DenseMaps:
```
void SelectionDAG::salvageDebugInfo(SDNode &N) {
[...]
for (auto DV : GetDbgValues(&N)) {
if (DV->isInvalidated())
continue;
[...]
AddDbgValue(Clone, N0.getNode(), false);
[...]
}
}
```
At least, if I comment out the `AddDbgValue()` call, the crashes go
away. I propose to change this function slightly, similar to the
`SelectionDAG::transferDbgValues()` function just above it, to save the
cloned SDDbgValues in a separate SmallVector, and only call
AddDbgValue() on them after the for loop is done.
Reviewers: aprantl, bogner, bkramer, davide
Reviewed By: davide
Subscribers: davide, krytarowski, JDevlieghere, emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D41589
llvm-svn: 321545
This makes it work better with some build_vector and concat_vectors creations.
Adjust the NewSDValueDbgMsg in getConstant to avoid duplicating the print when it calls getSplatBuildVector since getSplatBuildVector didn't trigger a print before.
llvm-svn: 320783
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.
On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.
llvm-svn: 320746
If we have a non-splat constant shift amount, the minimum shift amount can be used to infer the number of zero upper bits of the result. There's probably a lot more that we can do here, but this
fixes a case where I wanted to infer the sign bit as zero when all the shift amounts are non-zero.
llvm-svn: 319639
Two issues found when doing codegen for splitting vector with non-zero alloca addr space:
DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT/SplitVecOp_EXTRACT_VECTOR_ELT uses dummy pointer info for creating
SDStore. Since one pointer operand contains multiply and add, InferPointerInfo is unable to
infer the correct pointer info, which ends up with a dummy pointer info for the target to lower
store and results in isel failure. The fix is to introduce MachinePointerInfo::getUnknownStack to
represent MachinePointerInfo which is known in alloca address space but without other information.
TargetLowering::getVectorElementPointer uses value type of pointer in addr space 0 for
multiplication of index and then add it to the pointer. However the pointer may be in an addr
space which has different size than addr space 0. The fix is to use the pointer value type for
index multiplication.
Differential Revision: https://reviews.llvm.org/D39758
llvm-svn: 319622
TransferDbgValues (capital 'T') is wired into ReplaceAllUsesWith, and
transferDbgValues (lowercase 't') is used elsewhere (e.g in Legalize).
Both functions should be doing the exact same thing. This patch
consolidates the logic into one place.
This was reverted in r318455 because some newly introduced asserts,
which I thought were NFC, were firing. I filed PR35338. For now I've
weakened the asserts.
Testing: check-llvm, check-clang, and a stage2 Rel+Deb build of clang
Differential Revision: https://reviews.llvm.org/D40104
llvm-svn: 318498
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
TransferDbgValues (capital 'T') is wired into ReplaceAllUsesWith, and
transferDbgValues (lowercase 't') is used elsewhere (e.g in Legalize).
Both functions should be doing the exact same thing. This patch
consolidates the logic into one place.
Differential Revision: https://reviews.llvm.org/D40104
llvm-svn: 318448
Some of the AMDGPU stack addressing modes require knowing the sign
bit is zero. We used to accomplish this by custom lowering
frame indexes, and then putting an AssertZext around a
TargetFrameIndex. This required specifically looking for
the AssextZext + frame index pattern which was moderately
disgusting. The same could probably be accomplished
with a target specific node, but would still
require special handling of frame indexes.
llvm-svn: 317671
Introduce a isConstOrDemandedConstSplat helper function that can recognise a constant splat build vector for at least the demanded elts we care about.
llvm-svn: 316866
For cases where we know the floating point representations match the bitcasted integer equivalent, allow bitcasting to these types.
This is especially useful for the X86 floating point compare results which return all/zero bits but as a floating point type.
Differential Revision: https://reviews.llvm.org/D39289
llvm-svn: 316831
Not having the subclass data on an MemIntrinsicSDNodes means it was possible
to try to fold 2 nodes with the same operands but differing MMO flags. This
would trip an assertion when trying to refine the alignment between the 2
MachineMemOperands.
Differential Revision: https://reviews.llvm.org/D38898
llvm-svn: 316737
Similar to how llvm::salvagDebugInfo hooks into InstCombine, this adds
a hook that can be invoked before an SDNode that is associated with an
SDDbgValue is erased to capture the effect of the deleted node in a
DIExpression.
The motivating example is an SDDebugValue attached to an ADD operation
that gets folded into a LOAD+OFFSET operation.
rdar://problem/32121503
llvm-svn: 316525
We don't need to do any additional recursion, we just need to analyze the APInt stored in the node. This matches what the ValueTracking versions do for IR.
llvm-svn: 316256
I don't know if we ever hit this case or not. Turning it into an assert only fired on expanding some atomic operation in a SystemZ lit test.
llvm-svn: 315648
For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements.
We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results.
Differential Revision: https://reviews.llvm.org/D37849
llvm-svn: 313543
Use RotAmt.urem(VTBits) instead of AND(RotAmt, VTBits - 1)
TBH I don't expect non-power-of-2 types to be created, but it makes the logic clearer and matches what we do in other rotation combines.
llvm-svn: 313245
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.
Reviewers: majnemer
Subscribers: eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D36904
llvm-svn: 312569
This partially reverts r311429 in favor of making ISD::isConstantSplatVector do something not confusing. Turns out the only other user of it was also having to deal with the weird property of it returning a smaller size.
So rather than continue to deal with this quirk everywhere, just make the interface do something sane.
Differential Revision: https://reviews.llvm.org/D37039
llvm-svn: 311510
This adds debug messages to various functions that create new SDValue nodes.
This is e.g. useful to have during legalization, as otherwise it can prints
legalization info of nodes that did not appear in the dumps before.
Differential Revision: https://reviews.llvm.org/D36984
llvm-svn: 311444
ISD::isConstantSplatVector can shrink to the smallest splat width. But we don't check the size of the resulting APInt at all. This can cause us to misinterpret the results.
This patch just adds a flag to prevent the APInt from changing width.
Fixes PR34271.
Differential Revision: https://reviews.llvm.org/D36996
llvm-svn: 311429
This patch teaches the SDag type legalizer how to split up debug info for
integer values that are split into a hi and lo part.
(re-commit)
Differential Revision: https://reviews.llvm.org/D36805
llvm-svn: 311181
This patch teaches the SDag type legalizer how to split up debug info for
integer values that are split into a hi and lo part.
Differential Revision: https://reviews.llvm.org/D36805
llvm-svn: 311102
into vextract(vNiX,Idx) when creating vextract with getNode().
This case appeared in AVX512 after fixing pr33349 in r310552.
Differential revision: https://reviews.llvm.org/D36571
llvm-svn: 310828
Summary:
Preserve chain dependecies between old and new loads constructed to
prevent loads from reordering below later stores.
Fixes PR34088.
Reviewers: craig.topper, spatel, RKSimon, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36528
llvm-svn: 310604
In FoldConstantArithmetic, handle BUILD_VECTOR nodes that do implicit truncation on the elements.
This is similar to what is done in FoldConstantVectorArithmetic.
Differential Revision:
https://reviews.llvm.org/D36506
llvm-svn: 310593
This patch is in 2 parts:
1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.
2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.
Differential Revision: https://reviews.llvm.org/D35896
llvm-svn: 309486
This patch moves the DAGCombiner::GetDemandedBits function to SelectionDAG::GetDemandedBits as a first step towards making it easier for targets to get to the source of any demanded bits without the limitations of SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D35841
llvm-svn: 308983
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
Relanding after rewriting undef.ll test to avoid host-dependant
endianness.
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.
Tests of note:
* test/CodeGen/X86/build-vector* - Improved.
* test/CodeGen/BPF/undef.ll - Improved store alignment allows an
additional store merge
* test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
case we already do not handle well. Here, the DAG is improved, but
scheduling causes a code size degradation.
Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D34472
llvm-svn: 307114
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.
Tests of note:
* test/CodeGen/X86/build-vector* - Improved.
* test/CodeGen/BPF/undef.ll - Improved store alignment allows an
additional store merge
* test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
case we already do not handle well. Here, the DAG is improved, but
scheduling causes a code size degradation.
Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D34472
llvm-svn: 306819
When SelectionDAG expands memcpy (or memmove) call into a sequence of load and store instructions, it disregards dereferenceable flag even the source pointer is known to be dereferenceable.
This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer.
This patch makes SelectionDAG to set the dereferenceable flag for the load instructions properly to avoid the assertion failure.
Differential Revision: https://reviews.llvm.org/D34467
llvm-svn: 306209
This step is just intended to reduce code duplication rather than change any functionality.
A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.
Differential Revision: https://reviews.llvm.org/D33649
llvm-svn: 305192
This prevents against assertion errors like PR32659 which occur from a
replacement deleting a node after it's been added to the list argument
of RemoveDeadNodes. The specific failure from PR32659 does not
currently happen, but it is still potentially possible. The underlying
cause is that the callers of the change dfunction builds up a list of
nodes to delete after having moved their uses and it possible that a
move of a later node will cause a previously deleted nodes to be
deleted.
Reviewers: bkramer, spatel, davide
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33731
llvm-svn: 305070
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Currently getOptimalMemOpType returns i32 for large enough sizes without
checking for alignment, leading to poor code generation when misaligned accesses
aren't permitted as we generate a word store then later split it up into byte
stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for
memset we splat the memset value into a word then immediately split it up
again.
Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type
to use, but also fix a bug there where it wasn't correctly checking if
misaligned memory accesses are allowed.
Differential Revision: https://reviews.llvm.org/D33442
llvm-svn: 303990
Refactor the strlen optimization code to work for both strlen and wcslen.
This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.
This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.
Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.
The refactoring "accidentally" fixes http://llvm.org/PR32124
Differential Revision: https://reviews.llvm.org/D32839
llvm-svn: 303461
This patch adds min/max population count, leading/trailing zero/one bit counting methods.
The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
Differential Revision: https://reviews.llvm.org/D32931
llvm-svn: 302925
- This change allows targets to opt-in to using them instead of the log2
shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
factored out into LoopUtils, and now have a unified interface for generating
reductions regardless of the preference of the target. LoopUtils now uses TTI
to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
Differential Revision: https://reviews.llvm.org/D30086
llvm-svn: 302514
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.
Differential Revision: https://reviews.llvm.org/D32637
llvm-svn: 302262
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.
Differential Revision: https://reviews.llvm.org/D32784
llvm-svn: 302088
PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats.
This patch adds support for extension/truncation for both integer and float types.
Differential Revision: https://reviews.llvm.org/D32391
llvm-svn: 301910
This is the SelectionDAG version of D32521. If know where at least one 1 is located in the input to these intrinsics we can place an upper bound on the number of bits needed to represent the count and thus increase the number of known zeros in the output.
I think we can also refine this further for CTTZ_UNDEF/CTLZ_UNDEF by assuming that the answer will never be BitWidth. I've left this out for now because it caused other test failures across multiple targets. Usually because of turning ADD into OR based on this new information.
I'll fix CTPOP in a future patch.
Differential Revision: https://reviews.llvm.org/D32692
llvm-svn: 301806
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29872
llvm-svn: 301775
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.
Reviewers: RKSimon, spatel, davide
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32651
llvm-svn: 301747
Reapplied r299221 after fix for nondeterminism in ThinLTO builder (rL301599), with extra check for implicit truncation of inserted element.
llvm-svn: 301644
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version.
llvm-svn: 301618
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
Differential Revision: https://reviews.llvm.org/D32376
llvm-svn: 301432
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
PR32754.
llvm-svn: 301111
This enables use after free and uninit memory checking for memory
returned by a recycler. SelectionDAG currently relies on the opcode of a
free'd node being ISD::DELETED_NODE, so poke a hole in the asan poison
for SDNode opcodes. This means that we won't find some issues, but only
in SDag.
llvm-svn: 300868
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.
However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.
This patch fixes that.
Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.
AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.
Differential Revision: https://reviews.llvm.org/D32021
llvm-svn: 300864
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
llvm-svn: 300856
This patch adds a few helper functions to obtain new vector
value types based on existing ones without needing to care
about whether they are scalable or not.
I've confined their use to a few common locations right now,
and targets that don't have scalable vectors should never
need to care about these.
Patch by Graham Hunter.
Differential Revision: https://reviews.llvm.org/D32017
llvm-svn: 300838
This is preparation for a clang change to improve the [[nodiscard]] warning to not be ignored on methods that return a class marked [[nodiscard]] that are defined in the class itself. See D32207.
We should consider adding wrapper methods to APInt that return the overflow flag directly and discard the APInt result. This would eliminate the void casts and the need to create a bool before the call to pass to the out param.
llvm-svn: 300758
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
Differential Revision: https://reviews.llvm.org/D32155
llvm-svn: 300566
Since the BUILD_VECTOR has already been checked by
isBuildVectorOfConstantSDNodes() in SelectionDAG::getNode() for a
SIGN_EXTEND_INREG, it can be assumed that Op is always either undef or a
ConstantSDNode, and Ops.size() will always equal VT.getVectorNumElements().
llvm-svn: 299647
When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with
constant operands, a new BUILD_VECTOR node will be created transformed
constants.
Llvm-stress found a case where the new BUILD_VECTOR had constant operands
of an illegal type, because the (legal) element type is in fact not a legal
scalar type.
This patch changes this so that the new BUILD_VECTOR has the same operand
type as the old one.
Review: Eli Friedman, Nirav Dave
https://bugs.llvm.org//show_bug.cgi?id=32422
llvm-svn: 299540
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.
Followup to D25691.
Differential Revision: https://reviews.llvm.org/D31311
llvm-svn: 299219
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.
Differential Revision: https://reviews.llvm.org/D31249
llvm-svn: 299201
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.
Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.
SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.
Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).
Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed. For now, use
'isEnabled' as a crude replacement.
This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'. Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.
This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.
Differential Revision: https://reviews.llvm.org/D31405
llvm-svn: 299093
We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result.
Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes.
This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases.
Differential Revision: https://reviews.llvm.org/D31074
llvm-svn: 298226
Handle TokenFactors more aggressively in
SDValue::reachesChainWithoutSideEffects. This isn't really a
very effective change anymore because of other changes to
chain handling, but it's a cheap check, and the expanded
comments are still useful.
It might be possible to loosen the hasOneUse() requirement with a
deeper analysis, but a naive implementation of that check would be
expensive.
Differential Revision: https://reviews.llvm.org/D29845
llvm-svn: 298156
This patch replaces ORs with getHighBits/getLowBits etc. with setLowBits/setHighBits/setBitsFrom.
In a few of the places we weren't ORing, but the KnownZero/KnownOne vectors were already initialized to zero. We exploit this in most places already there were just some that were inconsistent.
Differential Revision: https://reviews.llvm.org/D30965
llvm-svn: 297860
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.
ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.
At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.
Differential Revision: https://reviews.llvm.org/D29639
llvm-svn: 297780
Summary:
Depends on D30379
This improves the state of things for the sub class of operation.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30436
llvm-svn: 297482
Summary: As per title. This is extracted from D29872 and I threw SADDO in.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30379
llvm-svn: 297479
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.
Differential Revision: https://reviews.llvm.org/D30780
llvm-svn: 297458
As discussed in the review thread for rL297026, this is actually 2 changes that
would independently fix all of the test cases in the patch:
1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before
foldBinopIntoSelect() as a matter of efficiency.
I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().
I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).
Differential Revision:
https://reviews.llvm.org/D30741
llvm-svn: 297384
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.
We're missing a couple of shuffle combines that will be added in a future patch for review.
Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.
Differential Revision: https://reviews.llvm.org/D30549
llvm-svn: 296985
Summary:
This can be used to optimize large multiplications after legalization.
Depends on D29565
Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29587
llvm-svn: 296711
The patch comes in 2 parts:
1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.
2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.
Fix for PR30760
Differential Revision: https://reviews.llvm.org/D29568
llvm-svn: 294749
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.
We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.
This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.
Differential Revision: https://reviews.llvm.org/D29399
llvm-svn: 294183
Summary:
This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations.
For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there.
Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines.
Reviewers: RKSimon, delena
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29000
llvm-svn: 292876
This patch improves the knownbits logic for unsigned integer min/max opcodes.
For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.
This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.
Differential Revision: https://reviews.llvm.org/D28853
llvm-svn: 292528
The usage of some MIPS MSA instrinsics that took immediates could crash LLVM
during lowering. This patch addresses that behaviour. Crucially this patch
also makes the use of intrinsics with out of range immediates as producing an
internal error.
The ld,st instrinsics would trigger an assertion failure for MIPS64 as their
lowering would attempt to add an i32 offset to a i64 pointer.
Reviewers: vkalintiris, slthakur
Differential Revision: https://reviews.llvm.org/D25438
llvm-svn: 291571
There are helpers for testing for constant or constant build_vector,
and for splat ConstantFP vectors, but not for a constantfp or
non-splat ConstantFP vector.
llvm-svn: 290317
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.
Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.
Differential Revision: https://reviews.llvm.org/D27714
llvm-svn: 289654
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
Differential Revision: https://reviews.llvm.org/D26671
llvm-svn: 289647
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.
llvm-svn: 289232
Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements.
llvm-svn: 289214
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.
We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.
Differential Revision: https://reviews.llvm.org/D27129
llvm-svn: 289200
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.
llvm-svn: 289194
We have the following DAGCombiner transformations:
(mul (shl X, c1), c2) -> (mul X, c2 << c1)
(mul (shl X, C), Y) -> (shl (mul X, Y), C)
(shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.
Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.
Differential Revision: https://reviews.llvm.org/D26605
llvm-svn: 287766
Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size.
Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results.
Differential Revision: https://reviews.llvm.org/D26851
llvm-svn: 287635
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.
I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.
DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.
This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander!
Differential Revision: https://reviews.llvm.org/D25691
llvm-svn: 285494