Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher.
Original commit message:
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally, ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead, we have to canonicalize the types of the all-zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355784
This saves us from needing to call getInt32 ourselves, making the code a little shorter.
The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue.
This is cleanup in preparation for writing similar code for expandload/compressstore.
llvm-svn: 355767
There are special cases in the scalarization for constant masks. If we hit one of the special cases we don't need to reset the iteration.
Noticed while starting work on adding expandload/compressstore to this pass.
llvm-svn: 355754
r44412 fixed a huge compile time regression, but it needed the ModifiedDT flag to be
maintained correctly in the optimizations in optimizeBlock() and optimizeInst().
Function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().
This patch also removes the unused ModifiedDT member of the CodeGenPrepare class.
The property is now recorded in a reference parameter.
Differential Revision: https://reviews.llvm.org/D59139
llvm-svn: 355751
This avoids breaking possible value dependencies when sorting loads by
offset.
AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.
llvm-svn: 355728
many valnos.
Recently we found a compile-time problem in several cases when
SpeculativeLoadHardening was enabled. The significant compile time was spent
in the register coalescing pass, where the register coalescer tried to join many other
live intervals with some very large live intervals that have many valnos.
Specifically, every time JoinVals::mapValues is called, computeAssignment will
be called getNumValNums() times for the target live interval. If the large
live interval has N valnos and has N copies associated with it, trying to
coalesce those copies costs at least N^2 work.
The patch adds a limit on the effort spent trying to join those very large live
intervals with others. By default, for a live interval with more than 100 valnos,
once it has been involved in coalescing with other live intervals more than 100
times, we stop coalescing it. That caps the compile time of the N^2 algorithm
and effectively solves the compile time problem we saw.
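For illustration, a minimal sketch of the capping logic under the stated defaults; the names isHighCostLiveInterval, LargeLIVisitCounter, and both thresholds are illustrative, not necessarily the identifiers used in the patch:

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/CodeGen/LiveInterval.h"
  using namespace llvm;

  // Hypothetical thresholds mirroring the defaults described above.
  static const unsigned LargeIntervalSizeThreshold = 100;
  static const unsigned LargeIntervalFreqThreshold = 100;

  // Counts how many times each very large interval has been considered.
  static DenseMap<unsigned, unsigned> LargeLIVisitCounter;

  // Give up on joining copies involving a huge interval that has already been
  // visited too many times; this caps the N^2 behavior.
  static bool isHighCostLiveInterval(const LiveInterval &LI) {
    if (LI.getNumValNums() <= LargeIntervalSizeThreshold)
      return false;
    return ++LargeLIVisitCounter[LI.reg] > LargeIntervalFreqThreshold;
  }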
Differential revision: https://reviews.llvm.org/D59143
llvm-svn: 355714
Summary:
The logic in the -unreachableblockelim pass does the following:
1. It traverses the function it's given in depth-first order and
creates a set of basic blocks that are unreachable from the
function's entry node.
2. It iterates over each of those unreachable blocks and (1) removes any
successors' references to the dead block, and (2) replaces any uses of
instructions from the dead block with null.
The logic in (2) above is identical to what the `llvm::DeleteDeadBlocks`
function from `BasicBlockUtils.h` does. The only difference is that
`llvm::DeleteDeadBlocks` replaces uses of instructions from dead blocks
not with null, but with undef.
Replace the duplicate logic in the -unreachableblockelim pass with a
call to `llvm::DeleteDeadBlocks`. This results in less code but no
functional change (NFC).
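For reference, a minimal sketch of what the pass reduces to after this change, assuming the ArrayRef-taking llvm::DeleteDeadBlocks from BasicBlockUtils.h; depth_first_ext fills the Reachable set as a side effect of walking from the entry block:

  #include "llvm/ADT/DepthFirstIterator.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Function.h"
  #include "llvm/Transforms/Utils/BasicBlockUtils.h"
  using namespace llvm;

  static bool removeUnreachableBlocks(Function &F) {
    // Mark everything reachable from the entry block.
    SmallPtrSet<BasicBlock *, 8> Reachable;
    for (BasicBlock *BB : depth_first_ext(&F.getEntryBlock(), Reachable))
      (void)BB; // The walk populates Reachable as a side effect.

    // Everything else is dead; delete it in one batch.
    SmallVector<BasicBlock *, 8> Dead;
    for (BasicBlock &BB : F)
      if (!Reachable.count(&BB))
        Dead.push_back(&BB);
    if (Dead.empty())
      return false;
    DeleteDeadBlocks(Dead);
    return true;
  }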
Reviewers: mkazantsev, wmi, davidxl, silvas, davide
Reviewed By: davide
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59064
llvm-svn: 355634
Restore a reverted commit, with the silly mistake fixed. Sorry for the previous breakage.
Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead.
Differential Revision: https://reviews.llvm.org/D58760
llvm-svn: 355540
Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well.
Requested by @spatel on D59006
llvm-svn: 355533
Summary:
In r354298 a DominatorTree construction was added via new function
combineToUSubWithOverflow, which was subsequently restructured into
replaceMathCmpWithIntrinsic in r354689. We are hitting a very long
compile time due to this repeated construction, once per math cmp in
the function.
We shouldn't need to build the DominatorTree more than once per
function, except when a transformation invalidates it. There is already
a boolean flag that is returned from these methods indicating whether
the DT has been modified. We can simply build the DT once per
Function walk in CodeGenPrepare::runOnFunction, since any time a change
is made we break out of the Function walk and restart it.
I modified the code so that both replaceMathCmpWithIntrinsic as well as
mergeSExts (which was also building a DT) use the DT constructed by the
run method.
From -mllvm -time-passes:
Before this patch: CodeGen Prepare user time is 328s
With this patch: CodeGen Prepare user time is 21s
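A sketch of the lazy-construction idea, assuming any CFG-changing transform resets the cached tree before the function walk restarts; LazyDT and its members are illustrative names, not the actual CodeGenPrepare fields:

  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/Function.h"
  #include <memory>
  using namespace llvm;

  struct LazyDT {
    std::unique_ptr<DominatorTree> DT;

    // Build the DominatorTree at most once per walk over the Function.
    DominatorTree &get(Function &F) {
      if (!DT)
        DT = std::make_unique<DominatorTree>(F);
      return *DT;
    }

    // Called whenever a transform invalidates the tree; the walk restarts.
    void invalidate() { DT.reset(); }
  };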
Reviewers: spatel
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58995
llvm-svn: 355512
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
Differential Revision: https://reviews.llvm.org/D58996
llvm-svn: 355507
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Friedman, Roman Lebedev
llvm-svn: 355490
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Friedman, Roman Lebedev
llvm-svn: 355483
Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead.
Differential Revision: https://reviews.llvm.org/D58760
llvm-svn: 355453
This caused the first matcher in the isel table for many targets to be Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.
llvm-svn: 355433
These arrays are both keyed by CPU name and go into the same tablegenerated file. Merge them so we only need to store keys once.
This also removes a weird space saving quirk where we used ProcDesc.size() to build an ArrayRef for ProcSched.
Differential Revision: https://reviews.llvm.org/D58939
llvm-svn: 355431
The description for CPUs was just the CPU name wrapped with "Select the " and " processor". We can just do that directly in the help printer instead of making a separate version in the binary for each CPU.
Also remove the Value field that isn't needed and was always 0.
Differential Revision: https://reviews.llvm.org/D58938
llvm-svn: 355429
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html
While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?
llvm-svn: 355345
This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality.
Differential Revision: https://reviews.llvm.org/D58884
llvm-svn: 355324
Summary:
Before, when we implemented the first EH proposal, a 'catch <tag>'
instruction might not catch an exception, so there were multiple EH pads an
exception could unwind to. That means a BB could have multiple EH pad
successors.
Now, after we switched to the new proposal, every 'catch' instruction
catches an exception, and there is only one catchpad per catchswitch, so
we have at most one EH pad successor, making the `ThrowUnwindDest` map in
`WasmEHInfo` unnecessary.
Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems,
because other optimization passes can split a BB that contains possibly
throwing calls (previously invokes), and we have to update the map every
time that happens, which is not easy for common CodeGen passes.
This also correctly updates successor info in LateEHPrepare when we add
a rethrow instruction.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58486
llvm-svn: 355296
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally, ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead, we have to canonicalize the types of the all-zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355224
Summary:
In the clang UI, replaces -mthread-model posix with -matomics as the
source of truth on threading. In the backend, replaces
-thread-model=posix with the atomics target feature, which is now
collected on the WebAssemblyTargetMachine along with all other used
features. These collected features will also be used to emit the
target features section in the future.
The default configuration for the backend is thread-model=posix and no
atomics, which was previously an invalid configuration. This change
makes the default valid because the thread model is ignored.
A side effect of this change is that objects are never emitted with
passive segments. It will instead be up to the linker to decide
whether sections should be active or passive based on whether atomics
are used in the final link.
Reviewers: aheejin, sbc100, dschuff
Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, steven_wu, dexonsmith, rupprecht, jfb, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D58742
llvm-svn: 355112
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly claimed
that the operation is equivalent to zero extending the
value we're tracking. That has not been true; instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
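A free-standing sketch of the new semantics (the patch itself adds the flag as a parameter on KnownBits::zext()/zextOrTrunc()); the helper and parameter names here are illustrative:

  #include "llvm/Support/KnownBits.h"
  using namespace llvm;

  static KnownBits zextKnownBits(const KnownBits &In, unsigned BitWidth,
                                 bool ExtendedBitsAreKnownZero) {
    unsigned OldWidth = In.getBitWidth();
    KnownBits Out(BitWidth);
    Out.Zero = In.Zero.zext(BitWidth);
    Out.One = In.One.zext(BitWidth);
    // Only when requested do the new high bits become known zero; otherwise
    // they stay unknown, matching the old behavior.
    if (ExtendedBitsAreKnownZero)
      Out.Zero.setBitsFrom(OldWidth);
    return Out;
  }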
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).
This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.
To make sure this patch itself is NFC, it is built on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll).
Previously landed
D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
D57596 [CodeGen] Be conservative about atomic accesses as for volatile
D57802 Be conservative about unordered accesses for the moment
rL353959: [Tests] First batch of cornercase tests for unordered atomics.
rL353966: [Tests] RMW folding tests w/unordered atomic operations.
rL353972: [Tests] More unordered atomic lowering tests.
rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
rL354740: [Hexagon, SystemZ] Be super conservative about atomics
rL354800: [Lanai] Be super conservative about atomics
rL354845: [ARM] Be super conservative about atomics
Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)
Differential Revision: https://reviews.llvm.org/D57601
llvm-svn: 355025
When using full LTO it is possible that a template function's definition DIE
is bound to one compilation unit and its declaration to another. We should
add the function declaration attributes on behalf of its owner CU; otherwise
we may end up with a malformed file identifier in the function declaration's
DW_AT_decl_file attribute.
Differential revision: https://reviews.llvm.org/D58538
llvm-svn: 354978
If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing an ADD/SUB and a SADDSAT/SSUBSAT and then comparing the results.
I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit.
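A hedged sketch of the SADDO case (SSUBO is analogous with ISD::SUB/ISD::SSUBSAT); expandSADDO is an illustrative helper, not the actual TargetLowering entry point:

  #include "llvm/CodeGen/SelectionDAG.h"
  #include <utility>
  using namespace llvm;

  static std::pair<SDValue, SDValue>
  expandSADDO(SelectionDAG &DAG, const SDLoc &DL, SDValue LHS, SDValue RHS,
              EVT OvVT) {
    EVT VT = LHS.getValueType();
    // The wrapping and the saturated result differ exactly when the signed
    // addition overflowed.
    SDValue Sum = DAG.getNode(ISD::ADD, DL, VT, LHS, RHS);
    SDValue Sat = DAG.getNode(ISD::SADDSAT, DL, VT, LHS, RHS);
    SDValue Overflow = DAG.getSetCC(DL, OvVT, Sum, Sat, ISD::SETNE);
    return {Sum, Overflow};
  }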
Differential Revision: https://reviews.llvm.org/D58637
llvm-svn: 354866
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.
llvm-svn: 354828
These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well.
We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that.
This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D58503
llvm-svn: 354797
Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold
Differential Revision: https://reviews.llvm.org/D58585
llvm-svn: 354793
OPC_CheckCondCode is always used as operand 2 of a setcc. And it's always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2.
This saves ~3000 bytes in the X86 table.
llvm-svn: 354763
Summary:
When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.
We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all the other bits of the element as 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.
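A sketch of that scalar-boolean-to-vector-element conversion, assuming the vector boolean contents are 0/all-ones; widenScalarBool is an illustrative name:

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  static SDValue widenScalarBool(SelectionDAG &DAG, const SDLoc &DL,
                                 SDValue ScalarBool, EVT EltVT) {
    // Only bit 0 of ScalarBool is meaningful; the select between constants
    // produces an element value whose bits all agree, which is what the
    // setcc unrolling in LegalizeVectorOps does as well.
    SDValue AllOnes = DAG.getAllOnesConstant(DL, EltVT);
    SDValue Zero = DAG.getConstant(0, DL, EltVT);
    return DAG.getSelect(DL, EltVT, ScalarBool, AllOnes, Zero);
  }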
Reviewers: spatel, RKSimon, nikic
Reviewed By: nikic
Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58567
llvm-svn: 354753
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.
Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.
The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.
This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754
llvm-svn: 354746
The new instruction might have fewer operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.
X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.
Really, this whole check for more commutable operands is a little fragile. It assumes that the new instruction's operands are in the same order and positions as the original's, except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change.
llvm-svn: 354738
r354648 was a follow-up to fix a regression: "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."
These were reverted in r354713 as their context depended on other patches that were reverted for a bug.
llvm-svn: 354734
r354363 caused https://crbug.com/934963#c1, which has a plain C reduced
test case.
I also had to revert some dependent changes:
- r354648
- r354647
- r354640
- r354511
llvm-svn: 354713
Summary:
Prior to r310876 one of our out-of-tree targets was enabling IPRA by modifying
the TargetOptions::EnableIPRA. This no longer works on current trunk since the
useIPRA() hook overrides any values that are set in advance. This patch adjusts
the behaviour of the hook so that API users and useIPRA() can both enable it
but useIPRA() cannot disable it if the API user already enabled it.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: wdng, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D38043
llvm-svn: 354692
We need to enhance the uaddo matching to handle special-cases
as seen in PR40486 and PR31754. That means we won't necessarily
have a def-use pattern, so we'll need to check dominance to
determine where to place the intrinsic (as we already do for
usubo). This preliminary patch is just rearranging the code,
so the planned follow-up to improve uaddo will be more clear.
llvm-svn: 354689
Don't skip incrementing the frame index number
if the object is dead. Instructions can still be
referencing the old frame index number, and this
doesn't attempt to remap those. The resulting
MIR then fails to load because the use instructions
use a higher frame index number than the recorded
list of stack objects.
I'm not sure it's possible to craft a testcase
with the existing set of passes. It requires
selectively marking some stack objects
dead in an essentially random order.
StackSlotColoring condenses towards
the low indexes. This avoids a regression in a
future AMDGPU commit when some frame indexes
are lowered separately from PEI.
llvm-svn: 354688
Will allow re-using the machinery for independent
sets of register allocators.
This will allow AMDGPU to use separate command line
options for the allocator to use for SGPRs separate
from VGPRs.
llvm-svn: 354687
This patch factors out the function hasViableTopFallthrough from rotateLoop and also enhances it. The original code checks only whether there is a block that can be placed before the current loop top. This patch also checks whether the loop top is the most likely successor of its predecessor. The attached test case shows its effect.
Differential Revision: https://reviews.llvm.org/D58393
llvm-svn: 354682
When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits.
We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results.
So we can just emit the final AND after the optional concatenation is done. That handles skipping the AND before the OR and gets rid of the extra high bits after the OR.
llvm-svn: 354655
Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those.
But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results.
By promoting the input at the same time we can create only a single any_extend or truncate.
There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch.
llvm-svn: 354647
This fold can occur during legalization, so it can fight with promotion
to the larger type. It apparently takes a special sequence and subtarget
to avoid more basic simplifications that would hide the problem.
But there's a bigger question raised here: why does distributeTruncateThroughAnd()
even exist? It duplicates functionality from a more minimal pattern that we
already have. But getting rid of this function requires some preliminary steps.
https://bugs.llvm.org/show_bug.cgi?id=40793
llvm-svn: 354594
For AMDGPU, if an operand requires an SGPR but is only available as a
VGPR, a loop needs to be introduced to execute the instruction with
each unique combination of values across all lanes. The rest of the
instructions in the block will be moved to a new block following the
loop. Check if the next instruction's parent changed, and update the
iterators and insertion block if this happened.
Tests will be included in a future patch.
llvm-svn: 354591
If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits.
This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask.
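A generic sketch of that reasoning (not the target-specific code the patch touches): the AND is redundant for the demanded bits whenever the mask, widened by the LHS's known-zero bits, covers them:

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  static bool andIsRedundant(SelectionDAG &DAG, SDValue LHS, const APInt &Mask,
                             const APInt &DemandedBits) {
    // Cheap early out: no known-bits query needed if the mask already covers
    // every demanded bit.
    if (DemandedBits.isSubsetOf(Mask))
      return true;
    // Bits known zero in the LHS behave as if the mask still preserved them.
    KnownBits Known = DAG.computeKnownBits(LHS);
    return DemandedBits.isSubsetOf(Mask | Known.Zero);
  }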
Differential Revision: https://reviews.llvm.org/D58464
llvm-svn: 354514
Second part of https://bugs.llvm.org/show_bug.cgi?id=40442.
This adds an extra UnrollVectorOverflowOp() method to SDAG, because
the general UnrollOverflowOp() method can't deal with multiple results.
Additionally we need to expand UMULO/SMULO during vector op
legalization, as it may result in unrolling, which may need additional
type legalization.
Differential Revision: https://reviews.llvm.org/D57997
llvm-svn: 354513
If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits.
This can prevent GetDemandedBits from recognizing that the AND is unnecessary.
llvm-svn: 354498
We may leave behind incorrect dead flags on instructions that are CSE'd. Make
sure we remove the dead flags on physical registers to prevent other incorrect
code motion.
Differential Revision: https://reviews.llvm.org/D58115
llvm-svn: 354443
Summary:
This is a follow-up to r353988 where tryEvict was extended to take last
chance recoloring into account. Now we do the same thing for trySplit and
tryAssign.
Now we always pass a "FixedRegisters" argument to canEvictInterference and
tryEvict so it doesn't need to have a default value anymore.
The need for this was found long ago in an out-of-tree target.
Unfortunately I don't have a reproducer for an in-tree target.
Reviewers: qcolombet, rudkx
Reviewed By: qcolombet, rudkx
Subscribers: rudkx, MatzeB, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58376
llvm-svn: 354439
Summary:
Rename MemoryIndex to InitFlags and implement logic for determining
data segment layout in ObjectYAML and MC. Also adds a "passive" flag
for the .section assembler directive although this cannot be assembled
yet because the assembler does not support data sections.
Reviewers: sbc100, aardappel, aheejin, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57938
llvm-svn: 354397
Directly use the correct shift amount type if it is possible, and
future-proof the code against vectors. The added test makes sure that
bitwidths that do not fit into the shift amount type do not assert.
Split out from D57997.
llvm-svn: 354359
Legalize/select llvm.ctlz.*
Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.
Differential Revision: https://reviews.llvm.org/D58155
llvm-svn: 354299
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.
Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.
This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now although other targets will likely benefit
by forming usubo too.
Differential Revision: https://reviews.llvm.org/D57789
llvm-svn: 354298
Summary:
A store to an object whose lifetime is about to end can be removed.
See PR40550 for motivation.
Reviewers: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57541
llvm-svn: 354244
In preparation for supporting vector expansion.
Add an isPostTypeLegalization flag to makeLibCall(), because this
expansion relies on the legalized form using MERGE_VALUES. Drop
the corresponding variant of ExpandLibCall, which is no longer used.
Differential Revision: https://reviews.llvm.org/D58006
llvm-svn: 354226
Testing based on the total size of the elements failed to catch a few
invalid scenarios, so explicitly check the number of elements/operands
and types.
This failed to catch situations like
<4 x s16> = G_BUILD_VECTOR s32, s32 since the total size added
up. This also would fail to catch an implicit conversion between
pointers and scalars.
llvm-svn: 354139
https://reviews.llvm.org/D58073
Speed up insertion into the GISelWorkList during the initial populating phase
by deferring the repeated resizing of the DenseMap.
This results in ~10% improvement in the combiner passes, and
~3% speedup in the Legalizer.
reviewed by: aemerson.
llvm-svn: 354093
Select G_BR and G_BRCOND for MIPS32.
The unconditional branch G_BR does not have a register operand;
for that reason we only add tests.
Since the conditional branch G_BRCOND compares a register to zero on MIPS32,
an explicit extension must be performed on the i1 condition in order to set
the high bits to the appropriate value.
Differential Revision: https://reviews.llvm.org/D58182
llvm-svn: 354022
While rebasing a refactor in r353950 I accidentally swapped two function
arguments; one is SelectionDAGBuilder's "current" DebugLoc, the other is the one
from the "current" debug intrinsic. They're probably always identical, but I
haven't proved that yet.
llvm-svn: 354019
Summary:
The declarative tablegen definitions split rules into match and apply steps.
Prepare for that by doing the same in the C++ implementations. This aids
some of the migration effort while the tablegen version is incomplete.
Reviewers: bogner, volkan, aditya_nandakumar, paquette, aemerson
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, Petar.Avramovic, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58150
llvm-svn: 353996
For D57601, we need to know whether the instruction is volatile. We'd either have to pass yet another parameter, or just standardize on the MMO interface. I chose the second.
llvm-svn: 353989
Last chance recoloring inserts into FixedRegisters those virtual
registers it is attempting to assign a physical register to.
We must consider these when we consider candidates for eviction so that
we do not end up evicting something while we are attempting to recolor
to assign it.
This is hitting in an out-of-tree target and no longer reproduces on
trunk. That does not appear to be a result of it having been fixed, but
rather, it appears that optimization changes and/or other changes to
register allocation mask the problem.
I haven't found a way to come up with a reasonable test case for this
(i.e. one that I can actually commit to open source, is reasonable
in size, and actually reproduces the issue).
rdar://problem/45708741
llvm-svn: 353988
The helper function was used by only two callers, and largely ended up providing distinct functionality based on optional arguments and opcode. Inline and simplify it to make the functionality much clearer.
llvm-svn: 353977
In this patch SelectionDAG tries to salvage any dbg.values that are going to be
dropped, in case they can be recovered from Values in the current BB. It also
strengthens SelectionDAG's handling of dangling debug data, so that dbg.values
are *always* emitted (as Undef or otherwise) instead of dangling forever.
The motivation behind this patch exists in the new test case: a memory address
(here a bitcast and GEP) exist in one basic block, and a dbg.value referring to
the address is left in the 'next' block. The base pointer is live across all
basic blocks. In current llvm trunk the dbg.value cannot be encoded, and it
isn't even emitted as an Undef DBG_VALUE.
The change is simply: if we're definitely going to drop a dbg.value, repeatedly
apply salvageDebugInfo to its operand until either we find something that can
be encoded, or we can't salvage any further in which case we produce an Undef
DBG_VALUE. To know when we're "definitely going to drop a dbg.value",
SelectionDAG signals SelectionDAGBuilder when all IR instructions have been
encoded to force salvaging. This ensures that any dbg.value that's dangling
after DAG creation will have a corresponding DBG_VALUE encoded.
Differential Revision: https://reviews.llvm.org/D57694
llvm-svn: 353954
This is a pure copy-and-paste job, moving the logic for lowering dbg.value
intrinsics to SDDbgValues into its own function. This is ahead of adding some
more users of this logic.
Differential Revision: https://reviews.llvm.org/D57697
llvm-svn: 353950
SelectionDAGBuilder has special handling for dbg.value intrinsics that are
understood to define the location of function parameters on entry to the
function. To enable this, we avoid recording a dbg.value as a virtual register
reference if it might be such a parameter, so that it later hits
EmitFuncArgumentDbgValue.
This patch reduces the set of circumstances where we avoid recording a
dbg.value as a virtual register reference, to allow more "normal" variables
to be recorded that way. We now only bypass for potential parameters if:
* The dbg.value operand is an Argument,
* The Variable is a parameter, and
* The Variable is not inlined.
meaning it's very likely that the dbg.value is a function-entry parameter
location.
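A sketch of those three conditions as a predicate; looksLikeEntryParameter is an illustrative name, and a real implementation would also guard against a missing debug location:

  #include "llvm/IR/Argument.h"
  #include "llvm/IR/DebugInfoMetadata.h"
  #include "llvm/IR/IntrinsicInst.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  static bool looksLikeEntryParameter(const DbgValueInst &DVI) {
    return dyn_cast_or_null<Argument>(DVI.getValue()) && // operand is an Argument
           DVI.getVariable()->isParameter() &&           // variable is a parameter
           !DVI.getDebugLoc().getInlinedAt();            // and it is not inlined
  }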
Differential Revision: https://reviews.llvm.org/D57584
llvm-svn: 353948
Summary:
This is a follow-up to D57510. This patch stops DebugHandlerBase from
changing the starting label for the first non-overlapping,
register-described parameter DBG_VALUEs to the beginning of the
function. That code did not consider what defined the registers, which
could result in the ranges for the debug values starting before their
defining instructions. We currently do not emit debug values for
constant values directly at the start of the function, so this code is
still useful for such values, but my intention is to remove the code
from DebugHandlerBase completely when we get there. One reason for
removing it is that the code violates the history map's ranges, which I
think can make it quite confusing when troubleshooting.
In D57510, PrologEpilogInserter was amended so that parameter DBG_VALUEs
now are kept at the start of the entry block, even after emission of
prologue code. That was done to reduce the degradation of debug
completeness from this patch. PR40638 is another example, where the
lexical-scope trimming that LDV does, in combination with scheduling,
results in instructions after the prologue being left without locations.
There might be other cases where the DBG_VALUEs are pushed further down,
for which the DebugHandlerBase code may be helpful, but as it now quite
often results in incorrect locations, even after the prologue, it seems
better to remove that code, and try to work our way up with accurate
locations.
In the long run we should maybe not aim to provide accurate locations
inside the prologue. Some single location descriptions, at least those
referring to stack values, generate inaccurate values inside the
epilogue, so maybe we should not aim to achieve accuracy for location
lists. However, it seems that we now emit line number programs that can
result in GDB and LLDB stopping inside the prologue when doing line
number stepping into functions. See PR40188 for more information.
A summary of some of the changed test cases is available in PR40188#c2.
Reviewers: aprantl, dblaikie, rnk, jmorse
Reviewed By: aprantl
Subscribers: jdoerfert, jholewinski, jvesely, javed.absar, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D57511
llvm-svn: 353928
This is a recommit of r335091 ("Add more test cases for deopt-operands via regalloc") and r335077 ("[InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT"). They were reverted due to a crash.
This change includes the text of both original changes, but also includes three additional pieces:
1) A bug fix for the observed crash. I had failed to record the failed remat value as live which resulted in an instruction being deleted which still had uses. With the machine verifier, this is caught quickly. Without it, we fail in StackSlotColoring due to an empty live interval from LiveStack.
2) A test case which demonstrates the fix for (1). See @test11.
3) A control flag which defaults to disabling this for the moment. Once I've run more extensive validation, I will switch the default and then remove this flag.
llvm-svn: 353871
Instead of only having this code work for unary intrinsics, have it work for
an arbitrary number of parameters.
Factor out the cases that fall under this (fma, pow).
This makes it a bit easier to add more intrinsics which don't require any
special work.
Differential Revision: https://reviews.llvm.org/D58079
llvm-svn: 353863
This teaches the IRTranslator to emit G_BSWAP when it runs into
Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in
AArch64.
Add a select-bswap.mir test, and add global isel checks to a couple existing
tests in test/CodeGen/AArch64.
This doesn't handle every bswap case, since some of these rely on known bits
stuff. This just lets us handle the naive case.
Differential Revision: https://reviews.llvm.org/D58081
llvm-svn: 353861
If we're comparing some value for equality against 2 constants
and those constants have an absolute difference of just 1 bit,
then we can offset and mask off that 1 bit and reduce to a single
compare against zero:
and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) -->
setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq
https://rise4fun.com/Alive/XslKj
This transform is disabled by default using a TLI hook
("convertSetCCLogicToBitwiseLogic()").
That should be overridden for AArch64, MIPS, Sparc and possibly
others based on the asm shown in:
https://bugs.llvm.org/show_bug.cgi?id=40611
llvm-svn: 353859
Summary:
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, the legalizer cannot directly call the intrinsic
with the wide arguments; instead, it "pre-lowers" them by splitting
them in halves.
rL283203 made sure that on big endian targets, the legalizer passes
the argument halves in the correct order. It did not do the same
for the return value halves because the existing code used a hack;
it put an illegal type into DAG and hoped that nothing would break
and it would be correctly lowered elsewhere.
rL307207 fixed this, handling return value halves similarly to how the
argument halves are handled, but did not take big-endian targets
into account.
This commit fixes the expansion on big-endian targets, such as
the out-of-tree OR1K target.
Reviewers: eli.friedman, vadimcn
Subscribers: george-hopkins, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D45355
llvm-svn: 353854
We need to clear the kill flags on both SingleValReg and OldReg, to ensure they remain
conservatively correct.
Differential Revision: https://reviews.llvm.org/D58114
llvm-svn: 353847
Summary:
This is a preparatory change for removing the code from
DebugHandlerBase::beginFunction() which changes the starting label for
the first non-overlapping DBG_VALUEs of parameters to the beginning of
the function. It does that to be able to show parameters when entering a
function. However, that code does not consider what defines the values,
which can result in the ranges for the debug values starting before
their defining instructions. That code is removed in a follow-up patch.
When prologue code is inserted, it leads to DBG_VALUEs that start
directly in the entry block being moved down after the prologue
instructions. This patch fixes that by stashing away DBG_VALUEs for
parameters before emitting the prologue, and then reinserts them at the
start of the block. This assumes that there is no target that somehow
clobbers parameter registers in the frame setup; there is no such case
in the lit tests at least.
See PR40188 for more information.
Reviewers: aprantl, dblaikie, rnk, jmorse
Reviewed By: aprantl
Subscribers: bjope, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D57510
llvm-svn: 353823
This configuration (due to r349207) was intended not to emit any DWO CU,
but a degenerate CU was still being emitted - containing a header and a
DW_TAG_compile_unit with no attributes.
Under that situation, emit nothing to the .dwo file. (since this is a
dynamic property of the input the .dwo file is still emitted, just with
nothing in it (so a valid, but empty, ELF file) - if some other CU
didn't satisfy this criterion, its DWO CU would still go there, etc.)
llvm-svn: 353771
Background: As described in https://reviews.llvm.org/D57601, I'm working towards separating volatile and atomic in the MMO uses for atomic instructions.
In https://reviews.llvm.org/D57593, I fixed a bug where isUnordered was returning the wrong result, but didn't account for the fact I was getting slightly ahead of myself. While both uses of isUnordered are correct (as far as I can tell), we don't have tests to demonstrate this and being aggressive gets in the way of having the removal of volatile truly be non-functional. Once D57601 lands, I will return to these call sites, revert this patch, and add the appropriate tests to show the expected behaviour.
Differential Revision: https://reviews.llvm.org/D57802
llvm-svn: 353766
Summary:
This patch fixes PR40587.
When a dbg.value intrinsic is emitted to the DAG
by using EmitFuncArgumentDbgValue the resulting
DBG_VALUE is hoisted to the beginning of the entry
block. I think the idea is to be able to locate
a formal argument already from the start of the
function.
However, EmitFuncArgumentDbgValue only checked that
the value that was used to describe a variable was
originating from a function parameter, not that the
variable itself actually was an argument to the
function. So when for example assigning a local
variable "local" the value from an argument "a",
the associated DBG_VALUE instruction would be hoisted
to the beginning of the function, even if the scope
for "local" started somewhere else (or if "local"
was mapped to other values earlier in the function).
This patch adds some logic to EmitFuncArgumentDbgValue
to check that the variable being described actually
is an argument to the function. And that the dbg.value
being lowered already is in the entry block. Otherwise
we bail out, and the dbg.value will be handled as an
ordinary dbg.value (not as a "FuncArgumentDbgValue").
A tricky situation is when both the variable and
the value are related to function arguments, but not
necessarily the same argument. We make sure that we
do not describe the same argument more than once as
a "FuncArgumentDbgValue". This solution works as long
as opt has injected a "first" dbg.value that corresponds
to the formal argument at the function entry.
Reviewers: jmorse, aprantl
Subscribers: jyknight, hiraditya, fedor.sergeev, dstenb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57702
llvm-svn: 353735
This teaches the legalizer about G_FFLOOR, and lets us select G_FFLOOR in
AArch64.
It updates the existing floating point tests, and adds a select-floor.mir test.
Differential Revision: https://reviews.llvm.org/D57486
llvm-svn: 353722
After the changes introduced in r353586, this instruction doesn't cause any
issues for any backend.
Original review: https://reviews.llvm.org/D57485
llvm-svn: 353720
`CallBase` class rather than `CallSite` wrappers.
I pushed this change down through most of the statepoint infrastructure,
completely removing the use of CallSite where I could reasonably do so.
I ended up making a couple of cut-points: generic call handling
(instcombine, TLI, SDAG). As soon as it hit truly generic handling with
users outside the immediate code, I simply transitioned into or out of
a `CallSite` to make this a reasonable sized chunk.
Differential Revision: https://reviews.llvm.org/D56122
llvm-svn: 353660
Now that we have vector support for [US](ADD|SUB)O we no longer
need to scalarize when expanding [US](ADD|SUB)SAT.
This matches what the cost model already does.
Differential Revision: https://reviews.llvm.org/D57348
llvm-svn: 353651
Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.
Differential Revision: https://reviews.llvm.org/D58009
llvm-svn: 353645
SimplifySetCC still has much room for improvement, but this should
fix the remaining problem examples from:
https://bugs.llvm.org/show_bug.cgi?id=40657
The initial fix for this problem was rL353615.
llvm-svn: 353639
There's effectively no difference for the cases with variables.
We just trade a sub for an add on those. But the case with a
subtract from constant would require an extra move instruction
on x86, so this looks like a reasonable generic combine.
llvm-svn: 353619
In preparation for supporting vector expansion.
Also drop a variant of ExpandLibCall, of which the MULO expansions
were the only user.
llvm-svn: 353611
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.
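A sketch of the added case, using the identity pow(x, 0.75) == sqrt(x) * sqrt(sqrt(x)); expandPow075 is an illustrative helper name:

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  static SDValue expandPow075(SelectionDAG &DAG, const SDLoc &DL, SDValue X) {
    EVT VT = X.getValueType();
    // x^0.75 = x^(1/2) * x^(1/4) = sqrt(x) * sqrt(sqrt(x)).
    SDValue Sqrt = DAG.getNode(ISD::FSQRT, DL, VT, X);
    SDValue FourthRoot = DAG.getNode(ISD::FSQRT, DL, VT, Sqrt);
    return DAG.getNode(ISD::FMUL, DL, VT, Sqrt, FourthRoot);
  }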
Patch by Whitney Tsang (Whitney)
Differential revision: https://reviews.llvm.org/D57434
llvm-svn: 353557
Replace OR(SHL,SRL) pattern with ISD::FSHR (legalization expands this later if necessary) - this helps with the scale == 0 'undefined' drop-through case that was discussed on D55720.
llvm-svn: 353546
Make the behavior of G_LOAD in widenScalar the same as for G_ZEXTLOAD and
G_SEXTLOAD. That is, perform widenScalarDst to the size given by the target
and avoid additional checks in common code. Targets can reorder or add
additional rules in the LegalizeRuleSet for the opcode to achieve the desired
behavior.
Select an extending load that does not have a specified type of extension
into a zero-extending load.
Select a truncating store that stores the number of bytes indicated by the
size in the MachineMemOperand.
Differential Revision: https://reviews.llvm.org/D57454
llvm-svn: 353520
This is part of https://bugs.llvm.org/show_bug.cgi?id=40442.
Vector legalization is implemented for the add/sub overflow opcodes.
UMULO/SMULO are also handled as far as legalization is concerned, but
they don't support vector expansion yet (so no tests for them).
The vector result widening implementation is suboptimal, because it
could result in a legalization loop.
Differential Revision: https://reviews.llvm.org/D57639
llvm-svn: 353464
Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner
First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111.
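A minimal sketch of the rewrite, assuming the (add (umax X, C), -C) pattern has already been matched in DAGCombiner; the helper name is illustrative:

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // umax pins X to at least C, so the following subtraction of C can never
  // wrap below zero -- which is exactly unsigned saturating subtraction.
  static SDValue foldAddUMaxToUSubSat(SelectionDAG &DAG, const SDLoc &DL,
                                      SDValue X, SDValue C) {
    return DAG.getNode(ISD::USUBSAT, DL, X.getValueType(), X, C);
  }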
Differential Revision: https://reviews.llvm.org/D57754
llvm-svn: 353457
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.
This shows a disadvantage of targets not specifying which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.
llvm-svn: 353455
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.
The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.
This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.
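A sketch of one direction of that guarded flip for the scalar 0/1-boolean case (the vector all-ones-boolean case would check for an xor with all-ones instead); this is not necessarily the exact transform the patch implements:

  #include "llvm/CodeGen/SelectionDAG.h"
  #include "llvm/CodeGen/TargetLowering.h"
  using namespace llvm;

  static SDValue foldFlippedSelect(SelectionDAG &DAG, const TargetLowering &TLI,
                                   const SDLoc &DL, SDValue Cond, SDValue T,
                                   SDValue F) {
    // 'xor Cond, 1' is only a logical not when a true boolean is exactly 1.
    if (Cond.getOpcode() == ISD::XOR && isOneConstant(Cond.getOperand(1)) &&
        TLI.getBooleanContents(Cond.getValueType()) ==
            TargetLowering::ZeroOrOneBooleanContent)
      return DAG.getSelect(DL, T.getValueType(), Cond.getOperand(0), F, T);
    return SDValue();
  }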
Differential Revision: https://reviews.llvm.org/D57401
llvm-svn: 353433
Introduce a new function which handles instructions with multiple type
indices, but have the same number of vector elements.
Also legalize v2s16 shifts when applicable.
llvm-svn: 353432
Summary: This code tries to handle the case where IBB is an EHPad, but there's an earlier check that uses PBB->hasEHPadSuccessor(), where PBB is a predecessor of IBB. The hasEHPadSuccessor function would have visited IBB, seen that it was an EHPad, and returned false. This would prevent us from reaching this code with IBB as an EHPad.
Looks like this code was originally added in rL37427 (ancient) and made dead in rL143001.
Reviewers: rnk, void, efriedma
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57358
llvm-svn: 353375
There was a lot of repeated code wrt unary math intrinsics in
translateKnownIntrinsic. This factors out the repeated MIRBuilder code into
two functions: translateSimpleUnaryIntrinsic and getSimpleUnaryIntrinsicOpcode.
This simplifies adding simple unary intrinsics, since after this, all you have
to do is add the mapping to SimpleUnaryIntrinsicOpcodes.
Differential Revision: https://reviews.llvm.org/D57774
llvm-svn: 353316
Summary:
According to
https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf,
the compiler should emit the DW_AT_address_class attribute for all
variables and parameters. That means the DW_AT_address_class attribute
should be used in a non-standard way to support compatibility with the
cuda-gdb debugger.
Clang is able to generate the information about the variable address
class. This information is emitted as the expression sequence
`DW_OP_constu <DWARF Address Space> DW_OP_swap DW_OP_xderef`. The patch
tries to find all such expressions and transform them into
`DW_AT_address_class <DWARF Address Space>` if target is NVPTX and the debugger is gdb.
If the expression is not found, then default values are used. For the
local variables <DWARF Address Space> is set to ADDR_local_space(6), for
the globals <DWARF Address Space> is set to ADDR_global_space(5). The
values are taken from the table in the same section 5.2. CUDA-Specific
DWARF Definitions.
Reviewers: echristo, probinson
Subscribers: jholewinski, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D57157
llvm-svn: 353203
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.
I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.
The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.
Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.
Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.
This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.
As for performance, we have internal code that uses intrinsics and can
be sped up by 10% by this change.
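The exact signature of the hook is an assumption based on the description above; this sketch only shows its intended shape and default behavior:

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Use.h"
  using namespace llvm;

  // Hypothetical stand-in for the TargetLowering hook: a target fills Ops
  // with the uses that are profitable to duplicate and sink next to I so
  // instruction selection can fold them (e.g. a zext feeding usubl).
  struct SinkingHookSketch {
    virtual ~SinkingHookSketch() = default;
    virtual bool shouldSinkOperands(Instruction *I,
                                    SmallVectorImpl<Use *> &Ops) const {
      return false; // default: nothing is worth sinking
    }
  };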
Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D57377
llvm-svn: 353152
The fewerElementsVectors implementation for load/stores
handles the scalar reduction case just as well, so drop
the redundant code in narrowScalar. This also introduces
support for narrowing irregular size breakdowns for
scalars.
llvm-svn: 353125
Summary:
If the index isn't constant, this transform inserts a multiply and an add on the index to calculating the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size.
This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store.
This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size.
I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57616
llvm-svn: 353124
Don't handle vector conditions.
I think this can be merged in the future with
fewerElementsVectorSelect, although this becomes slightly tricky with
a vector condition.
llvm-svn: 353122
Try to use the underlying source registers.
This enables legalization in more cases where some irregular
operations are widened and others narrowed.
This seems to make the test_combines_2 AArch64 test worse, since the
MERGE_VALUES has multiple uses. Since this should be required for
legalization, a hasOneUse check is probably inappropriate (or maybe
should only be used if the merge is legal?).
llvm-svn: 353121
A number of tests were using imm operands, not cimm. Since CSE
relies on the exact ConstantInt* pointer used, and implicit
conversions are generally evil, also enforce the bitsize of the types.
llvm-svn: 353113
Aliases of functions are now marked as function symbols even if
they are bitcast to some other non-function type.
This is important for WebAssembly where object and function
symbols can't alias each other.
Fixes PR38866
Differential Revision: https://reviews.llvm.org/D57538
llvm-svn: 353109
The LiveDebugValues pass recognizes spills but not restores, which can
cause large gaps in location information for some variables, depending
on control flow. This patch makes LiveDebugValues recognize restores and
generate appropriate DBG_VALUE instructions.
This patch was posted previously with r352642 and reverted in r352666 due
to buildbot errors. A missing return statement was the cause for the
failures.
Reviewers: aprantl, NicolaPrica
Differential Revision: https://reviews.llvm.org/D57271
llvm-svn: 353089
This fixes two problems with CSE done in buildConstant. First, this
would hit an assert when used with a vector result type. Solve this by
allowing CSE on the vector elements, but not on the result vector for
now.
Second, this was also performing the CSE based on the input
ConstantInt pointer. The underlying buildConstant could potentially
convert the constant depending on the result type, giving in a
different ConstantInt*. Stop allowing the APInt and ConstantInt forms
from automatically casting to the result type to avoid any similar
problems in the future.
llvm-svn: 353077
This reverts commit 8bbd570fd5205a04d88d2e5513a6e4adbd028039.
Apparently adding ffloor breaks AMDGPU somehow, so I need to back this out
while I look into it.
llvm-svn: 353064
Add an intrinsic that takes two unsigned integers, with their scale
provided as the third argument, and performs fixed-point multiplication on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
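A minimal worked example of the underlying arithmetic (illustrative C++, not part of the patch): with a scale of 2, each operand is interpreted as value * 2^-2, and the full-width product has to be shifted right by the scale to land back in the same format.
  #include <cstdint>
  // Illustrative fixed-point multiply: widen, multiply, shift out the scale.
  uint32_t fixed_point_mul(uint32_t a, uint32_t b, unsigned scale) {
    uint64_t wide = static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
    return static_cast<uint32_t>(wide >> scale);
  }
  // e.g. scale 2: a = 6 (1.5), b = 10 (2.5) -> 60 >> 2 = 15, i.e. 3.75.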
Differential Revision: https://reviews.llvm.org/D55625
llvm-svn: 353059
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's
insertion point.
I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.
The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.
llvm-svn: 353056
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.
We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.
llvm-svn: 353044
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".
These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.
When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.
Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implementation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.
Differential Revision: https://reviews.llvm.org/D57244
llvm-svn: 353043
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
of condition registers. That diff allows PPC to produce
slightly better (dot-instructions should be generally good)
code.
Note: someone that cares about PPC codegen might want to
look closer at that output because it seems like we could
still improve this.
2. We (probably?) should not bother trying to form uaddo (or
other overflow ops) when there's no target support for such
an op. This goes beyond checking whether the op is expanded
because both PPC and AArch64 show better codegen for standard
types regardless of whether the op is legal/custom.
llvm-svn: 353001
This is not truly NFC because we are bailing out without
a TLI now. That should not be a real concern though because
there should be a TLI in any real-world scenario.
That seems better than passing around a pointer and then
checking it for null-ness all over the place.
The motivation is to fix what appears to be an unintended
restriction on the uaddo transform -
hasMultipleConditionRegisters() shouldn't be reason to limit
the transform.
llvm-svn: 352988
For the scalar case only.
Also move the similar G_MERGE_VALUES handling to a separate function
and cleanup to make them look more similar.
llvm-svn: 352979
We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer, inserts an i128 -1 constant or something, and the whole thing asserts...
I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary.
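A rough sketch of the safer pattern (illustrative; the operand index and the NumElts bound are assumptions): read the operand as an APInt and range-check it before ever narrowing it to a uint64_t.
  // Illustrative: avoid getConstantOperandVal's implicit 64-bit truncation.
  const APInt &Idx = N->getConstantOperandAPInt(1);
  if (Idx.uge(NumElts))
    return SDValue(); // out-of-range index; bail out instead of asserting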
llvm-svn: 352961
Background: At the moment, we record the AtomicOrdering of an access in the MMO, but also mark any atomic access as volatile in SelectionDAG. I'm working towards separating that. See https://reviews.llvm.org/D57601 for context.
Update all usages of isVolatile in lib/CodeGen to preserve behaviour once atomic MMOs stop being also volatile. This is NFC in its current form, but is essential for correctness once we make that final change.
It is useful to keep in mind that AtomicSDNode is not a parent of LoadSDNode, StoreSDNode, or LSBaseSDNode. As a result, any call to isVolatile on one of those static types doesn't need a companion isAtomic check. We should probably adjust that class hierarchy long term, but for now, that separation is useful.
I'm deliberately being conservative about handling. I want the change to stop adding volatile to be NFC itself, and then will work through places where we can be less conservative for atomics one by one in separate changes w/tests.
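A rough sketch of the pattern being applied throughout lib/CodeGen (illustrative, assuming the MachineMemOperand atomic-ordering query; not the literal diff):
  // Before: bail on volatile only. After: also bail on atomic accesses, so
  // behaviour is unchanged once atomic MMOs stop being marked volatile.
  if (LD->isVolatile() || LD->getMemOperand()->isAtomic())
    return false;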
Differential Revision: https://reviews.llvm.org/D57596
llvm-svn: 352937
Summary: This fixes SEH handling to use the correct stack registers when stack realignment is needed or when variable size objects are present.
Reviewers: rnk, efriedma, ssijaric, TomTan
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D57183
llvm-svn: 352923
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57173
llvm-svn: 352913
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
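A minimal sketch of the new convention (illustrative, not taken from the patch): the loaded value's type is passed explicitly instead of being read off the pointer's pointee type.
  // Illustrative: explicit value type, independent of Ptr's pointee type.
  LoadInst *LI = Builder.CreateLoad(Builder.getInt32Ty(), Ptr, "val");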
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
The version of FoldConstantArithmetic() that takes arbitrary nodes
was confusingly naming those nodes as constants when they might
not be; also "Cst" reads like "Cast".
llvm-svn: 352884
This might be the start of tracking all vector element constants generally if we take it to its
logical conclusion, but let's stop here and make sure this is correct/beneficial so far.
The affected tests require a convoluted path before they get simplified currently because we
don't call SimplifyDemandedVectorElts() from binops directly and don't modify the binop operands
directly in SimplifyDemandedVectorElts().
That's why the tests all have a trailing shuffle to induce a chain reaction of transforms. So
something like this is happening:
1. Improve the knowledge of undefs in the binop via a SimplifyDemandedVectorElts() call that
originates from a shuffle.
2. Transfer that undef knowledge back to the shuffle mask user as more undef lanes.
3. Combine the modified shuffle by calling SimplifyDemandedVectorElts() again.
4. Translate the improved shuffle mask as undemanded lanes of build vector constants causing
those to become full undef constants.
5. Simplify the binop now that it has a full undef operand.
As we can see from the unchanged 'and' and 'or' tests, tracking undefs alone isn't a full solution.
We would need to track zero and all-ones constants to improve those opcodes. We'd probably need to
track NaN for FP ops too (assuming we don't have fast-math-flags set).
Differential Revision: https://reviews.llvm.org/D57066
llvm-svn: 352880
Previously, LiveRegUnits was assuming that if a block has no successors
and does not return, then no registers are live at the end of it
(because the end of the block is unreachable). This was causing the
register scavenger to use callee-saved registers to materialise stack
frame addresses without saving them in the prologue. This would normally
be fine, because the end of the block is unreachable, but this is not
legal if the block ends by throwing a C++ exception. If this happens,
the scratch register will be modified, but its previous value won't be
preserved, so it doesn't get restored by the exception unwinder.
Differential revision: https://reviews.llvm.org/D57381
llvm-svn: 352844
For targets where i32 is not a legal type (e.g. 64-bit RISC-V),
LegalizeIntegerTypes must promote the integer operand of ISD::FPOWI. As this
is a signed value, this should be sign-extended.
This patch enables all tests in test/CodeGen/RISCV/float-intrinsics.ll for
RV64, as prior to this patch that file couldn't be compiled for RV64 due to an
assertion when performing codegen for fpowi.
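A rough sketch of the fix (illustrative, assuming the usual DAGTypeLegalizer promotion helpers; not the literal code): the promoted exponent operand is produced with a sign extension rather than a zero/any extension.
  // Illustrative: sign-extend the integer exponent when promoting it.
  SDValue Exp = SExtPromotedInteger(N->getOperand(1));
  return SDValue(DAG.UpdateNodeOperands(N, N->getOperand(0), Exp), 0);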
Differential Revision: https://reviews.llvm.org/D54574
llvm-svn: 352832
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
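A minimal usage sketch (the callee name, signature, and Ptr argument are hypothetical): the FunctionCallee returned by getOrInsertFunction flows straight into CreateCall, with no Function* cast needed.
  // Illustrative only; "my_runtime_hook" is a made-up runtime function.
  FunctionCallee Fn = M->getOrInsertFunction(
      "my_runtime_hook", Builder.getVoidTy(), Builder.getInt8PtrTy());
  Builder.CreateCall(Fn, {Ptr});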
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
This reverts commit f47d6b38c7 (r352791).
Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.
llvm-svn: 352800
This patch fixes PR39098.
For the attached test case, CombineZExtLogicopShiftLoad can optimize it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
But later visitANDLike transforms it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t61: i32 = truncate t57
t63: i32 = srl t61, Constant:i8<1>
t64: i32 = and t63, Constant:i32<524287>
t65: i64 = zero_extend t64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
And it triggers CombineZExtLogicopShiftLoad again, causing an infinite loop.
Both forms should generate the same instructions, and the IR generated by CombineZExtLogicopShiftLoad looks cleaner. But it looks more difficult to prevent visitANDLike from doing the transform, so instead I prevent CombineZExtLogicopShiftLoad from doing the transform if the ZExt is free.
Differential Revision: https://reviews.llvm.org/D57491
llvm-svn: 352792
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352791
While dangling nodes will eventually be pruned when they are
considered, leaving them disables combines requiring single-use.
Reviewers: Carrot, spatel, craig.topper, RKSimon, efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D57520
llvm-svn: 352784
For zero scale SMULFIX, expand into MUL, which produces better code for X86.
For vector arguments, expand into MUL if SMULFIX is provided with a zero scale.
Otherwise, expand into MULH[US] or [US]MUL_LOHI.
Differential Revision: https://reviews.llvm.org/D56987
llvm-svn: 352783
This ensures that if we make it to the backend without lowering widenable_conditions first, we generate correct code. Doing it in CGP - instead of isel - lets us fold control flow before hitting block local instruction selection.
Differential Revision: https://reviews.llvm.org/D57473
llvm-svn: 352779
And instead just generate a libcall. My motivating example on ARM was a simple:
shl i64 %A, %B
for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bit shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.
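A rough sketch of the kind of gating involved (illustrative; not the exact check or location in the patch): when the function is optimising for minimum size, skip the inline expansion so the generic path emits the shift libcall instead.
  // Illustrative: bail out of the inline expansion under minsize.
  if (DAG.getMachineFunction().getFunction().hasMinSize())
    return SDValue(); // let the default lowering emit a libcall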
Differential Revision: https://reviews.llvm.org/D57386
llvm-svn: 352736
This change reverts r351626.
The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.)
llvm-svn: 352722
Also fix an alignment bug in getMachineMemOperand. If the
tracked value is null, the offset isn't tracked so the
base alignment needs to be reduced.
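A rough sketch of the fix (illustrative; variable names are assumptions): with no tracked IR value, the offset can't be recorded in the pointer info, so only the alignment guaranteed by both the base alignment and the offset is safe to claim.
  // Illustrative: clamp the alignment when the value is untracked.
  if (PtrInfo.V.isNull())
    BaseAlign = MinAlign(BaseAlign, Offset);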
llvm-svn: 352716
Summary:
Fixes PR40267, in which the removed assertion was triggering on
perfectly valid IR. As far as I can tell, constant out of bounds
indices should be allowed when splitting extract_vector_elt, since
they will simply be propagated as out of bounds indices in the
resulting split vector and handled appropriately elsewhere.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya
Differential Revision: https://reviews.llvm.org/D57471
llvm-svn: 352702
These can be triggered by mistakenly using a 64-bit mode only intrinsic with -mtriple=i686. Using report_fatal_error gives a better experience for this mistake in release builds instead of probably crashing.
We already do this for some of the vector type legalization handlers.
llvm-svn: 352699
This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
allows us to select G_FEXP.
It...
- Updates the legalizer-info tests
- Adds a test for legalizing exp
- Updates the existing fp tests to show that we can now select G_FEXP
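A minimal sketch of the kind of legalization rule involved (illustrative; the real AArch64 rule set groups several FP opcodes together): G_FEXP on 32- and 64-bit scalars is marked as a libcall.
  const LLT s32 = LLT::scalar(32);
  const LLT s64 = LLT::scalar(64);
  // Illustrative rule: legalize G_FEXP via a runtime library call.
  getActionDefinitionsBuilder(G_FEXP).libcallFor({s32, s64});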
https://reviews.llvm.org/D57483
llvm-svn: 352692
This extends the existing transform for:
add X, 0/1 --> sub X, 0/-1
...to allow the sibling subtraction fold.
This pattern could regress with the proposed change in D57401.
llvm-svn: 352680
This teaches GlobalISel to emit an RTLib call for @llvm.log2 when it encounters
it.
It updates the existing floating point tests to show that we don't fall back on
the intrinsic, and select the correct instructions. It also adds a legalizer
test for G_FLOG2.
https://reviews.llvm.org/D57357
llvm-svn: 352673