VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.
Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass them down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.
Differential Revision: https://reviews.llvm.org/D107823
In streaming mode most of the NEON instruction set is illegal, so disable
NEON when compiling with `+streaming-sve`, unless NEON is explicitly
requested.
Subsequent patches will add support for the small subset of NEON
instructions that are legal in streaming mode.
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D107902
Reset cl::Positional, cl::Sink and cl::ConsumeAfter options as well in cl::ResetCommandLineParser().
Reviewed By: rriddle, sammccall
Differential Revision: https://reviews.llvm.org/D103356
This enables printing of the mnemonics that contain the predicate
in the Intel printer. This requires accounting for the memory size
that is explicitly printed in Intel syntax. Those changes have been
synced to the ATT printer as well.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D108093
This allows commuting any immediate value. The previous code only
commuted equality immediates. This was inherited from an earlier
version of VCMPSSZrm/VCMPSDZrm.
This ensures that debug_types references aren't looked for in
the debug_info section.
Behavior is still going to be questionable in an unlinked object file,
since cross-CU references could refer to symbols in another .debug_info
(or, in theory, .debug_types) chunk. But if a producer only uses
ref_addr to refer to things within the same .debug_info chunk in an
object file (e.g. whole program optimization/LTO producing two CUs into
a single .debug_info section in an object file), the ref_addrs there
could be resolved relative to that .debug_info chunk, without needing to
consider comdat (DWARFv5 type units or other creatures) chunks of
.debug_info, etc.
They were already added to findCommuteOpIndices, but they also
need to be in X86InstrInfo::commuteInstructionImpl in order
to adjust the immediate control.
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations. This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITCAST is itself legal.
Differential Revision: https://reviews.llvm.org/D108086
This adds a call to matchSAddSubSat from the smin/smax intrinsics, allowing
the same patterns to match if the canonical form of a min/max is an
intrinsic rather than an icmp/select.
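As a hedged illustration (value names and the exact clamp idiom are mine, not from the patch), the intrinsic form that can now be matched looks roughly like this, and is expected to become a call to @llvm.sadd.sat.i16:
```
define i16 @sadd_sat_via_intrinsics(i16 %a, i16 %b) {
  %ax = sext i16 %a to i32
  %bx = sext i16 %b to i32
  %add = add i32 %ax, %bx
  ; Clamp the widened sum to the i16 range using the min/max intrinsics.
  %lo = call i32 @llvm.smax.i32(i32 %add, i32 -32768)
  %hi = call i32 @llvm.smin.i32(i32 %lo, i32 32767)
  %res = trunc i32 %hi to i16
  ret i16 %res
}
declare i32 @llvm.smin.i32(i32, i32)
declare i32 @llvm.smax.i32(i32, i32)
```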
Differential Revision: https://reviews.llvm.org/D108077
Currently/previously, while SCEV guaranteed that it produces the same value,
the way it was produced may be illegal IR, so we have an ugly check that
the replacement is valid.
But now that the SCEV strictness wrt the pointer/integer types has been improved,
I believe this invariant is already upheld by the SCEV itself, natively.
I think we should add an assertion, wait for a week, and then, if all is good,
rip out all this checking.
Or we could just do the latter directly, I guess.
This reverts commit rL127839.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D108043
After recent changes, exit counts and BE taken counts are always
integers, so convert these to assertions.
While here, also convert the loop invariance checks to asserts.
Exit counts are always loop invariant.
libgcc and libunwind have different flavours of __register_frame. Both
flavours are already correctly handled, except that the code to handle
the libunwind flavour is guarded by __APPLE__. This change uses the
presence of __unw_add_dynamic_fde in libunwind instead to detect whether
libunwind is used, rather than hardcoding it as Apple vs. non-Apple.
Fixes PR44074.
Thanks to Albert Jin <albert.jin@gmail.com> and Chris Schafmeister
<chris.schaf@verizon.net> for identifying the problem.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D106129
Previously we emitted a "does not support scalable vectors"
remark for all targets whenever vectorisation is attempted. This
pollutes the output for architectures that don't support scalable
vectors and is likely confusing to the user.
Instead this patch introduces a debug message that reports when
scalable vectorisation is allowed by the target and only issues
the previous remark when scalable vectorisation is specifically
requested, for example:
#pragma clang loop vectorize_width(2, scalable)
Differential Revision: https://reviews.llvm.org/D108028
This is a non-intrusive fix for
https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport
to the 13.x release branch. It expands on the current hack by
distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have
the obvious meaning and 2 means "anything else". The new optimization
from D98564 should only be performed for CmpValue of 0 or 1.
For main, I think we should switch the analyzeCompare() and
optimizeCompare() APIs to use int64_t instead of int, which is in
line with MachineOperand's notion of an immediate, and avoids this
problem altogether.
Differential Revision: https://reviews.llvm.org/D108076
This patch unifies optimizeELF_x86_64_GOTAndStubs and optimizeMachO_x86_64_GOTAndStubs into a single generic optimize_x86_64_GOTAndStubs.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108025
The current LIR does not deal with a runtime-determined memset size. This patch
uses SCEV to check whether the PointerStrideSCEV and the MemsetSizeSCEV are equal.
Before the comparison, the pass tries to fold the expression that is already
protected by the loop guard.
Testcase files `memset-runtime.ll` and `memset-runtime-debug.ll` are added.
This patch deals with the proper loop idiom. A follow-up patch will deal with SCEVs
that are unequal after folding with the loop guards.
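A minimal sketch of the kind of loop being targeted (names and types are illustrative, not taken from the testcases): the per-iteration memset size equals the pointer stride, so the whole loop can collapse into a single memset of %n * %m bytes.
```
define void @nested_memset(i8* %p, i64 %n, i64 %m) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %off = mul i64 %i, %m
  %dst = getelementptr inbounds i8, i8* %p, i64 %off
  ; Runtime-sized memset whose size matches the stride between iterations.
  call void @llvm.memset.p0i8.i64(i8* %dst, i8 0, i64 %m, i1 false)
  %i.next = add nuw i64 %i, 1
  %cmp = icmp slt i64 %i.next, %n
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
```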
Reviewed By: lebedev.ri, Whitney
Differential Revision: https://reviews.llvm.org/D107353
Another set of simple cleanups in DSE. CheckCache was removed in 1f1145006b and, as a consequence, KnownNoReads is useless.
Also update the description of MemorySSAScanLimit, whose default value is 150 instead of 100.
Differential Revision: https://reviews.llvm.org/D107812
This is a fairly common pattern:
```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```
We have combines to eliminate G_AND with a mask that does nothing.
If we combined the above to this:
```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```
We'd be able to take advantage of those combines using the trunc + zext.
For this to work (or be beneficial in the best case)
- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
value than performing it at the original width *after masking.*
Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj
At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.
Differential Revision: https://reviews.llvm.org/D107929
Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL,
so we still have the awkward unknown OS case to deal with. Previously
if the HSA ABI intrinsics appeared, we would not add the ABI
registers to the function. We would emit an error later, but we still
need to produce some compile result. Start adding the registers to any
compute function, regardless of the OS. This keeps the internal state
more consistent, and will help avoid numerous test crashes in a future
patch which starts assuming the ABI inputs are present on functions by
default.
Previously we would allow promotion even if the byval/inalloca
attributes on the call and the callee didn't match.
It's ok if the byval/inalloca types aren't the same. For example, LTO
importing may rename types.
Fixes PR51397.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D107998
AttributeList::hasAttribute() is confusing. In an attempt to change the
name to something that suggests using other methods, fix up some
existing uses.
AttributeList::hasAttribute() is confusing; use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.
Differential Revision: https://reviews.llvm.org/D107914
These are lowered, matching SDAG behaviour. (See
llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll)
These fall back ~159 times on a build of clang with GISel enabled.
Differential Revision: https://reviews.llvm.org/D107777
Summary:
The assertion that both functions were not missing was incorrect and would
fail when one of the functions was missing. Fixed it and moved the
assertion earlier to check the input parameters to better capture
first-failure. Added lit test.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D107989
Since then, the SCEV pointer handling has been improved,
so the assertion should now hold.
This reverts commit b96114c1e1,
relanding the assertion from commit 141e845da5.
It might have changed the condition of a branch into a constant,
so we should restart and constant-fold the terminator,
instead of continuing with the tautological "conditional" branch.
This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.
This patch uses a switch statement to map the ELF_x86_64 edge kinds to generic edge kinds, and merges the ELF_x86_64 applyFixup function into the x86_64 applyFixup function. Some edge kinds did not have corresponding generic edge kinds, so I added three generic edge kinds as follows:
1. RequestGOTAndTransformToDelta64, which is similar to RequestGOTAndTransformToDelta32.
2. GOTDelta64. This generic kind is similar to Delta64, except that GOTDelta64 computes the delta relative to GOTSymbol.
3. RequestGOTAndTransformToGOTDelta64. This edge kind is used to deal with ELF_x86_64's GOT64 edge kind; it requests the fixGOTEdge function to change the target to the GOT entry and set the edge kind to the generic edge kind GOTDelta64.
These added generic edge kinds may be named haphazardly, or may not express their meaning well.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D107967
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.
A CanonicalLoopInfo is now invalidated once it no longer describes a canonical loop, and it asserts when used afterwards.
In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location, to avoid the impression that they insert something from scratch at that location when in reality their InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in the form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0.
Also see discussion in D105706.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D107540
Besides SPMDization, other analyses and optimizations for original, frontend-generated SPMD regions use information from the AAKernelInfoFunction attribute. This fix makes sure that disabling SPMDization through the corresponding option applies only to generic-mode regions, which should not be SPMDized, while leaving the attribute state of original SPMD regions unaffected.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108001
We may use several COPY instructions to copy the needed sub-registers
during split. But the way we split the lanes during the COPYs may be
different from the subranges of the old register. This would fail when we
extend the subranges of the new register because the LaneMasks do not
match exactly between subranges of new register and old register.
Since we are bundling the COPYs, I think there is no need to further refine the
subranges of the new register based on the set of LaneMasks of the inserted COPYs.
I am not sure if there will be further breaking cases. But as the subranges of the
new register are created based on the LaneMasks of the subranges of the old register,
it is highly likely that we will always find an exact LaneMask match.
We can think about how to make the extendPHIKillRanges() work for
subrange mask mismatch case if we meet more such cases in the future.
The test case was from D105065 by @arsenm.
Differential Revision: https://reviews.llvm.org/D107829
For SjLj, we allocate a table to record setjmp buffer info in the entry
of each setjmp-calling function by inserting a `malloc` call, and insert
a `free` call to free the buffer before each `ret` instruction.
But this is not sufficient; we have to free the buffer before we throw.
In SjLj handling, normal functions that can possibly throw or longjmp
are wrapped with an invoke and caught within the function so they don't
end up escaping the function. But three functions throw and escape the
function:
- `__resumeException` (Emscripten library function used for Emscripten
EH)
- `emscripten_longjmp` (Emscripten library function used for Emscripten
SjLj)
- `__cxa_throw` (libc++abi function called for the C++ `throw` keyword)
The first two functions are used to rethrow the current
exception/longjmp when the caught exception/longjmp is not for the
current function. `__cxa_throw` is used for exceptions, and because we
consider it a function that cannot longjmp, it escapes the function
right away, before which we should free the buffer.
Currently `lsan.test_longjmp3` and `lsan.test_exceptions_longjmp3` fail
in Emscripten; this CL fixes these.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107852
Currently, when Wasm EH is used with Emscripten SjLj, Emscripten SjLj
cannot handle `invoke` instructions - it assumes all `invoke`s have been
lowered away with Emscripten EH. But in Wasm EH they are lowered in
instruction selection, so they are still present in the IR stage. This
happens when
1. Wasm EH and Emscripten SjLj are used together
2. A function that calls `setjmp` uses exceptions, i.e., has `invoke`s
We were already erroring out with an assertion failure in this case, but
this CL makes it error out more properly with a valid error message.
Wasm EH + Wasm SjLj will not have this restriction. (It will have
another restriction though, e.g., `setjmp` cannot be called within
`catch`. But why would anyone do that..)
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107687
Add builtin and intrinsic for `__addex`.
This patch is part of a series of patches to provide builtins for
compatibility with the XL compiler.
Reviewed By: stefanp, nemanjai, NeHuang
Differential Revision: https://reviews.llvm.org/D107002
This is a re-try of 6de1dbbd09 which was reverted because
it missed a null check. Extra test for that failure added.
Original commit message:
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.
See D98152 for status.
This reverts the revert 28c04794df.
The failing MLIR test that caused the revert should be fixed in this
version.
Also includes a PPC test fix previously in 1f87c7c478.
For unit-stride and strided load/stores we set the SEW operand of
the pseudo instruction equal to the EEW in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.
These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL of the previous
vtype matches the EEW/EMUL ratio we need for the instruction.
Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the store. We need to know in the first phase if the
vsetvli will be inserted so we can propagate information to
the successors in the second phase correctly. This means we can't
depend on predecessors.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D106601
We really shouldn't deal with a conditional branch that can be trivially
constant-folded into an unconditional branch.
Indeed, barring failure to trigger BB reprocessing, that should be true,
so let's assert as much, and hope the assertion never fires.
If it does, we have a bug to fix.
Mainly, I want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()`
doesn't get asked to deal with non-unconditional branches,
and if I do that, then said assertion fires on existing tests,
and this is what prevents it from firing.
This is a direct translation of the select folds added with
D53033 / D53036 and another step towards canonicalization
using the intrinsics (see D98152).
Currently, the LNICM pass does not support sinking instructions out of a loop nest.
This patch enables LNICM to sink down as many instructions as possible to the exit block of the outermost loop.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D107219
Given a constant operand, the MVE and DAGCombine combines could fight,
each redistributing in the opposite order. Add a guard to the MVE
vecreduce distribution to prevent that.
When depth > 0, the callee frame address is used to compute the return address of
the callee, producing an improper return address. This patch adds a fix to use the
caller frame address to compute the return address of the callee.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D107646
This patch adjusts the intrinsics definition of
llvm.matrix.column.major.load and llvm.matrix.column.major.store to
allow overloading the type of the stride. The bitwidth of the stride is
used to perform the offset computation.
This fixes a crash when using __builtin_matrix_column_major_load or
__builtin_matrix_column_major_store on 32 bit platforms. The stride argument
of the builtins is defined as `size_t`, which is 32 bits wide on 32 bit
platforms.
Note that we still perform offset computations with 64 bit width on 32
bit platforms for accesses that do not take a user-specified stride.
This can be fixed separately.
Fixes PR51304.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D107349
This patch enables extending loads for fixed length SVE code generation.
There is a slight regression here in the mulh tests; since these tests
load the parameter and then extend it, these are treated as extending
loads which are merged, preventing the mulh instruction from being
generated. As this affects scalable SVE codegen as well this should be
addressed in a separate patch.
Reviewed By: bsmith
Differential Revision: https://reviews.llvm.org/D107057
The register classes are generated by TableGen, use them instead of
handwritten tables.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D107763
Originally committed as ffc3fb665d
Reverted in fcf2d5f402 due to an
assertion failure.
Original commit message:
Allow the folding even if there is an
intervening bitcast.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D106667
These rules were originally written when the new predicate based legalizer
was introduced in an attempt to preserve existing behaviour. It wasn't
properly kept up to date as things like vector support were split out into
G_CONCAT_VECTORS, and frankly, even if it was, it was too complex.
It's much easier to start from scratch with what we can actually support,
which is just a few type combinations. Anything illegal should either be
legalized or eliminated as a side effect of artifact combination.
Differential Revision: https://reviews.llvm.org/D107937
The assertion can fail in some cases when an i16 constant is promoted
to i32.
e.g. in the added test case the value `i16 -32768` is within the range
of i16 but the assert fails when the constant is promoted to positive
`i32 32768` by an earlier call to DAG.getConstant().
Differential Revision: https://reviews.llvm.org/D107880
release call
findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.
rdar://81668577
This patch adds Pass1 of MIRAddFSDiscriminatorsPass before register
allocation, and Pass2 of MIRAddFSDiscriminatorsPass before
block placement. This is still under the --enable-fs-discriminator
option (default false).
This would reduce the turn-around time for FSAFDO transition.
Differential Revision: https://reviews.llvm.org/D104579
This is a quick fix for a motivating case that looks like this:
https://godbolt.org/z/GeMqzMc38
As noted, we might be able to restore the min/max patterns
with select folds, or we just wait for this to become easier
with canonicalization to min/max intrinsics.
When enabling CSPGO with ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX)
due to source changes (e.g. `#if` code runs for profile generation but not for profile use).
To disable them we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast, clang uses the option "-Wno-backend-plugin" to avoid such warnings, and gcc has an explicit "-Wno-coverage-mismatch" option.
Add an "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO.
Differential Revision: https://reviews.llvm.org/D104431
We are running into more and more cases where the liveouts of low
overhead loops do not validate. Add some extra debug messages to make it
clearer why.
The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris.
Initially, as reported in Bug 49437, it caused dozens of testsuite failures
on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but
`SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag
(in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a
completely different semantics, which confuses Solaris `ld` very much.
Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris
`ld` doesn't support, causing it to SEGV and break the build. The linker
is currently being hardened to not accept non-native OS ABIs to avoid this.
The need for linker support is already documented in
`clang/include/clang/Basic/AttrDocs.td`, but not currently checked.
This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all.
Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D107747
When enabling CSPGO with ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX).
To disable them we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast, clang uses the option "-Wno-backend-plugin" to avoid such warnings, and gcc has an explicit "-Wno-coverage-mismatch" option.
Add this "lto-pgo-warn-mismatch" option to lld to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D104431
When none of the translation units in the binary have been instrumented
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.
This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed by declaring the __llvm_profile_runtime symbol
in the translation unit only when needed. For now we restrict this only
for Fuchsia, but this can be later expanded to other platforms. This
approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation,
but that limitation may no longer apply, and it certainly doesn't apply
on platforms like Fuchsia.
Differential Revision: https://reviews.llvm.org/D98061
PHI nodes are not pass-through but change their value; we have to
account for that to avoid missing stores.
Follow up for D107798 to fix PR51249 for good.
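A minimal made-up illustration of why a PHI is not pass-through here: the PHI selects between two different offsets into the same object, so the store must be accounted for at both offsets.
```
define void @phi_offsets(i1 %c, i64* %base) {
entry:
  %p0 = getelementptr inbounds i64, i64* %base, i64 0
  %p1 = getelementptr inbounds i64, i64* %base, i64 1
  br i1 %c, label %a, label %b
a:
  br label %join
b:
  br label %join
join:
  ; The PHI changes the pointer value (offset 0 vs. offset 8 bytes).
  %p = phi i64* [ %p0, %a ], [ %p1, %b ]
  store i64 0, i64* %p
  ret void
}
```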
Differential Revision: https://reviews.llvm.org/D107808
AAPointerInfoFloating needs to visit all uses and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurse endlessly. We now allow recursion for non-phi uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.
This replaces the first attempt D107579.
Differential Revision: https://reviews.llvm.org/D107798
To avoid simplification with wrong constants we need to make sure we
know that we won't perform specific optimizations based on the user's
request. The non-SPMDization and non-CustomStateMachine flags only
prevented the final transformation but allowed value simplification
to go ahead.
Differential Revision: https://reviews.llvm.org/D107862
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.
Just happened to notice while investigating how we decide what extends
to use between basic blocks.
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
Pseudo probe descriptors are created very early in the pipeline, where function names just come from the front end and are not yet decorated. So calling getCanonicalFnName on the function names in the probe descriptors is basically a no-op, which also adds a dependency from MC to ProfileData unnecessarily.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D107838
This patch removes the hand-rolled implementation of salvageDebugInfo
for cast and GEPs and replaces it with a call into
llvm::salvageDebugInfoImpl().
A side-effect of this is that additional redundant convert operations
are introduced, but those don't have any negative effect on the
resulting DWARF expression.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107384
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.
1. Change the return value to I.getOperand(0). Currently users of
salvageDebugInfoImpl() assume that the first operand is
I.getOperand(0). This patch makes this information explicit. A
nice side-effect of this change is that it allows us to salvage
expressions such as add i8 1, %a in the future.
2. Factor out the creation of a DIExpression and return an array of
DIExpression operations instead. This change allows users that
call salvageDebugInfoImpl() in a loop to avoid the costly
creation of temporary DIExpressions and to defer the creation of
a DIExpression until the end.
This patch does not change any functionality.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107383
When converting a store into a memset, we currently insert the new
MemoryDef after the store MemoryDef, which requires all uses to be
renamed to the new def using a whole block scan. Instead, we can
insert the new MemoryDef before the store and not rename uses,
because we know that the location is immediately overwritten, so
all uses should still refer to the old MemoryDef. Those uses will
get renamed when the old MemoryDef is actually dropped, which is
efficient.
I expect something similar can be done for some of the other MSSA
updates in MemCpyOpt. This is an alternative to D107513, at least
for this particular case.
Differential Revision: https://reviews.llvm.org/D107702
Clang diagnostics refer to identifier names in quotes.
This patch makes inline remarks conform to the convention.
New behavior:
```
% clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c
a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline]
int bar(int a) { return foo(a); }
^
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D107791
The intrinsics have an extra chunk of known bits logic
compared to the normal cmp+select idiom. That allows
folding the icmp in each case to something better, but
that then opposes the canonical form of min/max that
we try to form for a select.
I'm carving out a narrow exception to preserve all
existing regression tests while avoiding the inf-loop.
It seems unlikely that this is the only bug like this
left, but this should fix:
https://llvm.org/PR51419
We may call lowerRelativeReference in MC to determine whether target
supports this lowering. We should return nullptr instead of crashing
when we haven't implemented the real lowering.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D107830
The requested register class priorities weren't respected
globally. Not sure why this is a target option, and not just the
expected behavior (recently added in
1a6dc92be7). This avoids an allocation
failure when many wide tuple spills are introduced. I think this is a
workaround since I would not expect the allocation priority to be
required, and only a performance hint. The allocator should be smarter
about when only a subregister needs to be spilled and restored.
This does regress a couple of degenerate store stress lit tests which
shouldn't be too important.
Similar for sub except sub isn't commutative.
Modify the existing and/or/xor folds to also work on ISD::SELECT
and not just RISCVISD::SELECT_CC. This is needed to make sure
we do this transform before type legalization turns i32 add/sub
into add/sub+sign_extend_inreg on RV64. If we don't do this before
that, the sign_extend_inreg will still be after the select.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D107603
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22
...but leaving it out of analysis makes it
harder to avoid infinite loops there.
This was checking extends as shuffles, whereas we should be checking
the operands. This helps sink the shuffles, creating more addl/subl
instructions.
Differential Revision: https://reviews.llvm.org/D107623
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.
However, in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.
This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D89100
Avoid stack overflow errors on systems with small stack sizes
by removing recursion in FoldCondBranchOnPHI.
This is a simple change as the recursion was only iteratively
calling the function again on the same arguments.
Ideally this would be compiled to a tail call, but there is
no guarantee.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D107803
This changes a couple of calls to LiveRegs.contains to
!LiveRegs.available: one in Thumb1FrameLoweringInfo (which modifies a
test to look more correct to me, given that r7 should be the frame pointer and so
is not available), and another in the ARMLoadStoreOptimizer, which I
don't have a test for; it was just found by inspection.
Differential Revision: https://reviews.llvm.org/D107454
C++20 no longer requires the failure memory ordering to be no stronger than the
success memory ordering. Adjust the assert in the AMDGPU SIMemoryLegalizer, and merge
instruction memory orderings.
Add a common operation to merge memory orders that allows non-strict memory
orderings to be combined. Use it in SIMemoryLegalizer and
MachineMemOperand::getMergedOrdering.
Reviewed By: efriedma, rampitec
Differential Revision: https://reviews.llvm.org/D106729
I have updated cheapToScalarize to also consider the case when
extracting lanes from a stepvector intrinsic. This required removing
the existing 'bool IsConstantExtractIndex' and passing in the actual
index as a Value instead. We do this because we need to know if the
index is <= known minimum number of elements returned by the stepvector
intrinsic. Effectively, when extracting lane X from a stepvector we
know the value returned is also X.
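For illustration only (a sketch I added, not one of the new tests), the stepvector case looks like this when the lane index is below the known minimum element count:
```
define i64 @extract_lane_2() {
  %step = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
  ; Lane 2 is within the minimum of 4 elements, so the extract is just 2.
  %lane = extractelement <vscale x 4 x i64> %step, i64 2
  ret i64 %lane
}
declare <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
```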
New tests added here:
Transforms/InstCombine/vscale_extractelement.ll
Differential Revision: https://reviews.llvm.org/D106358
This patch updates ConstantVector::getSplat to use poison instead
of undef when using insertelement/shufflevector to splat.
This follows on from D93793.
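A minimal sketch of the insertelement/shufflevector splat idiom with poison in place of undef (shown on a runtime value purely for readability; the patch itself concerns the constant splat built by ConstantVector::getSplat):
```
define <vscale x 4 x i32> @splat_idiom(i32 %v) {
  ; The base vector and the second shuffle operand are now poison, not undef.
  %ins = insertelement <vscale x 4 x i32> poison, i32 %v, i64 0
  %splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison,
                         <vscale x 4 x i32> zeroinitializer
  ret <vscale x 4 x i32> %splat
}
```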
Differential Revision: https://reviews.llvm.org/D107751
Replace vector unpack operation with a scalar extend operation.
unpack(splat(X)) --> splat(extend(X))
If we have both, unpkhi and unpklo, for the same vector then we may
save a register in some cases, e.g:
Hi = unpkhi (splat(X))
Lo = unpklo(splat(X))
--> Hi = Lo = splat(extend(X))
Differential Revision: https://reviews.llvm.org/D106929
Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.
Example:
// If x and/or y is a splat value:
lastX (binop (x, y)) --> binop(lastX(x), lastX(y))
Differential Revision: https://reviews.llvm.org/D106932
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32. So just stick this in target-specific code,
at least for now.
Differential Revision: https://reviews.llvm.org/D107608
We use the CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable for setting where we should insert the
rematerialization. The rematerialization block is different to the
current block when we rematerialize for coro suspend block users.
Differential Revision: https://reviews.llvm.org/D107573
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different than the rest of the patterns,
as such the lowering needs to be slightly different to ensure the
correct types are used.
Differential Revision: https://reviews.llvm.org/D107576
This patch is a revert of e08f205f5c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.
There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.
Three tests change in addition to the reversion, but they're all very
light alterations.
Differential Revision: https://reviews.llvm.org/D107076
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".
This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D107449
And assign RegClass (i.e. operand class for all GPR) as the super class
of ARegClass and DRegClass. Note that this is an NFC change because
actually we already had XRDReg to model either address or data register
operands (as well as test coverage for it). The new super class syntax
added here just makes the relations between the three RegClass-es more
explicit.
The decoder function and table are the same as FPR128, use that instead.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D107644
MIPS .debug_* sections should have SHT_MIPS_DWARF section type to
distinguish among sections containing DWARF and ECOFF debug formats, but in
assembly files these sections have SHT_PROGBITS (@progbits) type. Now the
assembler shows a 'changed section type for ...' error when parsing the
`.section .debug_*,"",@progbits` directive for MIPS targets.
The same problem exists for the x86-64 target, and this patch extends the
workaround implemented in D76151. The patch adds one more case
in which the assembler ignores a section type mismatch after a `SwitchSection()`
call.
Differential Revision: https://reviews.llvm.org/D107707
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.
This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.
My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107400
This is the data to be stored so it should be an input.
To keep operand order similar between loads and stores, move the temp
register to the first dest operand of floating point loads. Rework
the assembler code accordingly.
This doesn't have any functional effect because this Pseudo is only
used by the assembler which doesn't use ins/outs.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107309
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).
The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.
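A minimal sketch of an interleaved store group with a gap (illustrative only, not taken from the tests): each iteration writes member 0 of a two-member group and leaves member 1 untouched, which previously required scatters or scalarization and can now use a masked store.
```
define void @store_with_gap(i32* %p, i32 %v, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  ; Stride-2 access writing only even elements; odd elements form the gap.
  %idx = shl i64 %i, 1
  %addr = getelementptr inbounds i32, i32* %p, i64 %idx
  store i32 %v, i32* %addr
  %i.next = add nuw i64 %i, 1
  %cond = icmp eq i64 %i.next, %n
  br i1 %cond, label %exit, label %loop
exit:
  ret void
}
```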
Reviewed by: Ayal Zaks
Differential Revision: https://reviews.llvm.org/D104750
Previously ADD & ADDA (as well as SUB & SUBA) instructions were mixed
together, which not only violated Motorola assembly's syntax but also
made asm parsing more difficult. This patch separates these two kinds of
instructions and migrates the rest of the tests from
test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic.
Note that we observed minor regressions on codegen quality: Sometimes
isel uses ADD instead of ADDA even when the latter can lead to a shorter
sequence of code. This issue implies that some isel patterns might need
to be updated.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of the 0 required by the ISD opcodes.
This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.
We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.
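For reference, a hedged sketch of the IR-level form this lowers: the saturating conversion intrinsics, which the instructions implement except for the NaN case that needs the extra fixup.
```
declare i32 @llvm.fptosi.sat.i32.f64(double)
define i32 @cvt(double %x) {
  ; Saturates for out-of-range and infinite inputs; a NaN input must yield 0,
  ; which is the part the fcvt result has to be fixed up for.
  %r = call i32 @llvm.fptosi.sat.i32.f64(double %x)
  ret i32 %r
}
```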
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107230
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.
I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.
Differential Revision: https://reviews.llvm.org/D102113
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).
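An illustrative sketch of the fsub flavour, written here rather than copied from the Alive2 links (assuming nsz is still carried on the select for signed-zero correctness; the nnan requirement is what this patch drops):
```
define double @fabs_via_select(double %x) {
  %cmp = fcmp olt double %x, 0.0
  %neg = fsub double -0.0, %x
  ; Expected to canonicalize to: call double @llvm.fabs.f64(double %x)
  %res = select nsz i1 %cmp, double %neg, double %x
  ret double %res
}
```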
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D106872
The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg
pattern on the inputs. Ideally we pattern match these away during
isel. It is possible for LICM or other middle end optimizations
to separate the extend from the mul. This prevents SelectionDAG
from removing it or depending on how the extend is lowered, we
may not be able to generate an AssertSExt/AssertZExt in the
mul basic block. This will prevent pmuldq/pmuludq from being
formed at all.
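As a hedged illustration (sext_inreg written as shl+ashr; not the exact IR from the intrinsic headers), the signed case roughly has this shape, and the pattern match breaks if LICM hoists the shifts out of the block containing the mul:
```
define <2 x i64> @pmuldq_shape(<2 x i64> %a, <2 x i64> %b) {
  ; Sign-extend the low 32 bits of each 64-bit lane in place.
  %a.lo = shl <2 x i64> %a, <i64 32, i64 32>
  %a.se = ashr <2 x i64> %a.lo, <i64 32, i64 32>
  %b.lo = shl <2 x i64> %b, <i64 32, i64 32>
  %b.se = ashr <2 x i64> %b.lo, <i64 32, i64 32>
  %mul = mul <2 x i64> %a.se, %b.se
  ret <2 x i64> %mul
}
```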
This patch teaches shouldSinkOperands to recognize this so
that CodeGenPrepare will clone the extend into the same basic
block as the mul.
Fixes PR51371.
Differential Revision: https://reviews.llvm.org/D107689
Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF),
so we should have a preference. It seems like mask+negation is better
than two shifts.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.
This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100102
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().
We should probably use that in other places too, but for now fix this issue,
which affects clang bootstrap builds.
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
not an IEEE-compliant type
- Does not have a representation that allows for straightforward
reinterpreting as an integer and use of integer operations
The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.
This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is in line with previous semantics.
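Conceptually, the new expansion is just an unordered comparison at the IR level, e.g. (illustrative sketch):
```
define i1 @isnan_ppcf128(ppc_fp128 %x) {
  ; An unordered self-comparison is true iff %x is a NaN.
  %r = fcmp uno ppc_fp128 %x, %x
  ret i1 %r
}
```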
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315 and https://llvm.org/PR51259
The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize `memset` intrinsic.
While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.
For now, introduce a flag to force-enable this behavior and opt in only CUDA
compilation with the NVPTX back-end.
Differential Revision: https://reviews.llvm.org/D106401
Fix an assertion due to mismatched types for Numerator and CacheLineSize in the loop cache analysis pass.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D107618
- Loads from constant memory (either an explicit constant load or the source
of a memory transfer intrinsic) won't alias any stores.
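A minimal illustration (names made up): the load from the constant global cannot alias the store, so it can be freely reordered or eliminated across it.
```
@g = constant i32 42
define i32 @no_alias(i32* %p) {
  store i32 1, i32* %p
  ; @g is constant memory, so this load cannot alias the store above.
  %v = load i32, i32* @g
  ret i32 %v
}
```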
Reviewed By: asbirlea, efriedma
Differential Revision: https://reviews.llvm.org/D107605
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.
This can make D107420 unnecessary.
Needs tests, but I wanted to start discussion about D107420.
Reviewed By: FreddyYe
Differential Revision: https://reviews.llvm.org/D107581
Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.
Differential Revision: https://reviews.llvm.org/D107476
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same number of entries
2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.
Differential Revision: https://reviews.llvm.org/D107594
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.
I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.
The getTargetConstantBitsFromNode was assuming that the Constant would be the same size as the subvector, resulting in the incorrect packing of the per-element bits data.
This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize.
Differential Revision: https://reviews.llvm.org/D107158