There are several places in the code that are currently broken as
they assume an Instruction always has a parent Function when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction has a parent.
I've added a test for a parentless @llvm.vscale intrinsic call here:
unittests/Analysis/ValueTrackingTest.cpp
Differential Revision: https://reviews.llvm.org/D110158
This is basically D108837 but for jump threading. Free instructions
should be ignored for the threading decision. JumpThreading already
skips some free instructions (like pointer bitcasts), but does not
skip various free intrinsics -- in fact, it currently gives them a
fairly large cost of 2.
Differential Revision: https://reviews.llvm.org/D110290
While both GlobalAlias and GlobalIFunc are GlobalIndirectSymbol, their
`getIndirectSymbol()` usage is quite different (GlobalIFunc's resolver
is an entity different from GlobalIFunc itself).
As discussed on https://lists.llvm.org/pipermail/llvm-dev/2020-September/144904.html
("[IR] Modelling of GlobalIFunc"), the name `getBaseObject` is confusing when
used with GlobalIFunc.
To resolve the confusion:
* Move GloalIndirectSymol::getBaseObject to GlobalAlias:: (GlobalIFunc should use `getResolver` instead)
* Change GlobalValue::getBaseObject not to inspect GlobalIFunc. Note: the function has 7 references.
* Add GlobalIFunc::getResolverFunction to peel off potential ConstantExpr indirection
(`strlen` in `test/LTO/Resolution/X86/ifunc.ll`)
Note: GlobalIFunc::getResolver (like GlobalAlias::getAliasee which does not peel
off ConstantExpr indirection) is kept to be used by ValueEnumerator.
Reviewed By: ibookstein
Differential Revision: https://reviews.llvm.org/D109792
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.
Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b1 to save compile-time. See PR50220 for more
context.
This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses cannot be
analyzed, the earliest escape is set to the first instruction in the
function entry block.
If the query instruction dominates the earliest escape and is not in a
cycle, then pointer does not escape before the query instruction.
This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.
I will share a follow-up patch to also use the information for call
instructions to fix PR50220.
In terms of compile-time, the impact is low in general,
NewPM-O3: +0.05%
NewPM-ReleaseThinLTO: +0.05%
NewPM-ReleaseLTO-g: +0.03
with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions
Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%
The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions
Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D109844
The NFC commit e5692a564a changed the logic for
DomTreeUpdates to use the range [succ_begin, succ_begin) when
looking for SuccsOfPredBB rather than using [succ_begin, succ_end).
As the commit was NFC this is identified as a typo (it has been
discussed briefly in phabricator).
The typo was found when inspecting the code, so I've got no idea if
changing back to the old range has any significant impact (such as
solving any PR:s or causing some new problems). But at least this
restores the code to the originally indented behavior.
Bisecting and reducing opt pipelines that includes the
ModuleInlinerWrapperPass has turned out to be a bit problematic.
This is far from perfect (it still lacks information about inline
advisor params etc.), but it should give some kind of hint to what
the wrapped pipeline looks like when using -print-pipeline-passes.
Reviewed By: aeubanks, mtrofin
Differential Revision: https://reviews.llvm.org/D109878
This patch fixes a problem when the AAKernelInfo state was invalidated,
e.g., due to `optnone` for a kernel, but not all parts indicated the
invalidation properly. We further eliminate most full state
invalidations as they should never be necessary.
Differential Revision: https://reviews.llvm.org/D109468
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110279
When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.
This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D108837
Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110230
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics
We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.
This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)
In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110029
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110227
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110226
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.
This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
CLang, like
using matrix_t = double __attribute__((matrix_type(15, 15)));
void foo(unsigned i, matrix_t &A, matrix_t &B) {
for (unsigned j = 0; j < 4; ++j)
for (unsigned k = 0; k < i; k++)
B[k][j] -= A[k][j] * B[i][j];
}
https://clang.godbolt.org/z/6dKxK1Ed7
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D102496
This reverts commit 2f6b07316f.
This caused several bots to hit an infinite loop at stage 2,
so it needs to be reverted while figuring out how to fix that.
The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C,
Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar.
SELECT VecC, ScalarIdx, 0
We could splat Idx to a vector but it defeats the purpose of
optimisation. Don't apply the folding rule in this case.
This fixes a regression from commit d561b6fbdb.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.
Suggested in D100302.
The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.
Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.
Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%
http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D110171
InstCombine's worklist can be re-used by other passes like
VectorCombine. Move it to llvm/Transform/Utils and rename it to
InstructionWorklist.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110181
Add a new LLVM switch `-profile-sample-block-accurate` to trust zero block counts for branches. Currently we leave out such zero counts when annotating branch weight metadata, which would lead to weights being considered as unknown.
Differential Revision: https://reviews.llvm.org/D110117
MergeICmps will currently sort (by offset) all comparisons in a chain,
including those that do not get merged. This is problematic in two ways:
* We may end up moving the original first block into the middle of
the chain, in which case the "extra work" instructions will also
be in the middle of the chain, resulting in invalid IR
(reported in https://reviews.llvm.org/D108782#3005583).
* Reordering branches is generally not legal, because it may
introduce branch on poison, which is UB (PR51845). The merging
done by MergeICmps is legal as long as we assume that memcmp()
works on frozen memory, but the reordering of unmerged comparisons
is definitely incorrect (without inserting freeze instructions),
so we should avoid it.
There are easier ways to fix the first issue, but I figured it was
worthwhile to do this properly to also fix the second one. What we
now do is to restore the original relative order of (potentially
merged) comparisons.
I took the liberty of dropping the MERGEICMPS_DOT_ON functionality,
because it would be more awkward to implement now (as the before and
after representation is different) and it doesn't seem terribly
useful nowadays.
Differential Revision: https://reviews.llvm.org/D110024
This patch fixes the crash found by PR51614:
whenever doing tail folding, interleave groups must be considered under mask.
Another fix D108900 follows for targets that support masked loads and stores:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse - which is currently not supported; rather than (only) asserting when
computing cost and generating code.
Differential Revision: https://reviews.llvm.org/D108891
We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer).
Requires reassoc.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D109954
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.
The use VectorCombine is updated to pass through the DT to enable
additional scalarization.
Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110175
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.
Also added an API for retrieving a unique undroppable user.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D109700
This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().
With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():
[...]
cont2.i:
br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i
handler.type_mismatch3.i:
%3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
unreachable
cont4.i:
unreachable
[...]
with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.
This is solved by using a value handle worklist. I am a unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist just containing the interesting branch instructions,
but I thought that it perhaps was a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D109221
First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" base for names in DSE. For example, [Maybe]DeadLoc - is a location killed by KillingI instruction. I believe such names are more descriptive and easy to understand than current ones.
Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too.
Third, reordered parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, isOverwrite to make them consistent between each other. This greatly reduces potential mistakes.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D106947
When following a case of a switch instruction is guaranteed to lead to
UB, we can safely break these edges and redirect those cases into a newly
created unreachable block. As result, CFG will become simpler and we can
remove some of Phi inputs to make further analyzes easier.
Patch by Dmitry Bakunevich!
Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri
We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there's a few more places where it looks relevant.
Differential Revision: https://reviews.llvm.org/D110043
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.
Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.
Differential Revision: https://reviews.llvm.org/D105020
In default pipelines the ModuleInlinerWrapperPass is adding the
InlinerPass to the pipeline twice, once due to MandatoryFirst (passing
true in the ctor) and then a second time with false as argument.
To make it possible to bisect and reduce opt test cases for this
part of the pipeline we need to be able to choose between the two
different variants of the InlinerPass when running opt. This patch is
changing 'inline' to a CGSCC_PASS_WITH_PARAMS in the PassRegistry,
making it possible run opt with both -passes=cgscc(inline) and
-passes=cgscc(inline<only-mandatory>).
Reviewed By: aeubanks, mtrofin
Differential Revision: https://reviews.llvm.org/D109877
All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify
form and rely on it. Added test that shows that this actually stands.
This reverts commit 6fec6552f5.
The patch was reverted on incorrect claim that this patch may break LCSSA form
when the loop is not in a simplify form. All IndVars' transform insure that
the loop is in simplify and LCSSA form, so if it wasn't broken before this
transform, it will also not be broken after it.
The scev-based salvaging for LSR can sometimes produce unnecessarily
verbose expressions. This patch adds logic to detect when the value to
be recovered and the induction variable differ by only a constant
offset. Then, the expression to derive the current iteration count can
be omitted from the dbg.value in favour of the offset.
Reviewed by: aprantl
Differential Revision: https://reviews.llvm.org/D109044
The AAExecutionDomain instance checks if a BB is executed by the main
thread only. Currently, this only checks the `__kmpc_kernel_init` call
for generic regions to indicate the path taken by the main thread. In
the new runtime, we want to be able to detect basic blocks even in SPMD
mode. For this we enable it to check thread-ID intrinsics being compared
to zero as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109849
Nobody has complained about this, and the documentation for
LLVMContext::yield() states that LLVM is allowed to never call it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D110008
This introduces an option to allow specialising on the address of global
values. This option is off by default because it is likely not that profitable
to do so and needs more investigation. Before, we were specialising on addresses
and thus this changes the default behaviour.
Differential Revision: https://reviews.llvm.org/D109775
Do not call `TryToShrinkGlobalToBoolean` for address spaces
that don't allow initializers. It inserts an initializer value
while shrinking to bool. Used the target hook introduced with
D109337 to skip this call for the restricted address spaces.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D109823
To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D109855
Skip stack accesses unless requested, as the memory profiler runtime
does not currently look at or report accesses for these addresses.
Differential Revision: https://reviews.llvm.org/D109868
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
Differential Revision: https://reviews.llvm.org/D109852
This makes some tests in vector-reductions-logical.ll more stable when
applying D108837.
The cost of branching is higher when vector ops are involved due to
potential SLP transformations.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D108935
Change the asan-module pass into a MODULE_PASS_WITH_PARAMS in the
pass registry, and add a single parameter called 'kernel' that
can be set instead of having a special pass name 'kasan-module'
to trigger that special pass config.
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.
This is a follow-up to D105006 and D105007.
Split ThreadSanitizerPass into ThreadSanitizerPass (as a function
pass) and ModuleThreadSanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.
This is a follow-up to D105006 and D105007.
Split MemorySanitizerPass into MemorySanitizerPass (as a function
pass) and ModuleMemorySanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.
This is a follow-up to D105006 and D105007.
Alive2 for `{insert/extract}element`: https://alive2.llvm.org/ce/z/hwy_E-
Actually, no one file of test suite is touched by this change,
which means that is rare pattern not generated by frontend. But
it's worth being in place.
Differential Revision: https://reviews.llvm.org/D109236
In particular, it couldn't handle cases where lookup table constant
expressions involved bitcasts. This does not seem to come up
frequently in C++, but comes up reasonably often in Rust via
`#[derive(Debug)]`.
Originally reported by pcwalton.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D109565
This reverts commit 4ac4e52189.
There are couple of test failures, which needs update of the test cases.
Doing a clean revert and will recommit the change along with fixed
testcases.
Fix build bot failure in rG4ac4e521 caused due to assumeBundleBuilder
using new API (getUniqueUndroppableUser).
We now continue using the existing API for AssumeBundleBuilder
(getSingleUndroppableUser).
Sorry for the noise here.
Tests-Run: failing testcase passes.
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.
Also, the API for retrieving undroppable user has been updated accordingly since
in both usecases (Attributor and InstCombine), we seem to care about the user,
rather than the use.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D109700
I was wondering how instcombine does on the examples in D109236,
and we're missing a basic transform:
inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index)
https://alive2.llvm.org/ce/z/z2aBu9
Note that there are several possible extensions of this fold
(see TODO comments).
Differential Revision: https://reviews.llvm.org/D109537
This is a first step towards addressing the last remaining limitation of
the VPlan version of sinkScalarOperands: the legacy version can
partially sink operands. For example, if a GEP has uniform users outside
the sink target block, then the legacy version will sink all scalar
GEPs, other than the one for lane 0.
This patch works towards addressing this case in the VPlan version by
detecting such cases and duplicating the sink candidate. All users
outside of the sink target will be updated to use the uniform clone.
Note that this highlights an issue with VPValue naming. If we duplicate
a replicate recipe, they will share the same underlying IR value and
both VPValues will have the same name ir<%gep>.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D104254
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).
LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
Instead of discovering the sink-to block for each operand in the main
loop, the sink-to block can instead be directly queued with the
operands.
This simplifies processing in the main loop and is a NFC change split
off from D104254 as suggested there.
Patch by @dpalermo
The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader.
This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use.
This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space.
Reviewed By: ronlieb, jdoerfert, dpalermo-phab
Differential Revision: https://reviews.llvm.org/D109500
Moved out the checks for profitability of TryToSinkInstructions
into a lambda function.
This will also allow us to easily add checks for bailing out if the
transform is not profitable.
Tests-Run: instCombine tests.
38b098be66 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D107580
After transformation, we assume the split condition of the pre-loop is always
true. In order to guarantee it, we need to check the start value of the split
cond AddRec satisfies the split condition.
Differential Revision: https://reviews.llvm.org/D109354
This patch simply replaces any unsigned VFs with ElementCounts. It's
still NFC because at the moment epilogue vectorisation is disabled
when the main vector loop uses scalable vectors.
Differential Revision: https://reviews.llvm.org/D109364
Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit
is taken on 1st iteration, we make all branches in the following exiting blocks
always branch out of the loop and their conditions simplified away.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D108910
Reviewed By: lebedev.ri
This is a part of D108910.
We replace all loop PHIs with values coming from the loop preheader if
we proved that backedge is never taken.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D109596
Reviewed By: lebedev.ri
This patch fixes a error made in 2cc6f7c8e1. That patch
added a call site position but there was a small error with the way
the presence of a unknown call edge was being propagated from call site
to function. This patch fixes that error. This error was effecting some
AMDGPU tests.
This patch makes it possible to query callbase reachability
(Can a callbase reach a function Fn transitively).
The patch moves the reachability query handling logic to a member class,
this class will have more users within the AA once we add other function
reachability queries.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106402
This patch adds a call site position for AACallEdges, this
allows us to ask questions about which functions a specific
`CallBase` might call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106208