This reverts commit 7aa8a67882.
This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.
Those include:
* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
branch-on-poison.
The memcmp simplifier is limited to folding to constants calls with constant
arrays and constant sizes. This change adds the ability to simplify
memcmp(A, B, N) calls with constant A and B and variable N to the pseudocode
equivalent of
N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)
where Pos is the offset of the first mismatch between A and B.
Differential Revision: https://reviews.llvm.org/D127766
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.
If there is already a value it tries (in this order):
to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.
Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
Handle the fact that not only constant expressions, but also
constant aggregates containing expressions can trap.
This still doesn't fix the original C reproducer, probably due to
more issues remaining in other passes.
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.
On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.
I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.
Differential Revision: https://reviews.llvm.org/D127131
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
The IV widening code currently asserts that terminators aren't SCEVable
-- however, this is not the case for invokes with a returned attribute.
As far as I can tell, this assertions is not necessary -- even if we
have a critical edge (the second test case), the trunc gets inserted
in a legal position.
Fixes https://github.com/llvm/llvm-project/issues/55925.
Differential Revision: https://reviews.llvm.org/D127288
Enhance memchr libcall folder to handle constant arrays consisting
of one or two sequences of cosecutive equal characters.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D126515
Now that transforms introducing branch on poison have been removed,
we can stop marking ranges that have been derived from branch
conditions as containing undef. The existing comment explains why
this is legal. I've checked that alive2 is happy with SCCP tests
after this change.
Differential Revision: https://reviews.llvm.org/D126647
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
I chose to encode the allockind information in a string constant because
otherwise we would get a bit of an explosion of keywords to deal with
the possible permutations of allocation function types.
I'm not sure that CodeGen.h is the correct place for this enum, but it
seemed to kind of match the UWTableKind enum so I put it in the same
place. Constructive suggestions on a better location most certainly
encouraged.
Differential Revision: https://reviews.llvm.org/D123088
Need to use all ReductionOps when propagating flags for the reduction
ops, otherwise transformation is not correct. Plus, need to drop nuw/nsw
flags.
Differential Revision: https://reviews.llvm.org/D126371
This is a followup to D125754. We introduce two branches, one
before the unrolled loop and one before the epilogue (and similar
for the prologue case). The previous patch only froze the
condition on the first branch.
Rather than independently freezing the second condition, this patch
instead freezes TripCount and bases BECount on it. These are the
two quantities involved in the conditions, and this ensures that
both work on a consistent, non-poisonous trip count.
Differential Revision: https://reviews.llvm.org/D125896
Fixes a bug preventing moving the loop's metadata to an outer loop's header,
which happens if the loop's exit is also the header of an outer loop.
Adjusts test for above.
Fixes#55416.
Differential Revision: https://reviews.llvm.org/D125574
Evaluation odering in function call arguments is implementation-dependent.
In fact, gcc evaluates bottom-top and clang does top-bottom.
Fixes#55283 partially.
Part of https://reviews.llvm.org/D125627
The modified function was incorrectly (not unnecessarily) ignoring grandchild
loops, and this change fixes the bug. In particular, this fixes the handling of
the loop { inner, body }. The TODO in the same function is talking about the b1
self loop, which may be "unnecessarily" lost, but that is a different issue.
%x umin_seq %y is currently expanded to %x == 0 ? 0 : umin(%x, %y).
This patch changes the expansion to umin(%x, freeze %y) instead
(https://alive2.llvm.org/ce/z/wujUhp).
The motivation for this change are the test cases affected by
D124910, where the freeze expansion ultimately produces better
optimization results. This is largely because
`(%x umin_seq %y) == %x` is a common expansion pattern, which
reliably optimizes in freeze representation, but only sometimes
with the zero comparison (in particular, if %x == 0 can fold to
something else, we generally won't be able to cover reasonable
code from this.)
Differential Revision: https://reviews.llvm.org/D125372
When performing runtime unrolling with multiple exits, one of the
earlier (non-latch) exits may exit the loop on the first iteration,
such that we never branch on the latch exit condition. As such, we
need to freeze the condition of the new branch that is introduced
before the loop, as it now executes unconditionally.
Differential Revision: https://reviews.llvm.org/D125754
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.
The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to the
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is not dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.
Note that the initial version is restricted in multiple ways:
1. Pointers must only either be read or written, by a single
instruction (this allows re-constructing source/sink for
dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
3. abs(step) == AccessSize.
Most of those restrictions can be relaxed in the future.
See https://github.com/llvm/llvm-project/issues/53590.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119078
We're having a hard time booting the ARCH=i386 Linux kernel with clang
after removing -ffreestanding because instcombine was dropping inreg
from callers during libcall simplification, but not the callees defined
in different translation units. This led the callers and callees to have
wildly different calling conventions, which (predictably) blew up at
runtime.
Infer the inreg param attrs on function declarations from the module
metadata "NumRegisterParameters." This allows us to boot the ARCH=i386
Linux kernel (w/ -ffreestanding removed).
Fixes: https://github.com/llvm/llvm-project/issues/53645
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D125285
When a callee function is inlined via an invoke instruction, every function call inside the callee, if not an invoke, will be converted to an invoke after cloned to the caller body. I found that during the conversion the !prof metadata was dropped. This in turned caused a cloned indirect call not properly promoted in subsequent passes.
The particular scenario I was investigating was with AutoFDO and thinLTO. In prelink, no ICP was triggered (neither by the sample loader nor PGO ICP), no indirect call was promoted. This is because 1) the particular indirect call did not have inlined samples; and 2) PGO ICP was intentionally disabled. After inlining, the prof metadata was dropped. Then in postlink, PGO ICP jumped in but didn't do anything. Thus the opportunity was missed.
I'm making a simple fix to preserve !prof metadata when converting call to invoke.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D125249
Per feedback on D123086 after submit.
Also added a test for vec_malloc et al attribute inference to show it's
doing the right thing.
The new tests exposed a defect, corrected by adding vec_free to the list of
free functions in MemoryBuiltins.cpp, which had been overlooked all the
way back in D94710, over a year ago.
Differential Revision: https://reviews.llvm.org/D124859
libcalls." (was 0f8c626). This reverts commit 14d9390.
The patch previously failed to recognize cases where user had defined a
function alias with an identical name as that of the library
function. Module::getFunction() would then return nullptr which is what the
sanitizer discovered.
In this updated version a new function isLibFuncEmittable() has as well been
introduced which is now used instead of TLI->has() anytime a library function
is to be emitted . It additionally also makes sure there is e.g. no function
alias with the same name in the module.
Reviewed By: Eli Friedman
Differential Revision: https://reviews.llvm.org/D123198
Per the guidance in
https://llvm.org/docs/Atomics.html#atomics-and-ir-optimization,
an atomic load from a constant global can be dropped, as there can
be no stores to synchronize with. Any write to the constant global
would be UB.
IPSCCP will already drop such loads, but the main helper in Local
doesn't recognize this currently. This is motivated by D118387.
Differential Revision: https://reviews.llvm.org/D124241
TI->getBitWidth can be > 64 and in those cases the shift will be UB due
to the exponent being too large.
To fix this, cap the shift at 63. I think this should work out fine,
because TableSize is itself a 64 bit type and the maximum table size
must fit in the type. Also, if we would underestimate the size here, at
most we get an extra ZExt.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D124608
Replace the condition value with the known constant value on the
threaded edge. This happens implicitly with phi threading because
we replace with the incoming value, but not for non-phi threading.
SimplifyCFG implements basic jump threading, if a branch is
performed on a phi node with constant operands. However,
InstCombine canonicalizes such phis to the condition value of a
previous branch, if possible. SimplifyCFG does support this as
well, but only in the very limited case where the same condition
is used in a direct predecessor -- notably, this does not include
the common diamond pattern (i.e. two consecutive if/elses on the
same condition).
This patch extends the code to look back a limited number of
blocks to find a branch on the same value, rather than only
looking at the direct predecessor.
Fixes https://github.com/llvm/llvm-project/issues/54980.
Differential Revision: https://reviews.llvm.org/D124159