Commit Graph

31623 Commits

Author SHA1 Message Date
Kevin P. Neal 05ac82de40 [FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known.
Previously we would only CSE constrained FP intrinsics in the default
floating point environment. Exception behavior of "strict" is still not
allowed since we are not allowed to remove any traps in that case.

There are no restrictions on CSE across function calls inside a function.

Differential Revision: https://reviews.llvm.org/D112256
2022-08-16 08:31:42 -04:00
Martin Sebor 65967708d2 [InstCombine] Adjust snprintf folding of constant strings (PR #56598)
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130494
2022-08-15 15:59:21 -06:00
Arthur Eubanks 633f5663c3 [LegacyPM] Remove ThinLTO bitcode writer legacy pass
Using the legacy PM for the optimization pipeline is deprecated and in
the process of being removed. This is a small step in that direction.

For an example of migrating to the new PM:
853b57fe80
2022-08-15 14:21:16 -07:00
Philip Reames e792a353b5 [slp] adjust debug output to include final computed cost 2022-08-15 13:51:39 -07:00
Jameson Nash 3a8d7fe201 [SimplifyCFG] teach simplifycfg not to introduce ptrtoint for NI pointers
SimplifyCFG expects to be able to cast both sides to an int, if either side can be cast to an int, but this is not desirable or legal, in general, per D104547.

Spotted in https://github.com/JuliaLang/julia/issues/45702

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128670
2022-08-15 15:11:48 -04:00
Alexey Bataev 2819126d0c [SLP][NFC]Replace multiple isa calls with single one where possible,
NFC.
2022-08-15 11:56:58 -07:00
Sanjay Patel e5748c6e73 [InstCombine] reduce sub-with-overflow ==/!= 0
The basic patterns look like this:
https://alive2.llvm.org/ce/z/MDj9EC

The tests have a use of the overflow value too.
Otherwise, existing folds should reduce already.

This was noted as a missing IR fold in:
926e7312b2

Hopefully, this makes it easier to implement a backend
fix because we should get the same IR regardless of
whether the source used builtins or inline code.
2022-08-15 13:03:51 -04:00
Nuno Lopes 0299ebc1bd InstCombine: use poison instead of undef as placeholder in insertvalue [NFC]
These vectors are fully initialized so the placeholder value is irrelevant
2022-08-14 21:37:23 +01:00
Kazu Hirata 50724716cd [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-14 12:51:58 -07:00
Kazu Hirata 448c466636 Use llvm::erase_value (NFC) 2022-08-13 12:55:50 -07:00
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
Kazu Hirata 2117fcb1c0 Use Optional::transform instead of Optional::map (NFC)
I'm planning to deprecate map in favor of transform for consistency
with std::optional::transform in C++23.
2022-08-13 11:48:26 -07:00
Sanjay Patel 8b56fa92de [InstCombine] fix "X|(X^Y)" pattern-matching for commuted variants 2022-08-13 11:02:28 -04:00
Sanjay Patel 9d218b61cc [InstCombine] reduce or-xor-or patterns
(A | ?) | (A ^ B) --> (A | ?) | B
https://alive2.llvm.org/ce/z/dbNQw4

This extends the existing transform to peek through
another 'or' instruction for the common operand.

This is the underlying missing fold that should allow
issue #56711 and issue #57120 to reduce even more.
2022-08-13 09:52:01 -04:00
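The or-xor-or reduction in 9d218b61cc above can be sanity-checked by brute force. The following is only a sketch: it models the IR values as 4-bit Python integers, the function names are made up for illustration, and it is no substitute for the Alive2 proof linked in the commit.

```python
from itertools import product

BITS = 4  # small width keeps the exhaustive search cheap (illustrative choice)
MASK = (1 << BITS) - 1


def before(a: int, b: int, c: int) -> int:
    """Original expression: (A | C) | (A ^ B), where C stands for the '?' operand."""
    return ((a | c) | (a ^ b)) & MASK


def after(a: int, b: int, c: int) -> int:
    """Reduced expression: (A | C) | B."""
    return ((a | c) | b) & MASK


def fold_is_sound() -> bool:
    """Compare both forms for every triple of BITS-wide values."""
    values = range(MASK + 1)
    return all(before(a, b, c) == after(a, b, c)
               for a, b, c in product(values, repeat=3))


if __name__ == "__main__":
    print(fold_is_sound())
```

The check works bit by bit: wherever A has a 1 both sides are 1, and wherever A has a 0 the xor contributes exactly the bit of B.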
Sanjay Patel 763b31237f [InstCombine] move comments closer to relevant code; NFC 2022-08-13 09:16:33 -04:00
Kevin Athey 532564de17 [MSAN] add flag to suppress storage of stack variable names with -sanitize-memory-track-origins
Allows for even more savings in the binary image while simultaneously removing the name of the offending stack variable.

Depends on D131631

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D131728
2022-08-12 11:59:53 -07:00
Arthur Eubanks a3ac1cfaed [SampleProfile] Fix non-determinism in promoteMergeNotInlinedContextSamples()
We're seeing non-determinism with loading sample profiles. It seems to
be related to the order in which we merge FunctionSamples in
promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over
NonInlinedCallSites in the order entries were inserted.

Reviewed By: wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D131592
2022-08-12 10:13:25 -07:00
Kevin Athey ec277b67eb [MSAN] Separate id ptr from constant string for variable names used in track origins.
The goal is to reduce the size of the MSAN with track origins binary, by making
the variable name locations constant which will allow the linker to compress
them.

Follows: https://reviews.llvm.org/D131415

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D131631
2022-08-12 08:47:36 -07:00
Max Kazantsev a3d1fb3b59 [SCEV] Prove condition invariance via context
Contextual knowledge may be used to prove invariance of some conditions.
For example, in this case:
```
  ; %len >= 0
  guard(%iv = {start,+,1}<nuw> <s %len)
  guard(%iv = {start,+,1}<nuw> <u %len)
```
the 2nd check always fails if `start` is negative and always passes otherwise.

It looks like there are more opportunities of this kind that are still to be
implemented in the future.

Differential Revision: https://reviews.llvm.org/D129753
Reviewed By: apilipenko
2022-08-12 14:23:35 +07:00
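The invariance claim in a3d1fb3b59 above can be illustrated with a small simulation. This is a sketch under stated assumptions, not SCEV's actual reasoning: values are 8-bit, a failing first guard is modelled as leaving the loop, and `<nuw>` is modelled by stopping before the unsigned counter would wrap. All names are hypothetical.

```python
BITS = 8
SIGN = 1 << (BITS - 1)
LIMIT = 1 << BITS


def to_signed(x: int) -> int:
    """Reinterpret an unsigned BITS-wide bit pattern as a signed value."""
    return x - LIMIT if x >= SIGN else x


def second_guard_results(start: int, length: int) -> set:
    """Collect every outcome of the unsigned guard while the signed guard holds.

    start is an unsigned bit pattern; length is assumed non-negative.
    """
    results = set()
    iv = start
    while iv < LIMIT:  # <nuw>: stop before the unsigned counter would wrap
        if not to_signed(iv) < length:  # first guard (%iv <s %len) fails: leave
            break
        results.add(iv < length)  # second guard (%iv <u %len)
        iv += 1
    return results


def claim_holds() -> bool:
    """The second guard should fail for negative start and pass otherwise."""
    for start in range(LIMIT):
        for length in range(SIGN):  # %len >= 0
            expected = to_signed(start) >= 0
            if any(r != expected for r in second_guard_results(start, length)):
                return False
    return True


if __name__ == "__main__":
    print(claim_holds())
```

An empty result set means the first guard failed immediately, so the second guard was never reached for that pair.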
Chuanqi Xu e190b7cc90 [Coroutines] Maintain the position of final suspend
Closing https://github.com/llvm/llvm-project/issues/56329

The problem happens when we try to simplify the suspend points. We might
break the assumption that the final suspend lives in the last slot of
Shape.CoroSuspends. This patch tries to maintain the assumption and fixes
the problem.
2022-08-12 13:05:08 +08:00
Sanjay Patel fa68d93d54 [InstCombine] fold reassociative fadd with negated operand
We manage to iteratively achieve this result with no extra
uses, and the reassociate pass can also do this, but this
pattern falls through the cracks in the example from
issue #57053.
2022-08-11 11:43:36 -04:00
Marco Elver c47ec95531 [MemorySanitizer] Support memcpy.inline and memset.inline
Other sanitizers (ASan, TSan, see added tests) already handle
memcpy.inline and memset.inline by not relying on InstVisitor to turn
the intrinsics into calls. Only MSan instrumentation currently does not
support them due to missing InstVisitor callbacks.

Fix it by actually making InstVisitor handle Mem*InlineInst.

While the mem*.inline intrinsics promise no calls to external functions
as an optimization, for the sanitizers we need to break this guarantee
since access into the runtime is required either way, and performance
can no longer be guaranteed. All other cases, where generating a call is
incorrect, should instead use no_sanitize.

Fixes: https://github.com/llvm/llvm-project/issues/57048

Reviewed By: vitalybuka, dvyukov

Differential Revision: https://reviews.llvm.org/D131577
2022-08-11 10:43:49 +02:00
Kevin Athey 057cabd997 Remove function name from sanitize-memory-track-origins binary.
This work is being done to reduce the size of MSAN with track origins binary.

Builds upon: https://reviews.llvm.org/D131205

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D131415
2022-08-10 15:45:40 -07:00
Johannes Doerfert b65471d715 [Attributor][FIX] Visit same instructions with different scopes
If we collect potential values we need to visit a value even if we have
seen it before if the scope is different. The scope is part of the
result after all. Test included.

Fixes https://github.com/llvm/llvm-project/issues/56753

Differential Revision: https://reviews.llvm.org/D131597
2022-08-10 16:02:12 -05:00
Kevin Athey d7a47a9bb5 Desist from passing function location to __msan_set_alloca_origin4.
This is done by calling __msan_set_alloca_origin and providing the location of the variable by using the call stack.
This is preparatory work for dropping variable names when track-origins is enabled.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D131205
2022-08-10 09:02:53 -07:00
Nikita Popov 32017d5efe [Attributor] Check for noalias call in AAInstanceInfo
The relevant property of allocation functions of interest here is
their uniqueness (in the sense of disjoint provenance), which is
encoded by the noalias return attribute.

Differential Revision: https://reviews.llvm.org/D130225
2022-08-10 10:27:14 +02:00
Dinar Temirbulatov cab6cd6834 [AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding.
After D121595 was committed, I noticed regressions associated with small trip
counts when vectorising by tail folding with scalable vectors. As a solution
for those issues I propose to introduce a minimal trip count threshold value.

  Differential Revision: https://reviews.llvm.org/D130755
2022-08-09 22:10:17 +01:00
Sanjay Patel 926e7312b2 [InstCombine] fold usub.with.overflow to icmp when there's no use of the math value
https://alive2.llvm.org/ce/z/UE48FH

This is part of solving issue #56926.
2022-08-09 13:13:48 -04:00
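For 926e7312b2 above, the underlying fact is that the overflow bit of an unsigned subtraction equals an unsigned less-than compare of the operands. A minimal sketch follows, assuming 4-bit values and a hand-written model of the intrinsic that detects the borrow from the wrapped result. The helper names are illustrative, and the exact icmp the patch emits should be read from the Alive2 link.

```python
BITS = 4
MASK = (1 << BITS) - 1


def usub_with_overflow(a: int, b: int):
    """Model of llvm.usub.with.overflow on BITS-wide unsigned values.

    Returns (wrapped difference, overflow flag). The flag is derived from the
    wrapped result: a borrow happened exactly when the result exceeds a.
    """
    wrapped = (a - b) & MASK
    return wrapped, wrapped > a


def overflow_matches_icmp_ult() -> bool:
    """Check that the overflow flag equals 'a u< b' for every operand pair."""
    values = range(MASK + 1)
    return all(usub_with_overflow(a, b)[1] == (a < b)
               for a in values for b in values)


if __name__ == "__main__":
    print(overflow_matches_icmp_ult())
```

When only the overflow flag is used, the subtraction itself never needs to be computed, which is what makes the compare form cheaper.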
Sanjay Patel 6bfe5361b7 [InstCombine] add helper function for extract of with-overflow-intrinsic; NFC
We can do more with these patterns, so this block is going to grow.
2022-08-09 12:38:11 -04:00
zhongyunde c2ab65ddaf [IndVars] Eliminate redundant type cast with different sizes
Deal with different sizes between the itofp and fptoi with
trunc or sext/zext. Depends on D129756.
Fixes https://github.com/llvm/llvm-project/issues/55505.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129958
2022-08-09 23:59:42 +08:00
Nikita Popov 4ac00789e1 [RelLookupTableConverter] Bail on invalid pointer size (x32)
The RelLookupTableConverter pass currently only supports 64-bit
pointers.  This is currently enforced using an isArch64Bit() check
on the target triple. However, we consider x32 to be a 64-bit target,
even though the pointers are 32-bit. (And independently of that
specific example, there may be address spaces with different pointer
sizes.)

As such, add an additional guard for the size of the pointers that
are actually part of the lookup table.

Differential Revision: https://reviews.llvm.org/D131399
2022-08-09 09:36:39 +02:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Ruobing Han f756f06cc4 [SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops
With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.

Reviewed By: aeubanks, asbirlea

Differential Revision: https://reviews.llvm.org/D129599
2022-08-08 18:12:04 +00:00
Vang Thao 257251247a [SROA] Try harder to find a vector promotion viable type when rewriting
We are seeing significant performance loss when an alloca fails to get promoted
to register. I have observed that this is due to the common type found when
attempting to rewrite partition users being unviable for promotion. While if we
would have continue looking for a type, we would have found a subtype in the
original allocated type that would have enabled promotion. Thus first check if
the initial common type found is promotion viable and if not then continue
looking instead of stopping with the initial common type found.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128073
2022-08-08 11:04:01 -07:00
Denis Antrushin 36cc533471 [EarlyCSE][OpaquePointers]Replace assert with return for mask type check.
When EarlyCSE tries to common vector masked loads/stores, it first checks that
they have same base operand and then assumes that this is enough for mask types
to be equal. This is true for typed pointers but false for opaque ones -
two loads of different vector sizes from same base pointer '%b' are the same,
`ptr %b`. (For typed pointers, `%b` was cast to vector pointer type so bases
were different).
Change assert to return from lambda `isSubmask` so this transformation properly
works with opaque pointers.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D131251
2022-08-08 16:14:42 +03:00
Kazu Hirata e20d210eef [llvm] Qualify auto (NFC)
Identified with readability-qualified-auto.
2022-08-07 23:55:27 -07:00
Kazu Hirata 0e37ef0186 [Transforms] Fix comment typos (NFC) 2022-08-07 23:55:24 -07:00
Kazu Hirata ba0407ba86 [llvm] Use range-based for loops (NFC) 2022-08-07 00:16:21 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Fangrui Song fa66789d06 [llvm] LLVM_NODISCARD => [[nodiscard]]. NFC
With C++17 there is no Clang pedantic warning.
2022-08-07 00:26:33 +00:00
Fangrui Song 5deb678289 Revert "[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508"
This reverts commit 48c74bb2e2.
With C++17 the workaround is no longer needed.
2022-08-06 16:48:23 -07:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
David Spickett c401dbde71 [llvm][IROutliner] Account for return void in sort comparator
This fixes 69 llvm tests that failed when EXPENSIVE_CHECKS was enabled.
llvm/test/Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
is one example.

When we have EXPENSIVE_CHECKS, _GLIBCXX_DEBUG is defined. This means
that libstdc++ will call the compare function to check if it is
implemented correctly (that !(a < a) is true).

This happens even if there is only one item and here, we expect
to see one return void or multiple return constant integer.

Don't sort if we have 1 item, but do assert that it is the 1
ret void we expect. In the comparator, assert that neither
Value is a nullptr in case one ended up in the list somehow.

Reviewed By: AndrewLitteken

Differential Revision: https://reviews.llvm.org/D130230
2022-08-05 09:36:43 +00:00
Chuanqi Xu 230d6f93aa [Coroutines] Remove lifetime intrinsics for spilled allocas in coroutine frames
Closing https://github.com/llvm/llvm-project/issues/56919

It is meaningless to preserve the lifetime markers for the spilled
allocas in the coroutine frames and it would block some optimizations
too.
2022-08-05 14:50:43 +08:00
Fangrui Song 7d6017fd31 [TTI] Change new getVectorInstrCost overload to use const reference after D131114
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131197
2022-08-04 15:16:51 -07:00
Arthur Eubanks 6e45162adf [InstrProf] Set prof global variables to internal linkage if adding a comdat
COFF has a verifier check that private global variables don't have a comdat of the same name.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D131043
2022-08-04 13:24:55 -07:00
Mingming Liu bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
Johannes Doerfert f81a209337 [Attributor][FIX] Deal with implicit `undef` in AAPotentialConstantValues.
In contrast to AAPotentialValues, the constant values version can
contain implicit `undef` in the set. We had an assertion that could
misfire before. Handle it properly now.
2022-08-04 14:44:51 -05:00
Ellis Hoag 12e78ff881 [InstrProf] Add the skipprofile attribute
As discussed in [0], this diff adds the `skipprofile` attribute to
prevent the function from being profiled while allowing profiled
functions to be inlined into it. The `noprofile` attribute remains
unchanged.

The `noprofile` attribute is used for functions where it is
dangerous to add instrumentation to while the `skipprofile` attribute is
used to reduce code size or performance overhead.

[0] https://discourse.llvm.org/t/why-does-the-noprofile-attribute-restrict-inlining/64108

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D130807
2022-08-04 08:45:27 -07:00
Arthur Eubanks 203296d642 [BoundsChecking] Fix merging of sizes
BoundsChecking uses ObjectSizeOffsetEvaluator to keep track of the
underlying size/offset of pointers in allocations.  However,
ObjectSizeOffsetVisitor (something ObjectSizeOffsetEvaluator
uses to check for constant sizes/offsets)
doesn't quite treat sizes and offsets the same way as
BoundsChecking.  BoundsChecking wants to know the size of the
underlying allocation and the current pointer's offset within
it, but ObjectSizeOffsetVisitor only cares about the size
from the pointer to the end of the underlying allocation.

This only comes up when merging two size/offset pairs. Add a new mode to
ObjectSizeOffsetVisitor which cares about the underlying size/offset
rather than the size from the current pointer to the end of the
allocation.

Fixes a false positive with -fsanitize=bounds.

Reviewed By: vitalybuka, asbirlea

Differential Revision: https://reviews.llvm.org/D131001
2022-08-03 17:21:19 -07:00
Vitaly Buka a2aa6809a8 [NFC][Inliner] Add cl::opt<int> to tune InstrCost
The plan is to tune this for sanitizers.

Differential Revision: https://reviews.llvm.org/D131123
2022-08-03 17:14:10 -07:00
Congzhe Cao 8dc4b2edfa [LoopInterchange][PR56275] Fix legality with negative dependence vectors
This is the 2nd patch of the two-patch series (D130188, D130189) that
fix PR56275 (https://github.com/llvm/llvm-project/issues/56275) which
is a missed opportunity for loop interchange.

As follow-up on the dependence analysis (DA) patch D130188, this patch
normalizes DA results in loop interchange, such that negative dependence
vectors queried by loop interchange are reversed to be non-negative.

Now all tests in PR56275 can get interchanged. Those tests are added
in lit test as `pr56275.ll`.

Reviewed By: kawashima-fj, bmahjour, Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D130189
2022-08-03 19:59:01 -04:00
Bill Wendling 239c831de4 Add switch to use "source_filename" instead of a hash ID for globally promoted local
During LTO a local promoted to a global gets a unique suffix based on
a hash of the module IR. This means that changes in the local's module
can affect the contents in another module that imported it (because the name
of the imported promoted local is changed, but that doesn't reflect a
real change in the importing module). So any tool that's
validating changes to the importing module will see a superficial change.

Instead of using the module hash, we can use the "source_filename" if it
exists to generate a unique identifier that doesn't change due to LTO
shenanigans.

Differential Revision: https://reviews.llvm.org/D128863
2022-08-03 16:41:56 -07:00
Philip Reames 569a7f6aa3 [LV] Move definition of isPredicatedInst out of line and make it const [nfc] 2022-08-03 08:53:11 -07:00
Philip Reames a1cab0daae [LV] Use cost base decision for uniform mem op strategy [nfc-ish]
This is mostly a stylistic change to make the uniform memop widening cost
code fit more naturally with the surrounding code.  It's not strictly
speaking NFC as I added in the store with invariant value case, and we
could in theory have a target where a gather/scatter is cheaper than a
single load/store... but it's probably NFC in practice.  Note that the
scatter/gather result can still be overridden later if the result is
uniform-by-parts.
2022-08-03 07:47:24 -07:00
Nikita Popov b128e057c1 [AA] Make ModRefInfo a bitmask enum (NFC)
Mark ModRefInfo as a bitmask enum, which allows using normal
& and | operators on it. This supersedes various functions like
unionModRef() and intersectModRef(). I think this makes the code
cleaner than going through helper functions...

Differential Revision: https://reviews.llvm.org/D130870
2022-08-03 10:05:55 +02:00
Paul Kirth d434e40f39 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-08-03 00:09:45 +00:00
Vladislav Dzhidzhoev f6d9f00031 [DebugInfo] Test commit: update irrelevant comments
Differential Revision: https://reviews.llvm.org/D130998
2022-08-02 20:21:24 +03:00
Philip Reames 0b47615fcf [LV] Recognize store of invariant value to invariant address as uniform
This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result.

For context, the basic structure of the existing code and how the change fits in:
* First, we select a widening strategy. (The result is irrelevant for this patch.)
* Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.)
* If it is, we overrule the widening strategy, and unconditionally scalarize.
* VPReplicationRecipe - which is what actually does the scalarization - knows how to handle uniform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.)

An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions.

Differential Revision: https://reviews.llvm.org/D130364
2022-08-02 08:09:49 -07:00
David Sherwood 4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
2022-08-02 09:52:33 +01:00
jacquesguan e38af7ba95 [LV] Refactor getExtendedAddReductionCost to support other extended reduction more than Add.
Currently the API getExtendedAddReductionCost is used to determine the cost of an extended Add reduction with an optional Mul. For Arm, that covers the relevant cases. But other targets, for example RISCV, support other kinds of extended reduction, such as FAdd.

This patch makes the following changes:
1, Split getExtendedAddReductionCost into 2 new APIs: getExtendedReductionCost, which handles the extended reduction with an additional Opcode input; and getMulAccReductionCost, which handles the MLA cases of getExtendedAddReductionCost.
2, Refactor getReductionPatternCost, adding a constraint condition to make sure that getMulAccReductionCost only handles the reduction of Add + Mul.

Differential Revision: https://reviews.llvm.org/D130868
2022-08-02 16:02:38 +08:00
Martin Sebor bcef4d238d [InstCombine] Correct strtol folding with nonnull endptr
Reflect in the pointer's offset the length of the leading part
of the consumed string preceding the first converted digit.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130912
2022-08-01 16:47:05 -06:00
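The endptr behaviour that bcef4d238d above corrects can be illustrated with a small model of C `strtol` in base 10. This is a sketch only: `strtol_base10` is a hypothetical helper, range errors (ERANGE) and other bases are left out, and it says nothing about how InstCombine implements the fold. The point it shows is that the end offset counts the skipped whitespace and the sign as well as the digits.

```python
import string


def strtol_base10(s: str):
    """Return (value, end_offset) roughly as C strtol(s, &end, 10) would."""
    i = 0
    # Skip leading whitespace, as isspace() does in the "C" locale.
    while i < len(s) and s[i] in " \t\n\v\f\r":
        i += 1
    negative = False
    if i < len(s) and s[i] in "+-":
        negative = s[i] == "-"
        i += 1
    first_digit = i
    while i < len(s) and s[i] in string.digits:
        i += 1
    if i == first_digit:
        # No digits were converted: endptr points at the start of the string.
        return 0, 0
    value = int(s[first_digit:i])
    return (-value if negative else value), i


if __name__ == "__main__":
    print(strtol_base10("  +12x"))
```

For the input `"  +12x"` the offset covers two spaces, the sign, and two digits, so a fold that counted only the digits would leave endptr three characters short.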
Simon Pilgrim 27105e2f30 MisExpect.h - fix Wdocumentation warnings. NFC. 2022-08-01 15:06:30 +01:00
Alex Bradbury 9bf2d8cbbe [NFC] Use AllocaInst's getAddressSpace helper 2022-08-01 10:11:16 +01:00
Nikita Popov 7314ad7a06 Revert "[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions"
This reverts commit 7b0f6378e2.

As commented on the review, this patch has a correctness issue
regarding the modelling of memory effects.
2022-08-01 09:20:56 +02:00
Momchil Velikov 7b0f6378e2 [SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions
SimplifyCFG does some common code hoisting, which is limited to hoisting a
sequence of identical instruction in identical order and stops at the first
non-identical instruction.

This patch allows hoisting instruction pairs over same-length sequences of
non-matching instructions. The linear asymptotic complexity of the algorithm
stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit`
serving to limit compilation time and/or the size of the hoisted live ranges.

The patch improves SPECv6/525.x264_r by about 10%.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D129370
2022-08-01 07:55:14 +01:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Sanjay Patel 7073ec530e [InstCombine] canonicalize more zext-and-of-bool compare to narrow and
https://alive2.llvm.org/ce/z/vBNiiM

This matches variants of patterns that were folded with:
b5a9361c90
2022-07-30 11:22:05 -04:00
Sanjay Patel f95a6aea1b [InstCombine] avoid splitting a constant expression with div/rem fold
Follow-up to d4940c0f3d to further limit the transform
to avoid an unintended pattern/fold of a constant expression.
2022-07-30 09:45:25 -04:00
Nuno Lopes fffabd5348 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-30 13:55:56 +01:00
Alexander Shaposhnikov 4220ef2be1 [InstCombine] Add fold for redundant sign bits count comparison
For power-of-2 C:
((X s>> ShiftC) ^ X) u< C --> (X + C) u< (C << 1)
((X s>> ShiftC) ^ X) u> (C - 1) --> (X + C) u> ((C << 1) - 1)

(https://github.com/llvm/llvm-project/issues/56479)

Test plan:
0/ ninja check-llvm check-clang + bootstrap LLVM/Clang
1/ https://alive2.llvm.org/ce/z/eEUfx3

Differential revision: https://reviews.llvm.org/D130433
2022-07-30 09:06:53 +00:00
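The two folds in 4220ef2be1 above can be checked exhaustively at a small width. The sketch below assumes 8-bit values, assumes ShiftC equals the bit width minus one (the commit message does not state it), and restricts C to powers of two for which `C << 1` does not wrap. The function names are illustrative only.

```python
BITS = 8
MASK = (1 << BITS) - 1
SHIFT_C = BITS - 1  # assumption: the shift extracts only the sign bit


def ashr(x: int, amount: int) -> int:
    """Arithmetic shift right of a BITS-wide bit pattern."""
    signed = x - (1 << BITS) if x & (1 << (BITS - 1)) else x
    return (signed >> amount) & MASK


def lhs_ult(x: int, c: int) -> bool:
    return (ashr(x, SHIFT_C) ^ x) < c


def rhs_ult(x: int, c: int) -> bool:
    return ((x + c) & MASK) < ((c << 1) & MASK)


def lhs_ugt(x: int, c: int) -> bool:
    return (ashr(x, SHIFT_C) ^ x) > ((c - 1) & MASK)


def rhs_ugt(x: int, c: int) -> bool:
    return ((x + c) & MASK) > (((c << 1) - 1) & MASK)


def folds_hold() -> bool:
    """Compare both sides of each fold for every x and every admissible C."""
    powers = [1 << k for k in range(BITS - 1)]  # keeps C << 1 in range
    return all(lhs_ult(x, c) == rhs_ult(x, c) and lhs_ugt(x, c) == rhs_ugt(x, c)
               for x in range(MASK + 1) for c in powers)


if __name__ == "__main__":
    print(folds_hold())
```

Under these assumptions both sides of the first fold test whether X lies in the signed range [-C, C), and the second fold is its negation.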
Alexander Shaposhnikov d982f1e0c6 [InstCombine] Refactor foldICmpMulConstant
This is a follow-up to 2ebfda2417
(replace "if" with "else if" since the cases nuw/nsw
were meant to be handled separately).

Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests
2022-07-30 02:29:15 +00:00
Sanjay Patel d4940c0f3d [InstCombine] fix miscompile from urem/udiv transform with constant expression
The isa<Constant> check could misfire on an instruction with 2 constant
operands. This bug was introduced with bb789381fc (D36988).

See issue #56810 for a C source example that exposed the bug.
2022-07-29 17:14:30 -04:00
Sanjay Patel b5a9361c90 [InstCombine] canonicalize zext-and-of-bool compare to narrow and
https://alive2.llvm.org/ce/z/3jYbEH

We should choose one of these forms, and the option that uses
the narrow type allows the motivating example from issue #56294
to reduce. In the best case (no 'not' needed and 'trunc' remains),
this does remove an instruction.

Note that there is what looks like a regression because there
is an existing canonicalization that turns trunc into and+icmp.
That is a long-standing transform, and I'm not sure what effect
reversing it would have.
2022-07-29 12:02:54 -04:00
Nikita Popov 5eaeeed8cb [InstCombine] Avoid ConstantExpr::getFNeg() calls (NFCI)
Instead call the constant folding API, which can fail. For now,
this should be NFC, as we still allow the creation of fneg
constant expressions.
2022-07-29 16:01:46 +02:00
Francis Visoiu Mistrih bfd3883e83 [Matrix] Refactor transpose distribution. NFC
Use a function to distribute transposes. Preparation for future patches.
2022-07-28 17:30:00 -07:00
Philip Reames 82c1b136db [LV] Don't predicate uniform mem op stores unnecessarily
We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need to be predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible.

Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the latter. It would *not* be legal to make this same change for merely "uniform" operations.

Differential Revision: https://reviews.llvm.org/D130637
2022-07-28 08:55:52 -07:00
Liqiang Tao d52e775b05 [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 22:44:03 +08:00
Liqiang Tao c113594378 Revert "[llvm][ModuleInliner] Add inline cost priority for module inliner"
This reverts commit bb7f62bbbd.
2022-07-28 22:36:28 +08:00
Liqiang Tao bb7f62bbbd [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 21:28:07 +08:00
Sanjay Patel 28ad5dc3f7 [InstCombine] try harder to narrow bitwise logic with cast operands
This works with any logic + extend:
https://alive2.llvm.org/ce/z/vzsqQD

The motivating case is from issue #56294, but that's still not optimal
(it should simplify completely).
2022-07-28 07:23:22 -04:00
Paul Kirth 6e9bab71b6 Revert "[llvm][NFC] Refactor code to use ProfDataUtils"
This reverts commit 300c9a7881.

We will reland once these issues are ironed out.
2022-07-27 21:38:11 +00:00
Paul Kirth 300c9a7881 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-07-27 21:13:54 +00:00
Florian Hahn 16e0620d6d [VPlan] Mark VPPredInstPHIRecipe as not having side-effects.
Now that all uses of VPPredInstPHIRecipes are properly modeled, they can
be treated as not having side-effects, enabling removal.
2022-07-27 19:29:26 +01:00
Stanislav Mekhanoshin 0562cf442f Allow data prefetch into non-default address space
I am playing with the LoopDataPrefetch pass and found out that it
bails out when given a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.

This does not currently affect any known targets, but seems to be
generally useful for the future.

Differential Revision: https://reviews.llvm.org/D129795
2022-07-27 10:01:26 -07:00
Sanjay Patel e079bf6558 [AggressiveInstCombine] check sqrt operand to allow more libcall->intrinsic transforms
This should fix issue #56383 (at least when compiled with -O3 because this pass is only
run at -O3 currently).
2022-07-27 11:36:13 -04:00
Joseph Huber b08369f7f2 Revert "[OpenMP] Remove noinline attributes in the device runtime"
The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regression in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but
will try to fix it more correctly on main.

This reverts commit d61d72dae6.

Fixes #56752
2022-07-27 11:09:18 -04:00
Aaron Kogon dd3ca65c37 Sinking or hoisting instructions between loops before fusion
Instructions between two adjacent loops will be hoisted above the first
loop, or sunk below the second to facilitate loop fusion. Hoisting will
be attempted for an instruction that dominates the first loop.
Otherwise, sinking this instruction will be attempted.

Instructions with side effects will not be considered for sinking or
hoisting. Hoisting/sinking of any instructions between loops will only
be performed if all the instructions can be moved. As well,
sinking/hoisting is considered for each instruction in isolation,
without taking into account sinking/hoisting decisions for other
instructions in the preheader.

Differential Revision: https://reviews.llvm.org/D118076
2022-07-27 06:55:09 -04:00
Kirill Stoimenov d6e1e0a019 [ASan] Use stack safety analysis to optimize allocas instrumentation.
Added alloca optimization which was missed during the implementation of D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130503
2022-07-26 18:48:16 -07:00
Martin Sebor 4447603616 [InstCombine] Fold strtoul and strtoull and avoid PR #56293
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D129224
2022-07-26 14:11:40 -06:00
Sanjay Patel e3205b8765 [AggressiveInstCombine] convert sqrt libcalls with "nnan" to sqrt intrinsics
This is an alternate to D129155 that uses TTI.haveFastSqrt() to avoid a
potential miscompile for programs with reads of errno. Moving the transform
to AggressiveInstCombine provides access to TTI.

If a sqrt call has "nnan", that implies that the input argument is never
negative because sqrt of {negative number} --> NAN.
If the argument is never negative and the call can be lowered without a
libcall, then we can assume that errno accesses are unchanged after lowering,
so the call can be translated to the LLVM intrinsic (which is expected to
become inline code).

This affects codegen for targets like x86 that have sqrt instructions, but
still have to conservatively assume that a libcall may be needed to set
errno as shown in issue #52620 and issue #56383.

This patch won't solve those examples - we will need to extend this to use
CannotBeOrderedLessThanZero or similar, enhance that analysis for new
operators, and/or deal with llvm.assume too.

Differential Revision: https://reviews.llvm.org/D129167
2022-07-26 15:50:14 -04:00
Francis Visoiu Mistrih 448a094d3e [Matrix] Add assert to catch extracted vectors with poison elements
Assert when the extracted vector is wider than the row/column.

Differential Revision: https://reviews.llvm.org/D130173
2022-07-26 11:07:02 -07:00
Francis Visoiu Mistrih 2c6e8b4636 [Matrix] Refactor tiled loops in a struct. NFC
The three loops have the same structure: index, header, latch.
2022-07-26 11:02:22 -07:00
Stefan Gränitz 1e30820483 [WinEH] Apply funclet operand bundles to nounwind intrinsics that lower to function calls in the course of IR transforms
WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This
affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222

This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs.
* Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand.
* PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers.
* LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites.

Reviewed By: theraven

Differential Revision: https://reviews.llvm.org/D128190
2022-07-26 17:52:43 +02:00
Arthur Eubanks 2eade1dba4 [WPD] Use new llvm.public.type.test intrinsic for potentially publicly visible classes
Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`.

Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`.

To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D128955
2022-07-26 08:01:08 -07:00
Phoebe Wang 19c5638e4f [ArgPromotion] Transfer metadata nontemporal to promoted loads
Fixes #56703

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130536
2022-07-26 16:30:08 +08:00
Kazu Hirata 3f3930a451 Remove redundant virtual specifiers (NFC)
Identified with tidy-modernize-use-override.
2022-07-25 23:00:59 -07:00
zhongyunde d485c1b73e [LoopDataPrefetch] Fix crash when TTI doesn't set CacheLineSize
Fix https://github.com/llvm/llvm-project/issues/56681

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130418
2022-07-26 13:08:42 +08:00
Joseph Huber d61d72dae6 [OpenMP] Remove noinline attributes in the device runtime
We previously used the `noinline` attributes to specify some definitions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, since the changes in
D130298, we now explicitly state which functions will have external
visibility in the bitcode library. Additionally, the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spent on scanning function names for noinline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130368
2022-07-25 15:44:50 -04:00
Warren Ristow 3bbd380a5b [Reassociate][NFC] Use an appropriate dyn_cast for BinaryOperator
In D129523, it was noted that there are some questionable naked casts
from Instruction to BinaryOperator, which could be addressed by doing a
dyn_cast directly to BinaryOperator, avoiding the need for the later cast.
This cleans up that casting.

Reviewed By: nikic, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D130448
2022-07-25 10:24:43 -07:00
Kazu Hirata 95a932fb15 Remove redundant override specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 22:28:11 -07:00
Kazu Hirata b5188591a0 [llvm] Remove redundant virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Warren Ristow 3089b411a4 [Reassociate][NFC] Consistent checking for FastMathFlags suitability
In D129523, it was noted that the approach to check whether a value can
have FastMathFlags was done in different ways, and they should be made
consistent.  This patch makes minor changes to fix that.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D130408
2022-07-24 17:44:30 -07:00
Kazu Hirata acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
Kazu Hirata 8ac2d06195 [IPO] Use range-based for loops (NFC) 2022-07-24 14:48:06 -07:00
Kazu Hirata 3736a498d4 [IPO] Use std::array for AccessKind2Accesses (NFC)
Switching to std::array allows us to use fill.

While I am at it, this patch also converts one for loop to a
range-based one.
2022-07-23 15:47:53 -07:00
Fangrui Song 7225213c0a [LegacyPM] Remove {,PostInline}EntryExitInstrumenterPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-23 15:30:15 -07:00
Nuno Lopes 9df0b254d2 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-23 21:50:11 +01:00
Kazu Hirata 2d2e2e7ea9 [Vectorize] Remove isConsecutiveLoadOrStore (NFC)
The last use was removed on Jan 4, 2022 in commit
95a93722db.
2022-07-23 13:01:14 -07:00
Johannes Doerfert 6b7eae11f1 [Attributor][FIX] HasBeenWrittenTo logic should only be used for reads
If we look at a write, we should not enact the "has been written to"
logic introduced to avoid spurious write -> read dependences. Doing so
led to elimination of stores we needed, which is obviously bad.
2022-07-22 23:57:57 -05:00
Alexander Shaposhnikov 2ebfda2417 [InstCombine] Improve folding of mul + icmp
This diff adds folds for patterns like X * A < B
where A, B are constants and "mul" has either "nsw" or "nuw".
(to address https://github.com/llvm/llvm-project/issues/56563).
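
The reason such a fold is sound can be checked by brute force. The sketch below is not the code from this patch; it assumes the unsigned "nuw" case on a small 6-bit integer type and the candidate rewrite `X * A <u B --> X <u ceil(B / A)`. The names `fold_holds` and `ceil_div` are hypothetical.

```python
BITS = 6                 # small width so the exhaustive check stays fast
LIMIT = 1 << BITS

def ceil_div(b, a):
    """Ceiling division for non-negative b and positive a."""
    return -(-b // a)

def fold_holds(a, b):
    """Compare X * A <u B with X <u ceil(B / A) wherever the mul does not wrap (nuw)."""
    bound = ceil_div(b, a)
    return all((x * a < b) == (x < bound)
               for x in range(LIMIT) if x * a < LIMIT)

all_ok = all(fold_holds(a, b) for a in range(1, LIMIT) for b in range(LIMIT))

# Without "nuw" the rewrite is wrong: 32 * 2 wraps to 0 in 6 bits.
wrapped_lhs = (32 * 2) % LIMIT < 4
rewritten = 32 < ceil_div(4, 2)
```

The last two values differ, which is why the no-wrap flag on the multiply is required.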

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130039
2022-07-22 22:08:53 +00:00
Sanjay Patel 08091a99ae Revert "[InstCombine] enhance fold for subtract-from-constant -> xor"
This reverts commit 79bb915fb6.
This caused regressions because SCEV works better with sub.
2022-07-22 15:56:24 -04:00
Philip Reames b5c7213647 [LV] Use early return to simplify code structure 2022-07-22 12:15:14 -07:00
Mircea Trofin 7b81a81d5f [NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate
The name `getEntrySamples` was misleading for 2 reasons. One, it's
close in name to `Function::getEntryCount`, but the equivalent here is
`getHeadSamples`; second, as opposed to the other get* APIs in
`FunctionSamples`, it performs an estimate/heuristic rather than just
retrieving raw data (or a non-heuristic derivative of that data, like
`getMaxCountInside`)

The new name should more clearly communicate its intent; and, being
close (in name) to `getHeadSamples`, it should allow the reader discover
the relation between them.

Also updated the doc comments for both `getHeadSamples[Estimate]` so a
reader may better understand the relation between them.

Differential Revision: https://reviews.llvm.org/D130281
2022-07-22 09:17:59 -07:00
Benjamin Kramer 5a445395e4 [LV] Remove unused variable. NFC. 2022-07-22 17:43:58 +02:00
Philip Reames d7bf81fd51 [LV] Rework widening cost of uniform memory ops for clarity [nfc]
Reorganize the code to make it clear what is and isn't handled, and why.
Restructure bailout to remove (false and confusing) dependence on
CM_Scalarize; just return invalid cost and propagate, that's what it
is for.
2022-07-22 08:35:45 -07:00
Joseph Huber 3d0ab8638b [Internalize] Support glob patterns for API lists
The internalize pass supports an option to provide a list of symbols
that should not be internalized. This is useful for retaining certain
definitions that should be kept alive. However, this interface is
somewhat difficult to use as it requires knowing every single symbol's
name and specifying it. Many APIs provide common prefixes for the
symbols exported by the library, so it would make sense to be able to
match these using a simple glob pattern. This patch changes the handling
from a simple string comparison to a glob pattern match.
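
As a rough illustration of the idea (not the pass itself), the sketch below uses Python's `fnmatch` module, whose glob syntax may differ in details from the matcher LLVM uses. The function `symbols_to_preserve` and the symbol list are made up for the example.

```python
from fnmatch import fnmatchcase

def symbols_to_preserve(symbols, api_patterns):
    """Return the symbols that match at least one glob pattern in the API list."""
    return [s for s in symbols
            if any(fnmatchcase(s, p) for p in api_patterns)]

# Illustrative symbol names; one prefix pattern replaces many exact names.
symbols = ["__kmpc_barrier", "__kmpc_fork_call", "helper_impl", "main"]
kept = symbols_to_preserve(symbols, ["__kmpc_*", "main"])
```

Everything not in `kept` would be a candidate for internalization.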

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D130319
2022-07-22 08:24:32 -04:00
Johannes Doerfert a50b9f9f1f [Attributor][FIX] Handle non-recursive but re-entrant functions properly
If a function is non-recursive we only performed intra-procedural
reasoning for reachability (via AA::isPotentiallyReachable). However,
if it is re-entrant that doesn't mean we can't reach. Instead of this
problematic logic in the reachability reasoning we utilize logic in
AAPointerInfo. If a location is for sure written by a function, then
even if the function can be re-entrant or recursive, we know that
intra-procedural reasoning is sufficient.
2022-07-22 00:00:56 -05:00
Max Kazantsev a40af8589e [RS4GC] Handle special cases in unreachable code for memcpy/memmov
The existing code doesn't expect dummy values (undef, poison, null-derived
constants etc) as arguments of these intrinsics. However, they can be there
in unreached code. Currently we fail trying to find base for them.

Handle these cases separately. Return null as base for them to be consistent
with the handling in the main algorithm in findBaseDefiningValue.

Differential Revision: https://reviews.llvm.org/D129561
Reviewed By: apilipenko
2022-07-22 11:30:43 +07:00
Johannes Doerfert 62f7888d6d [Attributor] Dominating must-write accesses allow unknown initial values
If we have a dominating must-write access we do not need to know the
initial value of some object to perform reasoning about the potential
values. The dominating must-write has overwritten the initial value.
2022-07-21 23:08:43 -05:00
Johannes Doerfert c72d93a08a [Attributor][NFC] Remove unnecessary overwritten methods 2022-07-21 21:57:02 -05:00
Chenbing Zheng 1a0187c9e7 [InstCombine] remove useless ‘InstCombiner::’. nfc
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130220
2022-07-22 09:24:24 +08:00
Philip Reames bd75350180 [LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst
This code confuses LV's "Uniform" and LVL/LAI's "Uniform".  Despite the
common name, these are different.
* LV's notion means that only the first lane *of each unrolled part* is
  required.  That is, lanes within a single unroll factor are considered
  uniform.  This allows e.g. widenable memory ops to be considered
  uses of uniform computations.
* LVL and LAI's notion refers to all lanes across all unrollings.

IsUniformMem is in turn defined in terms of LAI's notion.  Thus a
UniformMemOp is a memory operation with a loop invariant address.
This means the same address is accessed in every iteration.

The tweaked piece of code was trying to match a uniform mem op (i.e.
fully loop invariant address), but instead checked for LV's notion of
uniformity.  In theory, this meant with UF > 1, we could speculate
a load which wasn't safe to execute.

This ends up being mostly silent in current code as it is nearly
impossible to create the case where this difference is visible.  The
closest I've come is the test case from 54cb87, but even then, the
incorrect result is only visible in the vplan debug output; before this
change we sink the unsafely speculated load back into the user's predicate
blocks before emitting IR.  Both before and after IR are correct so the
differences aren't "interesting".

The other test changes are uninteresting.  They're cases where LV's uniform
analysis is slightly weaker than SCEV isLoopInvariant.
2022-07-21 15:44:34 -07:00
Alexander Shaposhnikov e9afdf838e [GlobalOpt] Enable evaluation of atomic loads
Relax the check to allow evaluation of atomic loads
(but still skip volatile loads).

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130211
2022-07-21 21:36:11 +00:00
Augie Fackler bd6aa67e02 BuildLibCalls: move inference of freeing memory later
This probably should have been part of D123089, but the effects of it
don't show up until we start removing functions from the table in
D130107. Oops.

Differential Revision: https://reviews.llvm.org/D130184
2022-07-21 15:31:16 -04:00
Sanjay Patel 78c09f0f24 [PatternMatch][InstCombine] match a vector with constant expression element(s) as a constant expression
The InstCombine test is reduced from issue #56601. Without the more
liberal match for ConstantExpr, we try to rearrange constants in
Negator forever.

Alternatively, we could adjust the definition of m_ImmConstant to be
more conservative, but that's probably a larger patch, and I don't
see any downside to changing m_ConstantExpr. We never capture and
modify a ConstantExpr; transforms just want to avoid it.

Differential Revision: https://reviews.llvm.org/D130286
2022-07-21 15:23:57 -04:00
David Sherwood f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.
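
The three listed conditions can be sketched as a small decision function. This is only a model of the list above, which the message itself calls non-exhaustive; the function and parameter names are hypothetical and do not come from the AArch64 backend.

```python
def prefer_predicate_over_epilogue(sve_enabled, option,
                                   has_reductions, has_recurrences):
    """Model of the listed conditions; any other option value yields False here."""
    if not sve_enabled:
        return False
    if option == "all":
        return True
    if option == "all+noreductions":
        return not has_reductions
    if option == "all+norecurrences":
        return not has_recurrences
    return False  # includes the current default, "disabled"
```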

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
2022-07-21 17:20:06 +01:00
Nikita Popov 1f69503107 [MemoryBuiltins] Add getReallocatedOperand() function (NFC)
Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.
2022-07-21 14:54:16 +02:00
Nikita Popov 46e6dd84b7 [MemoryBuiltins] Remove isFreeCall() function (NFC)
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
2022-07-21 14:44:23 +02:00
Nikita Popov 5e856a8578 [InstCombine] Use getFreedOperand() (NFC)
Use getFreedOperand() instead of isFreeCall() to remove the
implicit assumption that any pointer operand to a free function
is the operand being freed. This won't actually matter until we
handle allockind(free).
2022-07-21 14:33:55 +02:00
Nikita Popov 3ac8587a2b [Attributor] Use getFreedOperand() (NFC)
Track which operand is actually freed, to avoid the implicit
assumption that it is the first call argument.
2022-07-21 14:26:47 +02:00
Nikita Popov c81dff3c30 [MemoryBuiltins] Add getFreedOperand() function (NFCI)
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.

To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.

This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
2022-07-21 12:39:35 +02:00
Nikita Popov 8d58c8e57b Reapply [InstCombine] Don't check for alloc fn before fetching alloc size
Reapply the patch with getObjectSize() replaced by getAllocSize().
The former will also look through calls that return their argument,
and we'll end up placing dereferenceable attributes on intrinsics
like llvm.launder.invariant.group. While this isn't wrong, it also
doesn't seem to be particularly useful. For now, use getAllocSize()
instead, which sticks closer to the original behavior of this code.

-----

This code is just interested in the allocsize, not any other
allocator properties.
2022-07-21 11:48:24 +02:00
Nikita Popov 70056d04e2 Revert "[InstCombine] Don't check for alloc fn before fetching object size"
This reverts commit c72c22c04d.

This affected an Analysis test that I missed. Reverting for now.
2022-07-21 10:59:12 +02:00
Nikita Popov c72c22c04d [InstCombine] Don't check for alloc fn before fetching object size
This code is just interested in the allocsize, not any other
allocator properties.
2022-07-21 10:45:03 +02:00
Nikita Popov f45ab43332 [MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc
Allow directly checking whether a given call is a removable
allocation, instead of first checking whether it is an
allocation.
2022-07-21 09:39:19 +02:00
Chenbing Zheng 8c124c9088 [InstCombine] (ShiftValC >> Y) >s -1/<s 0 --> Y != 0/==0
We can do folds (ShiftValC >> Y) >s -1 --> Y != 0 and
(ShiftValC >> Y) <s 0 --> Y == 0, with ShiftValC < 0.
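
A quick exhaustive check on an 8-bit type shows why both folds hold. The sketch assumes `>>` is a logical shift right and that Y is smaller than the bit width; `fold_holds` and `to_signed` are hypothetical helper names.

```python
BITS = 8
MASK = (1 << BITS) - 1

def to_signed(value):
    """Reinterpret the low BITS bits of value as a two's-complement integer."""
    value &= MASK
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value

def fold_holds(shift_val_c):
    """Check both folds for one constant over every in-range shift amount."""
    for y in range(BITS):
        shifted = to_signed((shift_val_c & MASK) >> y)  # logical shift right
        if (shifted > -1) != (y != 0):
            return False
        if (shifted < 0) != (y == 0):
            return False
    return True

all_negative_ok = all(fold_holds(c) for c in range(-128, 0))
```

For a negative constant the sign bit survives only when Y is 0, so the signed comparison reduces to a test on Y. For a non-negative constant such as 1, `fold_holds` returns False, which matches the `ShiftValC < 0` requirement.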

Alive2: https://alive2.llvm.org/ce/z/-PRHfD

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129726
2022-07-21 10:12:29 +08:00
Chenbing Zheng 8075f680c8 [InstCombine] add fold (X > C - 1) ^ (X < C + 1) --> X != C
Considering the correctness of this pattern, we should avoid that C - 1
is non-negative and C + 1 is negative.
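
The restriction can be seen by enumerating an 8-bit type. The sketch assumes signed comparisons and wrapping arithmetic for `C - 1` and `C + 1`; the helper names are hypothetical.

```python
BITS = 8

def wrap_signed(value):
    """Wrap an integer into the signed BITS-bit range."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value >= 1 << (BITS - 1) else value

def fold_holds(c):
    """Check (X > C - 1) ^ (X < C + 1) == (X != C) for every 8-bit X."""
    lo = wrap_signed(c - 1)
    hi = wrap_signed(c + 1)
    return all(((x > lo) ^ (x < hi)) == (x != c) for x in range(-128, 128))

safe_constants = [c for c in range(-128, 128) if fold_holds(c)]
```

Under these assumptions the fold holds for every constant except the two extremes, 127 and -128, where `C + 1` or `C - 1` wraps around.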

Alive2: https://alive2.llvm.org/ce/z/c_rBaq

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129622
2022-07-21 10:08:21 +08:00
Johannes Doerfert ad98ef8be4 [Attributor] Deal with complex PHI nodes better during AAPointerInfo
We were quite conservative when it came to PHI node handling to avoid
recursive reasoning. Now we check more directly if we have seen a PHI
already or not. This allows non-recursive PHI chains to be handled.

This also exposed a bug as we did only model the effect of one loop
traversal. `phi_no_store_3` has been adapted to show how we would have
used `undef` instead of `1` before. With this patch we don't replace
it at all, which is expected as we do not argue about loop iterations
(or alignments).
2022-07-20 17:34:50 -05:00
Johannes Doerfert 142897dd7d [Attributor] Only non-exact accesses require a uniform bit-pattern (=0)
If we only have exact accesses we should never require the bit-pattern
to be uniform (in this case 0). Only a non-exact access should force us
to require only 0 values.
2022-07-20 17:34:50 -05:00
Alexander Shaposhnikov 67f1fe8597 [GlobalOpt] Enable evaluation of atomic stores
Relax the check to allow evaluation of atomic stores
(but still skip volatile stores).

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D129841
2022-07-20 22:33:58 +00:00
Schrodinger ZHU Yifan 304027206c [ThinLTO] Support aliased GlobalIFunc
Fixes https://github.com/llvm/llvm-project/issues/56290: when an ifunc is
aliased in LTO, clang will attempt to create an alias summary; however, as ifunc
is not included in the module summary, doing so will lead to crash.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D129009
2022-07-20 15:30:38 -07:00
Craig Topper d76c8f5127 [InstCombine] Add mul with negated power of 2 constant to canEvaluateShifted.
If we are right shifting a multiply by a negated power of 2 where
the power of 2 is the same as the shift amount, we can replace with
a negate followed by an And.
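
The identity behind this can be verified exhaustively on an 8-bit type. The sketch assumes a logical right shift and wrapping multiplication; `original` and `folded` are hypothetical names for the two forms.

```python
BITS = 8
MASK = (1 << BITS) - 1

def original(x, c):
    """(x * -(1 << c)) >> c, with wrapping multiply and logical shift."""
    product = (x * -(1 << c)) & MASK
    return product >> c

def folded(x, c):
    """Negate x, then keep only the low BITS - c bits."""
    return (-x & MASK) & (MASK >> c)

equivalent = all(original(x, c) == folded(x, c)
                 for x in range(1 << BITS) for c in range(BITS))
```

Multiplying by `-(1 << c)` is the same as negating and shifting left by `c`, so the right shift by `c` only clears the top `c` bits of the negated value.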

New tests have not been committed yet but the patch shows the diffs.
Let me know if you want any changes or additional tests.

Differential Revision: https://reviews.llvm.org/D130103
2022-07-20 11:00:22 -07:00
Ruobing Han 2b98b8e8fb fix bug for useless malloc elimination in CodeGenPrepare
Putting the AllocationFn check before I->willReturn allows CodeGenPrepare to remove useless malloc instructions.

Differential Revision: https://reviews.llvm.org/D130126
2022-07-20 16:29:51 +00:00
Philip Reames 523a526a02 [LV] Fix miscompile due to srem/sdiv speculation safety condition
An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block.
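
The two cases are division by zero and the signed overflow of the minimum value divided by -1. The sketch below models them for i64; it is not the LLVM utility function, and `sdiv_is_undefined` and `safe_to_speculate` are hypothetical names.

```python
BITS = 64
INT_MIN = -(1 << (BITS - 1))

def sdiv_is_undefined(dividend, divisor):
    """True if sdiv/srem of these operands is undefined behavior."""
    if divisor == 0:
        return True                      # case 1: division by zero
    if dividend == INT_MIN and divisor == -1:
        return True                      # case 2: signed overflow
    return False

def safe_to_speculate(divisor):
    """Conservative check when the dividend is unknown.

    divisor is a known constant, or None when it is not known.
    """
    return divisor is not None and divisor not in (0, -1)
```

A check that only rules out a zero divisor would accept a divisor of -1, which is the situation described for `srem i64 %v, -1`.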

Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose.

Differential Revision: https://reviews.llvm.org/D130106
2022-07-20 05:35:23 -07:00
Nicolai Hähnle 1ddc51d89d Inliner: don't mark call sites as 'nounwind' if that would be redundant
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.

If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.

v2:
- just check CI->doesNotThrow

v3 (resubmit after revert at 3443788087):
- update Clang tests

Differential Revision: https://reviews.llvm.org/D129860
2022-07-20 14:17:23 +02:00
Florian Hahn 5124b21648 [VPlan] Initial def-use verification.
This patch introduces some initial def-use verification. This catches
cases like the one fixed by D129436.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D129717
2022-07-20 11:06:32 +01:00
Fangrui Song e931c2e870 [LegacyPM] Remove InstrOrderFileLegacyPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-19 23:58:51 -07:00
Kazu Hirata 0387da6f4f Use value instead of getValue (NFC) 2022-07-19 21:18:26 -07:00
Kazu Hirata 41ae78ea3a Use has_value instead of hasValue (NFC) 2022-07-19 20:15:44 -07:00
Johannes Doerfert f84712f0b8 [Attributor] Teach checkForAllUses to follow returns into callers
If we can determine all call sites we can follow a use in a return
instruction into the caller. AAPointerInfo utilizes this feature.
2022-07-19 18:17:40 -05:00
Johannes Doerfert 4f2ccdd0b1 [Attributor][NFC] Improve debug messages 2022-07-19 18:17:40 -05:00
Nick Desaulniers 1cf6b93df1 Revert "[Local] Allow creating callbr with duplicate successors"
This reverts commit 08860f525a.

Crashes during PPC64LE linux kernel builds as reported by @nathanchance.
https://reviews.llvm.org/D129997#3663632
2022-07-19 15:03:27 -07:00
Johannes Doerfert bf789b1957 [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80.
2022-07-19 16:24:42 -05:00
Arthur Eubanks 13aa2c1c3b [DSE] Revisit pointers that may no longer escape after removing another store
In dependent-capture, previously we'd see that %tmp4 is captured due to
the first store. We'd cache this info in CapturedBeforeReturn and
InvisibleToCallerAfterRet. The first store is then removed, causing
the cached values to be wrong.

We also need to revisit everything because normally we work backwards
when removing stores at the end of the function, but in this case
removing an earlier store causes a later store to be removable.

No compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=56796ae1a8db4c85dada28676f8303a5a3609c63&to=21b7e5248ffc423cd36c9d4a020085e363451465&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D123686
2022-07-19 09:30:34 -07:00
Sanjay Patel 3d6c10dcf3 [SimplifyLibCalls] avoid converting pow() to powi() with no FMF
powi() is not a standard math library function; it is specified
with non-strict semantics in the LangRef. We currently require
'afn' to do this transform when it needs a sqrt(), so I just
extended that requirement to the whole-number exponent too.

This bug was introduced with:
b17754bcaa
...where we deferred expansion of pow() to later passes.
2022-07-19 12:26:53 -04:00
Arnold Schwaighofer bc4870f09e [coro async] Add missing llvm.coro.id.async intrinsic to declaresCoroCleanupIntrinsics
rdar://97214593

Differential Revision: https://reviews.llvm.org/D130038
2022-07-19 07:25:04 -07:00
Andrew Turner b850762b62 Add the FreeBSD AArch64 memory layout
Use the FreeBSD AArch64 memory layout values when building for it.
These are based on the x86_64 values, scaled to take into account the
larger address space on AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125883
2022-07-19 09:58:07 -04:00
Andrew Turner e13bd2644e Add the FreeBSD AArch64 shadow offset to llvm
AArch64 has a larger address space than 64-bit x86. Use the larger
shadow offset on FreeBSD AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125873
2022-07-19 09:58:07 -04:00
William Schmidt bccc9aa81c Don't vectorize PHIs in catchswitch blocks
We currently assert in vectorizeTree(TreeEntry*) when processing a PHI
bundle in a block containing a catchswitch.  We attempt to set the
IRBuilder insertion point following the catchswitch, which is invalid.
This is done so that ShuffleBuilder.finalize() knows where to insert
a shuffle if one is needed.

To avoid this occurring, watch out for catchswitch blocks during
buildTree_rec() processing, and avoid adding PHIs in such blocks to
the vectorizable tree.  It is unlikely that constraining vectorization
over an exception path will cause a noticeable performance loss, so
this seems preferable to trying to anticipate when a shuffle will and
will not be required.
2022-07-19 06:10:17 -07:00
Nikita Popov 08860f525a [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This
patch removes a limitation which prevents optimizations from actually
producing such callbrs.

Differential Revision: https://reviews.llvm.org/D129997
2022-07-19 14:28:22 +02:00
Florian Hahn a75760a269 [LV] Remove unnecessary cast in widenCallInstruction. (NFC) 2022-07-19 11:23:24 +01:00
Max Kazantsev 82309831c3 [LoopSimplifyCFG] Prevent use-def dominance breach by handling dead exits. PR56243
One of the transforms in LoopSimplifyCFG demands that the LCSSA form is
truly maintained for all values, tokens included, otherwise it may end up creating
a use that is not dominated by def (and Phi creation for tokens is impossible).
Detect this situation and prevent transform for it early.

Differential Revision: https://reviews.llvm.org/D129984
Reviewed By: efriedma
2022-07-19 15:54:12 +07:00
Ellis Hoag 3580daacf3 [InstrProf] Allow CSIRPGO function entry coverage
The flag `-fcs-profile-generate` for enabling CSIRPGO moves the pass
`pgo-instrumentation` after inlining. Function entry coverage works fine
with this change, so remove the assert. I had originally left this
assert in because I had not tested this at the time.

Reviewed By: davidxl, MaskRay

Differential Revision: https://reviews.llvm.org/D129407
2022-07-18 15:10:11 -07:00
Florian Hahn 30e53b8c03 [LV] Sink module variable and use State to set it in widenCall. (NFC)
Limits the lifetime of the variable and makes it independent of
CallInst.
2022-07-18 19:41:48 +01:00
Arnold Schwaighofer 28ebd13d63 [coro async] Fix code to run coro.async.end cleanup like the legacy pass did
The code executed for the Switch ABI does not change.

rdar://97074714

Differential Revision: https://reviews.llvm.org/D129865
2022-07-18 10:41:29 -07:00
Nicolai Hähnle 3443788087 Revert "Inliner: don't mark call sites as 'nounwind' if that would be redundant"
This reverts commit 9905c37981.

Looks like there are Clang changes that are affected in trivial ways. Will look into it.
2022-07-18 17:43:35 +02:00
Nicolai Hähnle 9905c37981 Inliner: don't mark call sites as 'nounwind' if that would be redundant
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.

If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.

v2:
- just check CI->doesNotThrow

Differential Revision: https://reviews.llvm.org/D129860
2022-07-18 17:28:52 +02:00
Sanjay Patel 26fbb79c33 [InstCombine] reduce code for signbit folds; NFC 2022-07-18 11:04:58 -04:00
Nikita Popov 21e2f133a8 [LoopSimplifyCFG] Revert accidental change
This change was included in an unrelated change
b57d61384c
and was of course not intended for commit...
2022-07-18 15:30:13 +02:00
Nikita Popov b57d61384c [ConstantRangeTest] Move nowrap binop tests to generic infrastructure (NFC)
Move testing for add/sub with nowrap flags to TestBinaryOpExhaustive,
rather than separate homegrown exhaustive testing functions.
2022-07-18 15:14:17 +02:00
Kristina Bessonova 44736c1d49 [CloneFunction][DebugInfo] Avoid cloning DILexicalBlocks of inlined subprograms
If DISubprogram was not cloned (e.g. we are cloning a function that has other
functions inlined into it, and subprograms of the inlined functions are
not supposed to be cloned), it doesn't make sense to clone its DILexicalBlocks
as well. Otherwise we'll get duplicated DILexicalBlocks that may confuse
debug info emission in AsmPrinter.

I believe it also makes no sense to clone any DILocalVariables or maybe
other local entities if their parent subprogram was not cloned, because
they will be dangling and will not participate in further emission.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D127102
2022-07-18 13:14:52 +02:00
Nikita Popov 8201e3ef5c [BasicBlockUtils] Don't drop callbr with unique successor
As callbr is now allowed to have duplicate destinations, we can
have a callbr with a unique successor. Make sure it doesn't get
dropped, as we still need to preserve the side-effect.
2022-07-18 12:26:29 +02:00
Nikita Popov 4fba35f973 [InstCombine] Clarify invoke/callbr handling in constexpr call fold (NFCI)
We only need to check the block for the normal/default destination,
not for other destinations. Using the value in those would be
illegal anyway.

The callbr case cannot actually happen here, because callbr is
currently limited to inline asm. Retaining it to match the spirit
of the original code.
2022-07-18 12:02:46 +02:00
Florian Hahn 105032f549 [LV] Use PHI recipe instead of PredRecipe for subsequent uses.
At the moment, the VPPredInstPHIRecipe is not used in subsequent uses of
the predicate recipe. This incorrectly models the def-use chains, as all
later uses should use the phi recipe. Fix that by delaying recording of
the recipe.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D129436
2022-07-18 09:35:34 +01:00
Nikita Popov 11079e8820 [IR] Don't treat callbr as indirect terminator
Callbr is no longer an indirect terminator in the sense that is
relevant here (that its successors cannot be updated). The primary
effect of this change is that callbr no longer prevents formation
of loop simplify form.

I decided to drop the isIndirectTerminator() method entirely and
replace it with isa<IndirectBrInst>() checks. I assume this method
was added to abstract over indirectbr and callbr, but it never
really caught on, and there is nothing left to abstract anymore
at this point.

Differential Revision: https://reviews.llvm.org/D129849
2022-07-18 09:32:08 +02:00
Fangrui Song 0e3447bf8a [LegacyPM] Remove WholeProgramDevirt
Unused after LTO removal from legacy optimization pipeline.
2022-07-17 23:14:53 -07:00
Fangrui Song 1f90cc589e [LegacyPM] Remove FunctionImportLegacyPass
Unused after ThinLTO was removed from legacy optimization pipeline.
2022-07-17 23:06:46 -07:00
Kazu Hirata 7094ab4ee7 [llvm] Modernize bool literals (NFC)
Identified with modernize-use-bool-literals.
2022-07-17 18:08:51 -07:00
Kazu Hirata 3112987d5c Remove unused forward declarations (NFC) 2022-07-17 15:37:48 -07:00
Kazu Hirata 8b3ed1fa98 Remove redundant return statements (NFC)
Identified with readability-redundant-control-flow.
2022-07-17 15:37:46 -07:00
Fangrui Song bbaa015e82 [LegacyPM] Remove LowerTypeTestsPass
Unused after LTO removal from optimization pipeline.
2022-07-17 15:06:38 -07:00
Fangrui Song a6942256ca [LegacyPM] Remove NameAnonGlobalLegacyPass
Unused after LTO removal from optimization pipeline.
2022-07-17 14:38:29 -07:00
Fangrui Song d74b88c69d [LegacyPM] Remove CanonicalizeAliasesLegacyPass
Unused after LTO removal from optimization pipeline.
2022-07-17 14:30:22 -07:00
Fangrui Song 70519a1fba [LegacyPM] Remove LTO passes from optimization pipeline
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-17 14:24:36 -07:00
Fangrui Song f502115561 [LegacyPM] Remove PGO options from PassManagerBuilder
They have been dead since legacy PGO/SamplePGO passes were removed.
2022-07-17 14:03:23 -07:00
Fangrui Song dd5e3f0e27 [LegacyPM] Remove SampleProfileLoaderLegacyPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline (e.g. PGO), remove SamplePGO.
2022-07-17 12:09:46 -07:00
Florian Hahn cc0ee17951 [LV] Move VPPredInstPHIRecipe::execute to VPlanRecipes.cpp (NFC) 2022-07-17 11:34:23 +01:00
zhongyunde 3a6b766b1b [IndVars] Directly use unsigned integer induction for FPToUI/FPToSI of float induction
Depends on D129358

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129756
2022-07-17 10:48:35 +08:00
Florian Hahn 6813b41d57 [LV] Avoid creating new run-time VF expression for each runtime check.
At the moment, the cost of runtime checks for scalable vectors is
overestimated due to creating separate vscale * VF expressions for each
check. Instead re-use the first expression.
2022-07-16 17:24:07 +01:00
David Green 4b7913c357 [VectorCombine] Only consider shuffle uses with the same type.
The backend getShuffleCosts do not currently handle shuffles that change
size very well. Limit the shuffles we collect to the same type to make
sure they do not cause issues as reported in D128732.
2022-07-16 13:23:39 +01:00
Fangrui Song f9d6f37201 [LegacyPM] Remove ControlHeightReductionLegacyPass
This pass tries to reduce the number of conditional branches in the hot path
based on profile. It's mostly a no-op after legacy PGO passes were removed.
2022-07-16 01:35:56 -07:00
Fangrui Song 3a42c499c2 [LegacyPM] Remove createInstrProfilingLegacyPass
Follow the steps of removing non-core instrumentation passes like PGO.
2022-07-16 01:26:40 -07:00
Fangrui Song 685775bbab [LegacyPM] Remove CGProfileLegacyPass
It's mostly a no-op after I removed legacy PGO passes in D123834.
2022-07-16 00:39:56 -07:00
Fangrui Song df8f5be596 [LegacyPM] Remove ModuleSanitizerCoverageLegacyPass
Follow the steps of various other legacy instrumentation passes removed for
15.0.0.
2022-07-15 19:01:20 -07:00
Rong Xu 5e0443292b [PGO] Report number of counts being dropped when a hash-mismatch happens
This patch reports the number of counts dropped when a hash mismatch
happens. This information is helpful to users: if the dropped
counts are large, the user should redo the instrumentation build and
recollect the profile.

Differential Revision: https://reviews.llvm.org/D129001
2022-07-15 14:53:59 -07:00
Rong Xu 19ac75364f [PGO] Improve hash-mismatch warning message
This patch improves FDO hash-mismatch handling:
(1) filter out warnings for weak functions.
A weak function's definition will be overridden by a strong definition by the linker,
so a hash mismatch in the profile-use compilation is expected.
Put the profile hash mismatch warning under the existing option (default true).

(2) add an option to trace the hash of functions with the specific string.
Note that an empty string parameter will trace all functions.

Differential Revision: https://reviews.llvm.org/D129002
2022-07-15 13:44:55 -07:00
Philip Reames 6ab686eb86 [LSR] Allow already invariant operand for ICmpZero matching [try 2]
Changes since initial commit:

* Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in d767b392.
* isLoopInvariant returns true for instructions outside the loop, but not necessarily *above* the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in 2aed3cdb.

Original commit message follows...

The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 13:29:43 -07:00
Warren Ristow c650793049 [Reassociate] Enable FP reassociation via 'reassoc' and 'nsz'
Compiling with '-ffast-math' turns on all the FastMathFlags (FMF), as
expected, and that enables FP reassociation. Only the two FMF flags
'reassoc' and 'nsz' are technically required to perform reassociation,
but disabling other unrelated FMF bits is needlessly suppressing the
optimization.

This patch fixes that needless suppression, and makes appropriate
adjustments to test-cases, fixing some outstanding TODOs in the process.
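
As a rough illustration of why reassociation has to be gated behind flags at all, the following sketch shows two groupings of the same floating-point sum that give different results. The function names are hypothetical and are not part of the patch; the sketch only assumes IEEE-754 double arithmetic without fast-math.

```cpp
// Hypothetical helpers for illustration only; not code from the Reassociate pass.

// Grouping as written in the source: (A + B) + C.
double sumLeftToRight(double A, double B, double C) {
  return (A + B) + C;
}

// Grouping after reassociation: A + (B + C).
double sumReassociated(double A, double B, double C) {
  return A + (B + C);
}
```

With A = 1e30, B = -1e30 and C = 1.0, the first grouping cancels the large terms before adding C, while the second absorbs C into B first, so the two functions return different values. This is the kind of difference that the 'reassoc' flag explicitly permits.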

Fixes: #56483

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129523
2022-07-15 11:44:35 -07:00
Philip Reames 6fe766beba Revert "[LSR] Allow already invariant operand for ICmpZero matching"
This reverts commit 9153515a7b. A buildbot crash was reported in the commit thread; reverting while investigating.
2022-07-15 10:47:57 -07:00
Florian Hahn aa00fb02c9 [LV] Use umax(VF * UF, MinProfTC) for scalable vectors.
For scalable vectors, it is not sufficient to only check
MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because
this property may not hold for larger values of vscale. In those
cases, compute umax(VF * UF, MinProfTC) instead.
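
A minimal numeric sketch of the idea, assuming the runtime vector factor is the known minimum multiplied by vscale. The function and parameter names are hypothetical and do not come from LoopVectorize.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model for illustration only; not code from LoopVectorize.
// For a scalable VF the runtime step is KnownMinVF * VScale * UF, so a
// compile-time comparison against KnownMinVF * UF cannot cover every VScale.
uint64_t minIterationThreshold(uint64_t KnownMinVF, uint64_t VScale,
                               uint64_t UF, uint64_t MinProfTC) {
  uint64_t Step = KnownMinVF * VScale * UF;
  // Unsigned maximum of the runtime step and the profitability threshold.
  return std::max(Step, MinProfTC);
}
```

For example, with KnownMinVF = 4, UF = 2 and MinProfTC = 10, the threshold dominates when VScale is 1, but the step dominates once VScale reaches 4, which is why the maximum has to be taken over the runtime value.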

This should fix
https://lab.llvm.org/buildbot/#/builders/197/builds/2262
2022-07-15 10:23:14 -07:00
Philip Reames 9153515a7b [LSR] Allow already invariant operand for ICmpZero matching
The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 09:51:00 -07:00
Nikita Popov 8a519b3c21 [InstCombine] Ensure constant folding in binop of select fold
When folding a binop into a select, we need to ensure that one
of the select arms actually does constant fold, otherwise we'll
create two binop instructions and perform the reverse transform.

Ensure this by performing an explicit constant folding attempt,
and failing the transform if neither side simplifies.
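
The shape of the fold can be sketched at the source level as follows. This is only an analogy in C++ with hypothetical function names, not the InstCombine implementation; it shows the case where one select arm is a constant and therefore folds.

```cpp
#include <cstdint>

// Hypothetical source-level analogy; not code from InstCombine.

// Before the fold: the add is applied to the result of the select.
uint32_t addAfterSelect(bool Cond, uint32_t X) {
  uint32_t Sel = Cond ? 3u : X;
  return Sel + 1u;
}

// After the fold: the add is pushed into both arms. The constant arm
// folds (3 + 1 becomes 4), so only one add remains.
uint32_t addInsideSelect(bool Cond, uint32_t X) {
  return Cond ? 4u : X + 1u;
}
```

If neither arm were a constant, pushing the add into the select would produce two adds instead of one, which is the situation the explicit constant folding attempt is meant to rule out.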

A simple alternative here would have been to limit the fold to
ImmConstants, but given the current representation of scalable
vector splats, this wouldn't be ideal.
2022-07-15 11:03:10 +02:00
Mel Chen bd404fbcc8 [LV][NFC] Fix the condition for printing debug messages
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D128523
2022-07-15 01:47:33 -07:00
Nikita Popov f75ccadcdd [LSR] Create SCEVExpander earlier, use member isSafeToExpand() (NFC)
This is a followup to D129630, which switches LSR to the member
isSafeToExpand() variant, and removes the freestanding function.

This is done by creating the SCEVExpander early (already during the
analysis phase). Because the SCEVExpander is now available for the
whole lifetime of LSRInstance, I've also made it into a member
variable, rather than passing it around in even more places.

Differential Revision: https://reviews.llvm.org/D129769
2022-07-15 09:41:23 +02:00
Craig Topper 0e718443c7 [SimplifyIndVar] Use enum class for ExtendKind. NFC
I happened to notice two places where the enum was being passed
directly to the bool IsSigned argument of createExtendInst. This
was functionally ok since SignExtended in the enum has value
of 1, but the code shouldn't rely on that.

Using an enum class prevents the enum from being convertible to bool,
but does make writing the enum values more verbose. Since we now
have to write ExtendKind:: in front of them, I've shortened the
names of ZeroExtended and SignExtended.
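
The hazard can be reproduced in a few lines. The sketch below uses hypothetical stand-ins (`LegacyExtendKind`, `describeExtend`) rather than the actual SimplifyIndVar declarations, and only demonstrates the general C++ behavior the message relies on.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins for illustration; not the actual LLVM declarations.
enum LegacyExtendKind { ZeroExtended, SignExtended }; // unscoped enum
enum class ExtendKind { Zero, Sign };                 // scoped enum

// Hypothetical helper with a bool parameter, similar in spirit to the
// IsSigned argument mentioned above.
std::string describeExtend(bool IsSigned) { return IsSigned ? "sext" : "zext"; }

// An unscoped enum converts implicitly to bool; a scoped enum does not.
static_assert(std::is_convertible<LegacyExtendKind, bool>::value,
              "unscoped enum silently converts to bool");
static_assert(!std::is_convertible<ExtendKind, bool>::value,
              "enum class does not convert to bool");

// With the scoped enum the intent has to be spelled out explicitly.
std::string describeKind(ExtendKind Kind) {
  return describeExtend(Kind == ExtendKind::Sign);
}
```

A call such as `describeExtend(SignExtended)` compiles and happens to work only because `SignExtended` has the value 1, whereas `describeExtend(ExtendKind::Sign)` is rejected by the compiler.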

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129733
2022-07-14 10:03:58 -07:00
Philip Reames 3bc09c7da5 [SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case
Motivation here is to unblock LSR's ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all.

SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values.

vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes.

(*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of every udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference.

Differential Revision: https://reviews.llvm.org/D129710
2022-07-14 08:56:58 -07:00
Brendon Cahoon 58fec78231 Revert "[UnifyLoopExits] Reduce number of guard blocks"
This reverts commit e13248ab0e.

Need to revert because the transformation cannot occur for basic
blocks that contain convergent instructions.
2022-07-14 10:33:52 -05:00
Warren Ristow 230c8c56f2 [Reassociate] Cleanup minor missed optimizations
In analyzing issue #56483, it was noticed that running `opt` with
`-reassociate` was missing some minor optimizations. For example,
there were cases where running `opt` on IR with floating-point
instructions that have the `fast` flags applied sometimes resulted in
less efficient code than the input IR (things like dead instructions
left behind, and missed reassociations). These were sometimes noted
in the test-files with TODOs, to investigate further. This commit
fixes some of these problems, removing some TODOs in the process.

FTR, I refer to these as "minor" missed optimizations, because when
running a full clang/llvm compilation, these inefficiencies are not
happening, as other passes clean that residue up. Regardless, having
cleaner IR produced by `opt`, makes assessing the quality of fixes done
in `opt` easier.
2022-07-14 08:21:04 -07:00
Brendon Cahoon c945d88d2b Revert "[StructurizeCFG] Improve basic block ordering"
This reverts commit f1b05a0a2b.

Need to revert due to issues identified with testing. The
transformation is incorrect for blocks that contain convergent
instructions.
2022-07-14 09:40:51 -05:00
Nikita Popov 9e6e631b38 [LoopPredication] Use isSafeToExpandAt() member function (NFC)
As a followup to D129630, this switches a usage of the freestanding
function in LoopPredication to use the member variant instead. This
was the last use of the freestanding function, so drop it entirely.
2022-07-14 14:49:07 +02:00
Nikita Popov dcf4b733ef [SCEVExpander] Make CanonicalMode handling in isSafeToExpand() more robust (PR50506)
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.

Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.

Fixes https://github.com/llvm/llvm-project/issues/50506.

Differential Revision: https://reviews.llvm.org/D129630
2022-07-14 14:41:51 +02:00
zhongyunde fc6092fd4d [IndVars] Eliminate redundant type cast between unsigned integer and float
Extend for unsigned integer according to the comment on D129191.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129358
2022-07-14 19:41:07 +08:00
Nikita Popov 7a43b382ce [IndVars] Make sure header phi simplification preserves LCSSA form
When simplifying instructions, make sure that the replacement
preserves LCSSA form. This fixes the issue reported at:
https://reviews.llvm.org/D129293#3650851
2022-07-14 11:46:48 +02:00
Nikita Popov ebc54e0cd4 [SCCP] Make check for unknown/undef in unary op handling more explicit (NFCI)
Make the implementation more similar to other functions, by
explicitly skipping an unknown/undef first, and always falling
back to overdefined at the end. I don't think it makes a difference
now, but could make one once the constant evaluation can fail. In
that case we would directly mark the result as overdefined now,
rather than keeping it unknown (and later making it overdefined
because we think it's undef-based).
2022-07-14 10:56:11 +02:00
Nikita Popov 6db3edc858 [SCCP] Don't check for UndefValue before calling markConstant()
The value lattice explicitly represents undef, and markConstant()
internally checks for UndefValue and will create an undef rather
than constant lattice element in that case.

This is mostly a code simplification, it has little practical impact
because we usually get undef results from undef operands, and those
don't get processed.

Only leave the check behind for the CmpInst case, because it
currently goes through this incorrect code in the getCompare()
implementation: f98697642c/llvm/include/llvm/Analysis/ValueLattice.h (L456-L457)

Differential Revision: https://reviews.llvm.org/D128330
2022-07-14 10:05:56 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Florian Hahn ee37ae91b6 [VPlan] Move VPBB verification to separate function (NFC). 2022-07-13 18:53:40 -07:00
Florian Hahn 6f7347b888 [LV] Use PredRecipe directly instead of getOrAddVPValue (NFC).
There is no need to look up the VPValue for Instr, PredRecipe can be
used directly.
2022-07-13 17:01:42 -07:00
Alexander Shaposhnikov c916840539 [SimplifyCFG] Improve SwitchToLookupTable optimization
Try to use the original value as an index (in the lookup table)
in more cases (to avoid one subtraction and shorten the dependency chain)
(https://github.com/llvm/llvm-project/issues/56189).
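
One way to picture the saving, as a source-level sketch under the assumption that the case values start at a small positive number. Both functions and their tables are hypothetical and are not taken from SimplifyCFG.

```cpp
// Hypothetical illustration; not code from SimplifyCFG.
// Cases 1..4 map to 10, 20, 30, 40; everything else yields the default -1.

// Table covers only the case range, so the index is X minus the smallest case.
int lookupWithOffset(unsigned X) {
  static const int Table[] = {10, 20, 30, 40};
  unsigned Index = X - 1; // extra subtraction on the dependency chain
  return Index < 4 ? Table[Index] : -1;
}

// Table is padded with the default at slot 0, so X itself is the index.
int lookupDirect(unsigned X) {
  static const int Table[] = {-1, 10, 20, 30, 40};
  return X < 5 ? Table[X] : -1;
}
```

Both functions return the same value for every input; the second trades one extra table slot for the removed subtraction.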

Test plan:
1/ ninja check-all
2/ bootstrapped LLVM + Clang pass tests

Differential revision: https://reviews.llvm.org/D128897
2022-07-13 23:21:45 +00:00
Leonard Chan 21f72c05c4 [hwasan] Add __hwasan_add_frame_record to the hwasan interface
Hwasan includes instructions in the prologue that mix the PC and SP and store
it into the stack ring buffer stored at __hwasan_tls. This is a thread_local
global exposed from the hwasan runtime. However, if TLS-mechanisms or the
hwasan runtime haven't been set up yet, it will be invalid to access __hwasan_tls.
This is the case for Fuchsia where we instrument libc, so some functions that
are instrumented but can run before hwasan initialization will incorrectly
access this global. Additionally, libc cannot have any TLS variables, so we
cannot weakly define __hwasan_tls until the runtime is loaded.

A way we can work around this is by moving the instructions into a hwasan
function that does the store into the ring buffer and creating a weak definition
of that function locally in libc. This way __hwasan_tls will not actually be
referenced. This is not our long-term solution, but this will allow us to roll
out hwasan in the meantime.

This patch includes:

- A new llvm flag for choosing to emit a libcall rather than instructions in the
  prologue (off by default)
- The libcall for storing into the ringbuffer (__hwasan_add_frame_record)

Differential Revision: https://reviews.llvm.org/D128387
2022-07-13 15:15:15 -07:00
Leonard Chan d843d5c8e6 Revert "[hwasan] Add __hwasan_record_frame_record to the hwasan interface"
This reverts commit 4956620387.

This broke a sanitizer builder: https://lab.llvm.org/buildbot/#/builders/77/builds/19597
2022-07-13 15:06:07 -07:00
Florian Hahn 225e3ec622 [LV] Move VPBranchOnMaskRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-13 14:39:59 -07:00
leonardchan 4956620387 [hwasan] Add __hwasan_record_frame_record to the hwasan interface
Hwasan includes instructions in the prologue that mix the PC and SP and store
it into the stack ring buffer stored at __hwasan_tls. This is a thread_local
global exposed from the hwasan runtime. However, if TLS-mechanisms or the
hwasan runtime haven't been set up yet, it will be invalid to access __hwasan_tls.
This is the case for Fuchsia where we instrument libc, so some functions that
are instrumented but can run before hwasan initialization will incorrectly
access this global. Additionally, libc cannot have any TLS variables, so we
cannot weakly define __hwasan_tls until the runtime is loaded.

A way we can work around this is by moving the instructions into a hwasan
function that does the store into the ring buffer and creating a weak definition
of that function locally in libc. This way __hwasan_tls will not actually be
referenced. This is not our long-term solution, but this will allow us to roll
out hwasan in the meantime.

This patch includes:

- A new llvm flag for choosing to emit a libcall rather than instructions in the
  prologue (off by default)
- The libcall for storing into the ringbuffer (__hwasan_record_frame_record)

Differential Revision: https://reviews.llvm.org/D128387
2022-07-14 05:07:11 +08:00
Martin Sebor ab7ee3c991 [InstCombine] Enable strtol folding with nonnull endptr
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129593
2022-07-13 09:26:34 -06:00
Nikita Popov 07146a9e64 [SCCP] Fix typo in previous commit
Ooops, I tested a build from the wrong checkout.
2022-07-13 16:22:40 +02:00
Nikita Popov e298dfbc1b [SCCP] Avoid ConstantExpr::get() call
Use ConstantFoldUnaryOpOperand() API instead. This is in
preparation for removing fneg constant expressions.
2022-07-13 16:20:34 +02:00
Max Kazantsev 62f4572e45 [IndVars][NFC] Make IVOperand parameter an instruction 2022-07-13 19:07:16 +07:00
Max Kazantsev 30e33b4b81 [SCEV][NFC] Make getStrengthenedNoWrapFlagsFromBinOp return optional 2022-07-13 18:54:25 +07:00
David Sherwood 307ace7f20 [LoopVectorize] Ensure the VPReductionRecipe is placed after all its inputs
When vectorising ordered reductions we call a function
LoopVectorizationPlanner::adjustRecipesForReductions to replace the
existing VPWidenRecipe for the fadd instruction with a new
VPReductionRecipe. We attempt to insert the new recipe in the same
place, but this is wrong because createBlockInMask may have
generated new recipes that VPReductionRecipe now depends upon. I
have changed the insertion code to append the recipe to the
VPBasicBlock instead.

Added a new RUN with tail-folding enabled to the existing test:

  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D129550
2022-07-13 09:29:25 +01:00
Nikita Popov af49bed933 [IndVars] Simplify instructions after replacing header phi with preheader value
After replacing a loop phi with the preheader value, it's usually
possible to simplify some of the using instructions, so do that as
part of replaceLoopPHINodesWithPreheaderValues().

Doing this as part of IndVars is valuable, because it may make GEPs
in the loop have constant offsets and allow the following SROA run
to succeed (as demonstrated in the PhaseOrdering test).

Differential Revision: https://reviews.llvm.org/D129293
2022-07-13 10:27:04 +02:00
Nikita Popov a5ee62a141 [IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits
Currently we only call replaceLoopPHINodesWithPreheaderValues() if
optimizeLoopExits() replaces the exit with an unconditional exit.
However, it is very common that this already happens as part of
eliminateIVComparison(), in which case we're leaving behind the
dead header phi.

Tweak the early bailout for already-constant exits to also call
replaceLoopPHINodesWithPreheaderValues().

Differential Revision: https://reviews.llvm.org/D129214
2022-07-13 09:43:21 +02:00
Augie Fackler 9029bda041 [Attributor] Don't crash if getAnalysisResultForFunction() returns null LoopInfo
I have no idea what's going on here. This code was moved
around/introduced in change cb26b01d57 and starts crashing with a NULL
dereference once I apply https://reviews.llvm.org/D123090. I assume that
I've unwittingly taught the attributor enough that it's able to do more
clever things than in the past, and it's able to trip on this case. I
make no claims about the correctness of this patch, but it passes tests
and seems to fix all the crashes I've been seeing.

Differential Revision: https://reviews.llvm.org/D129589
2022-07-12 16:44:06 -04:00
Yuanfang Chen fcb7d76d65 [coroutine] add nomerge function attribute to `llvm.coro.save`
It is illegal to merge two `llvm.coro.save` calls unless their
`llvm.coro.suspend` users are also merged. Marks it "nomerge" for
the moment.

This reverts D129025.

Alternative to D129025, which affects other token type users like WinEH.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D129530
2022-07-12 10:39:38 -07:00
Nick Desaulniers 2240d72f15 [X86] initial -mfunction-return=thunk-extern support
Adds support for:
* `-mfunction-return=<value>` command line flag, and
* `__attribute__((function_return("<value>")))` function attribute

Where the supported <value>s are:
* keep (disable)
* thunk-extern (enable)

thunk-extern enables clang to change ret instructions into jmps to an
external symbol named __x86_return_thunk, implemented as a new
MachineFunctionPass named "x86-return-thunks", keyed off the new IR
attribute fn_ret_thunk_extern.

The symbol __x86_return_thunk is expected to be provided by the runtime
the compiled code is linked against and is not defined by the compiler.
Enabling this option alone doesn't provide mitigations without
corresponding definitions of __x86_return_thunk!

This new MachineFunctionPass is very similar to "x86-lvi-ret".

The <value>s "thunk" and "thunk-inline" are currently unsupported. It's
not clear yet that they are necessary: whether the thunk pattern they
would emit is beneficial or used anywhere.

Should the <value>s "thunk" and "thunk-inline" become necessary,
x86-return-thunks could probably be merged into x86-retpoline-thunks
which has pre-existing machinery for emitting thunks (which could be
used to implement the <value> "thunk").

Has been found to build+boot with corresponding Linux
kernel patches. This helps the Linux kernel mitigate RETBLEED.
* CVE-2022-23816
* CVE-2022-28693
* CVE-2022-29901

See also:
* "RETBLEED: Arbitrary Speculative Code Execution with Return
Instructions."
* AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion
* TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0
  2022-07-12
* Return Stack Buffer Underflow /
  CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702

SystemZ may eventually want to support "thunk-extern" and "thunk"; both
options are used by the Linux kernel's CONFIG_EXPOLINE.

This functionality has been available in GCC since the 8.1 release, and
was backported to the 7.3 release.

Many thanks to the folks who provided discreet review off-list due to the
embargoed nature of this hardware vulnerability. Many Bothans died to
bring us this information.

Link: https://www.youtube.com/watch?v=IF6HbCKQHK8
Link: https://github.com/llvm/llvm-project/issues/54404
Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html
Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html

Reviewed By: aaron.ballman, craig.topper

Differential Revision: https://reviews.llvm.org/D129572
2022-07-12 09:17:54 -07:00
David Sherwood 6b694d600a [LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF
When calculating the cost of Instruction::Br in getInstructionCost
we query PredicatedBBsAfterVectorization to see if there is a
scalar predicated block. However, this meant that the decisions
being made for a given fixed-width VF were affecting the cost for a
scalable VF. As a result we were returning InstructionCost::Invalid
pointlessly for a scalable VF that should have a low cost. I
encountered this for some loops when enabling tail-folding for
scalable VFs.

Test added here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll

Differential Revision: https://reviews.llvm.org/D128272
2022-07-12 14:53:20 +01:00
Nikita Popov 3d475dfeb9 [Mem2Reg] Consistently preserve nonnull assume for uninit load
When performing a !nonnull load from uninitialized memory, we
should preserve the nonnull assume just like in all other cases.
We already do this correctly in the generic mem2reg code, but
don't handle this case when using the optimized single-block
implementation.

Make sure that the optimized implementation exhibits the same
behavior as the generic implementation.
2022-07-12 12:53:08 +02:00
Kazu Hirata ec9a0e36d9 [IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC)
The last uses were removed on Apr 15, 2022 in commit
2e6ac54cf4.

Differential Revision: https://reviews.llvm.org/D129460
2022-07-11 20:15:24 -07:00
Florian Hahn 5d135041c5 [LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-11 16:01:07 -07:00
Justin Cady 3d438ceed1 [InstrProf] Mark __llvm_profile_runtime hidden to match libclang_rt.profile definition
Mark the symbol hidden to match INSTR_PROF_PROFILE_RUNTIME_VAR in compiler-rt.

Fixes second issue discussed at https://discourse.llvm.org/t/63090

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D128842
2022-07-11 11:29:20 -07:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.
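
The two steps above can be modelled with a small scalar simulation. This
is only a sketch: `active_lane_mask` and `masked_sum` are hypothetical
helper names, and lane i is assumed to be active iff base + i < n.

```python
def active_lane_mask(base, n, vf):
    """Lane i is active iff base + i < n (assumed model of the intrinsic)."""
    return [base + i < n for i in range(vf)]


def masked_sum(data, vf):
    """Sum `data` with a tail-folded loop whose back-edge tests the next mask."""
    n = len(data)
    total = 0
    index = 0
    vector_iterations = 0
    # Initial PHI value, set up in the vector loop pre-header.
    mask = active_lane_mask(0, n, vf)
    while True:
        # Predicated "vector" body: only active lanes touch memory.
        total += sum(data[index + i] for i in range(vf) if mask[i])
        vector_iterations += 1
        index += vf
        # Step 1: compute the mask for the next iteration.
        mask = active_lane_mask(index, n, vf)
        # Step 2: branch on its first lane instead of comparing index with n.
        if not mask[0]:
            break
    return total, vector_iterations
```

Summing ten elements with vf = 4 takes three passes through the loop, the
last one with only two active lanes.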

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
David Sherwood 02d6950d84 [LoopVectorize][NFC] Add optional Name parameter to VPInstruction
This patch is a simple piece of refactoring that now permits users
to create VPInstructions and specify the name of the value being
generated. This is useful for creating more readable/meaningful
names in IR.

Differential Revision: https://reviews.llvm.org/D128982
2022-07-11 09:23:24 +01:00
Florian Hahn 6a4bc452f8 [LV] Move VPWidenGEPRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-10 17:10:17 -07:00
Florian Hahn 13ae213469 [LV] Move VPWidenRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-09 18:46:57 -07:00
Paul Osmialowski b17754bcaa [SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer value
Since the backend's codegen is capable of expanding powi into fmul's, it
is not needed anymore to do so in the ::optimizePow() function of
SimplifyLibCalls.cpp. What is sufficient is to always turn pow(x, n)
into powi(x, n) for the cases where n is a constant integer value.
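
As a rough illustration of why powi(x, n) needs nothing but multiplies,
here is a square-and-multiply sketch. The order in which the backend
actually emits its fmul's is not stated here, so treat this as one
possible expansion; the `powi` function below is a hypothetical model,
not LLVM code.

```python
def powi(x, n):
    """Compute x**n for an integer n using only multiplications."""
    if n < 0:
        # A negative exponent costs one extra division.
        return 1.0 / powi(x, -n)
    result = 1.0
    base = x
    while n:
        if n & 1:
            result *= base  # fold in the current power of x
        base *= base        # square for the next exponent bit
        n >>= 1
    return result
```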

Dropping the current expansion code allowed relaxation of the folding
conditions and now this can also happen at optimization levels below
Ofast.

The added CodeGen/AArch64/powi.ll test case ensures that powi is
actually expanded into fmul's, confirming that this refactor did not
cause any performance degradation.

Following an idea proposed by David Sherwood <david.sherwood@arm.com>.

Differential Revision: https://reviews.llvm.org/D128591
2022-07-09 12:00:22 -04:00
Florian Hahn 0c27b38849 [VPlan] Move VPWidenSelectRecipe::execute to VPlanRecipes.cpp (NFC).
Depends on D127968.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127970
2022-07-08 09:35:23 -07:00
Nikita Popov d287051404 [InstCombine] Avoid ConstantExpr::get() in vector binop fold (NFCI)
Use the ConstantFoldBinaryOpOperands() API instead. This case
would bail out on a non-folded result anyway.
2022-07-08 17:20:14 +02:00
Nikita Popov 29c6bf45c3 [InstCombine] Avoid ConstantExpr::get() call
Avoid calling ConstantExpr::get() for associative/commutative
binops, call ConstantFoldBinaryOpOperands() instead. We only
want to perform the reassociation if the constants actually fold.
2022-07-08 17:13:06 +02:00
Nikita Popov fc18a88231 [InstCombine] Avoid creating float binop ConstantExprs
Replace ConstantExpr::getFAdd etc. with a call to
ConstantFoldBinaryOpOperands(). I'm using the constant folding API
rather than IRBuilder here to ensure that this does actually
constant fold. These transforms don't use m_ImmConstant(), so this
would not otherwise be guaranteed (and apparently, they can't use
m_ImmConstant because they want to handle scalable vector splats).

There is an opportunity here to further migrate these to the
ConstantFoldFPInstOperands() API, which would respect the denormal
mode. I've held off on doing so here, because some of this code
explicitly checks for denormal results, and I don't want to touch
it in a mostly NFC change.
2022-07-08 16:36:04 +02:00
Sanjay Patel 79bb915fb6 [InstCombine] enhance fold for subtract-from-constant -> xor
A low-bit mask is not required:
https://alive2.llvm.org/ce/z/yPShss
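
A brute-force sanity check of the underlying identity, as a sketch. The
precondition used here (every bit that may be set in X is also set in C,
so the subtraction never borrows) is an assumption about the fold, not
taken from the patch itself; the function name is hypothetical.

```python
def sub_equals_xor_when_no_borrow(bits=8):
    """Check C - X == C ^ X for all X whose set bits are a subset of C's."""
    mask = (1 << bits) - 1
    for c in range(1 << bits):
        for x in range(1 << bits):
            if x & ~c & mask:
                # X has a bit set outside C: the assumed precondition fails.
                continue
            if (c - x) & mask != (c ^ x):
                return False
    return True
```

Note that C = 0b1010 and X = 0b0010 satisfy the precondition even though C
is not a low-bit mask.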

This matches the SDAG implementation that was updated at:
8b75671314
2022-07-08 10:02:19 -04:00
zhongyunde 716e1b856a [IndVars] Eliminate redundant type cast between integer and float
Recompute the range: match fptosi of sitofp, and then query the range of the input to the sitofp,
according to the comment on D129140.

Fixes https://github.com/llvm/llvm-project/issues/55505.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129191
2022-07-08 17:07:20 +08:00
ChenYang Li 6d036b83d1 [JumpThreading] Avoid threadThroughTwoBasicBlocks when PredPred BB ends with indirectbranch
Since we can't change the destination of an indirectbr, skip the
transform when the PredPredBB terminator is an indirectbr.

Differential Revision: https://reviews.llvm.org/D129193
2022-07-08 09:29:17 +02:00
Nikita Popov 34a5c2bcf2 [BasicBlockUtils] Allow critical edge splitting with callbr terminators
After D129205, we support SplitBlockPredecessors() for predecessors
with callbr terminators. This means that it is now also safe to
invoke critical edge splitting for an edge coming from a callbr
terminator. Remove checks in various passes that were protecting
against that.

Differential Revision: https://reviews.llvm.org/D129256
2022-07-08 09:20:44 +02:00
Craig Topper 0266773464 [SLP] Add missing space to optimization remark.
Reviewed By: vporpo

Differential Revision: https://reviews.llvm.org/D129330
2022-07-07 23:29:11 -07:00
Johannes Doerfert f6e0c05e3d Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues"
This reverts commit f17639ea0c as three
AMDGPU tests haven't been updated. Will need to verify the changes are
not regressions we should avoid.
2022-07-08 00:53:38 -05:00
Johannes Doerfert f17639ea0c [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80.
2022-07-08 00:38:27 -05:00
Johannes Doerfert cb26b01d57 [Attributor] Make heap2stack record alloca placement
We recently learned to place the alloca during the heap2stack
transformation in the entry block but we did not account for other
concurrent modifications. We need to record our decision rather than
checking (then outdated) passes during the manifest stage. This will
also allow us to use a custom (=optimistic) "loop info" in the future.
2022-07-07 16:49:22 -05:00
Johannes Doerfert efe8c581ff [Attributor][NFC] Improve heap2stack result readability and code style 2022-07-07 16:49:22 -05:00
Johannes Doerfert c771eaf07e [OpenMP] Ensure to not use SPMD mode in the absence of parallel regions 2022-07-07 16:49:22 -05:00
Leonard Chan 0f589826a3 [hwasan] Refactor frame record info into function
This way it can be reused easily in D128387.

Note this changes the IR slightly. Before, the steps for calculating and storing the frame record info were:

1. getPC
2. getSP
3. inttoptr
4. or SP, PC
5. store

Now the steps are:

1. getPC
2. getSP
3. or SP, PC
4. inttoptr
5. store

Differential Revision: https://reviews.llvm.org/D129315
2022-07-07 14:44:39 -07:00
Martin Sebor 516915beb5 [InstCombine] Fold memchr and strchr equality with first argument
Enhance memchr and strchr handling to simplify calls to the functions
used in equality expressions with the first argument to at most two
integer comparisons:

- memchr(A, C, N) == A to N && *A == C for either a dereferenceable
  A or a nonzero N,
- strchr(S, C) == S to *S == C for any S and C, and
- strchr(S, '\0') == 0 to false for any S
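
The first two equivalences can be checked with a small model in which
pointers are indices into a byte buffer and NULL is None. This is a
sketch only; `memchr` and `strchr` below are hypothetical stand-ins for
the C functions, not bindings to them.

```python
def memchr(a, start, c, n):
    """Index of the first byte equal to c in a[start:start+n], or None."""
    for i in range(start, start + n):
        if a[i] == c:
            return i
    return None


def strchr(s, start, c):
    """Index of the first c in the NUL-terminated string at s[start:], or None."""
    i = start
    while True:
        if s[i] == c:
            return i
        if s[i] == 0:
            return None
        i += 1


def folds_hold(buf):
    """Compare each call against its folded form at every start position."""
    for start in range(len(buf) - 1):
        for c in range(4):
            # memchr(A, C, N) == A  <=>  N != 0 && *A == C
            for n in range(len(buf) - start):
                folded = n != 0 and buf[start] == c
                if (memchr(buf, start, c, n) == start) != folded:
                    return False
            # strchr(S, C) == S  <=>  *S == C
            if (strchr(buf, start, c) == start) != (buf[start] == c):
                return False
    return True
```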

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128939
2022-07-07 15:14:23 -06:00
Zaara Syeda 58b9666dc1 [LSR] Fix bug - check if loop has preheader before calling isInductionPHI
Fix bug exposed by https://reviews.llvm.org/D125990
rewriteLoopExitValues calls InductionDescriptor::isInductionPHI which requires
the PHI node to have an incoming edge from the loop preheader. This adds checks
before calling InductionDescriptor::isInductionPHI to see that the loop has a
preheader. Also did some refactoring.

Differential Revision: https://reviews.llvm.org/D129297
2022-07-07 15:11:33 -04:00
Daniel Bertalan ef7aed3e11 [InstCombine] Do not fold 'and (sext (ashr X, Shift)), C' if Shift < 0
The 'and (sext (ashr X, ShiftC)), C' --> 'lshr (sext X), ShiftC'
transformation would access out of bounds bits in APInt::getLowBitsSet
if the shift count was larger than X's bit width or if it was negative.
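
The transform can be checked exhaustively for a narrow type. In this
sketch the mask C is assumed to be the low (dst_bits - ShiftC) bits of
the destination type; that condition is inferred, not quoted from the
patch. All helper names are hypothetical, and `low_bits_set` rejects
out-of-range counts instead of reading out of bounds.

```python
def to_signed(v, bits):
    """Interpret the low `bits` bits of v as a two's-complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def sext(v, from_bits, to_bits):
    return to_signed(v, from_bits) & ((1 << to_bits) - 1)


def ashr(v, shift, bits):
    return (to_signed(v, bits) >> shift) & ((1 << bits) - 1)


def low_bits_set(width, count):
    """Mask with the low `count` bits set; refuses out-of-range counts."""
    if not 0 <= count <= width:
        raise ValueError("bit count out of range")
    return (1 << count) - 1


def fold_is_valid(src_bits=8, dst_bits=16):
    """Check and(sext(ashr X, S), C) == lshr(sext X, S) for every in-range S."""
    for shift in range(src_bits):
        c = low_bits_set(dst_bits, dst_bits - shift)
        for x in range(1 << src_bits):
            before = sext(ashr(x, shift, src_bits), src_bits, dst_bits) & c
            after = sext(x, src_bits, dst_bits) >> shift  # lshr on an unsigned value
            if before != after:
                return False
    return True
```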

Fixes #56424
2022-07-07 19:13:55 +02:00
Joseph Huber 41fba3c107 [Metadata] Add 'exclude' metadata to add the exclude flags on globals
This patch adds a new metadata kind `exclude` which implies that the
global variable should be given the necessary flags during code
generation to not be included in the final executable. This is done
using the ``SHF_EXCLUDE`` flag on ELF for example. This should make it
easier to specify this flag on a variable without needing to explicitly
check the section name in the target backend.

Depends on D129053 D129052

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D129151
2022-07-07 12:20:40 -04:00
Joseph Huber ed801ad5e5 [Clang] Use metadata to make identifying embedded objects easier
Currently we use the `embedBufferInModule` function to store binary
strings containing device offloading data inside the host object to
create a fatbinary. In the case of LTO, we need to extract this object
from the LLVM-IR. This patch adds a metadata node for the embedded
objects containing the embedded pointers and the sections they were
stored at. This should create a cleaner interface for identifying these
values.

In the future it may be worthwhile to also encode an `ID` in the
metadata corresponding to the object's special section type if relevant.
This would allow us to extract the data from an object file and LLVM-IR
using the same ID.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D129033
2022-07-07 12:20:25 -04:00
Florian Hahn bc19b7c3cc [LV] Remove collectTriviallyDeadInstructions, already handled by VP DCE.
Now that removeDeadRecipes can remove most dead recipes across a whole
VPlan, there is no need to first collect some dead instructions.
Instead removeDeadRecipes can simply clean them up.

Depends on D127580.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D128408
2022-07-07 08:40:27 -07:00
Sander de Smalen 519d7876cb [VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector.
This addresses https://github.com/llvm/llvm-project/issues/56377

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D129136
2022-07-07 08:37:04 +00:00
Nikita Popov 40a4078e14 [BasicBlockUtils] Allow splitting predecessors with callbr terminators
SplitBlockPredecessors currently asserts if one of the predecessor
terminators is a callbr. This limitation was originally necessary,
because just like with indirectbr, it was not possible to replace
successors of a callbr. However, this is no longer the case since
D67252. As the requirement nowadays is that callbr must reference
all blockaddrs directly in the call arguments, and these get
automatically updated when setSuccessor() is called, we no longer
need this limitation.

The only thing we need to do here is use replaceSuccessorWith()
instead of replaceUsesOfWith(), because only the former does the
necessary blockaddr updating magic.

I believe there's other similar limitations that can be removed,
e.g. related to critical edge splitting.

Differential Revision: https://reviews.llvm.org/D129205
2022-07-07 09:13:25 +02:00
Chuanqi Xu 66e15d4c01 [NFC] [Coroutines] Update the comments for lowering coro.save
The original comment is not right. We don't store 0 all the time.
2022-07-07 14:57:41 +08:00
Florian Hahn 17d48c3169 [VPlan] Move remove dead recipes before merging regions.
This can enable additional region merging, while not losing
opportunities as region merging does not produce dead recipes.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D128831
2022-07-06 20:38:38 -07:00
Chuanqi Xu e3b4452e07 [Debug] [Coroutines] Get rid of DW_ATE_address
Closing https://github.com/llvm/llvm-project/issues/55916

This patch tries to get rid of DW_ATE_address and enhance the test
coverage.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127625
2022-07-07 10:47:09 +08:00
Chuanqi Xu 7137ebc4ce [Debug] [Coroutine] Adjust the scope and name for coroutine frame
Previously the scope of the debug type of __coro_frame was limited to the
current function. This looked good at first sight, but it prevents us
from printing the type in split functions and other functions. Also the
debug type is different for different coroutine functions. So it makes
sense to rename the debug type to make it related to the function name.

After this patch, we could access the coroutine frame type in a function
by `function_name.coro_frame_ty`.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127623
2022-07-07 10:35:32 +08:00
Vir Narula 89a99ec900 [GVN] Bug fix to reportMayClobberedLoad remark
Bug fix to avoid an assertion failure when generating remarks for GVN.

Intention of assert is correct but ignores edge case of instructions being equivalent.

Reduced input that causes crash when remarks are turned on:
```
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx12.0.0"

define ptr @ReplaceWithTidy(ptr %zz_hold) {
cond.end480.us:
  %0 = load ptr, ptr null, align 8
  store ptr %0, ptr %0, align 8
  store ptr null, ptr %zz_hold, align 8
  %1 = load ptr, ptr %0, align 8
  store ptr %1, ptr null, align 8
  ret ptr null
}
```

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D129235
2022-07-06 17:42:05 -07:00
Wolfgang Pieb ff87ee4dee [Metadata] Utilize the resizing capability of MDNodes in Moduleflag processing.
This mostly affects PGO/LTO builds which use module flags describing the call
graph. Fixes Issue #51893.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D125999
2022-07-06 10:18:33 -07:00
Nikola Tesic b5b6d3a41b [Debugify] Port verify-debuginfo-preserve to NewPM
Debugify in OriginalDebugInfo mode, introduced with D82545,
runs only with legacy PassManager.

This patch enables this utility for the NewPM.

Differential Revision: https://reviews.llvm.org/D115351
2022-07-06 17:07:20 +02:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to a CAS loop. There are already a couple of targets supporting the feature. I'll
create follow-up patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Nikita Popov 20962c1240 [SimplifyCFG] Don't split predecessors of callbr terminator
This addresses the assertion failure reported in
https://reviews.llvm.org/D124159#3631240.

I believe that this limitation in SplitBlockPredecessors is not
actually necessary (because unlike with indirectbr, callbr is
restricted in a way that does allow updating successors), but for
now fix the assertion failure the same way we do everywhere else,
by also skipping callbr.
2022-07-06 15:38:53 +02:00
Dimitrije Milosevic 9f492a9ae5 [MIPS] Fix the ASAN shadow offset hook for the N32 ABI
Currently, LLVM doesn't have the correct shadow offset
mapping for the n32 ABI.
This patch introduces the correct shadow offset value
for the n32 ABI - 1ULL << 29.
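
For context, a sketch of how a shadow offset is used when mapping an
application address to its shadow byte. The shadow scale of 3 (one
shadow byte per 8 application bytes) is assumed here as the usual ASan
default and is not stated in this patch; the names are hypothetical.

```python
N32_SHADOW_OFFSET = 1 << 29  # value introduced by this patch
SHADOW_SCALE = 3             # assumption: usual ASan default


def shadow_address(addr, offset=N32_SHADOW_OFFSET, scale=SHADOW_SCALE):
    """Map an application address to the address of its shadow byte."""
    return (addr >> scale) + offset
```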

Differential Revision: https://reviews.llvm.org/D127096
2022-07-06 12:44:28 +02:00
Nikita Popov f96cb66d19 [ValueTracking] Accept Instruction in isSafeToSpeculativelyExecute() (NFC)
As constant expressions can no longer trap, it only makes sense to
call isSafeToSpeculativelyExecute on Instructions, so limit the
API to accept only them, rather than general Operators or Values.
2022-07-06 11:12:49 +02:00
Chenbing Zheng 851447cb32 [InstCombine] remove useless insertelement
extractelement (bitcast (insertelement (Vec, b)), a) ->
extractelement (bitcast (Vec), a)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128890
2022-07-06 17:05:27 +08:00
Nikita Popov 1ed8b29302 [LoopVectorizationLegality] Drop unused variable (NFC) 2022-07-06 10:43:39 +02:00
Nikita Popov 8ee913d83b [IR] Remove Constant::canTrap() (NFC)
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
2022-07-06 10:36:47 +02:00
Yuanfang Chen b170d856a3 [SimplifyCFG] Skip hoisting common instructions that return token type
By LangRef, hoisting token-returning instructions obscures the origin
so it should be skipped. Found this issue while investigating a
CoroSplit pass crash.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129025
2022-07-05 11:21:57 -07:00
Zaara Syeda dbf6ab5ef9 [LSR] Fix bug for optimizing unused IVs to final values
This is a fix for a crash reported for https://reviews.llvm.org/D118808
The fix is to only consider PHINodes which are induction phis.
Fixes #55529

Differential Revision: https://reviews.llvm.org/D125990
2022-07-05 12:30:58 -04:00
David Green 5493f8fc59 [VectorCombine] Improve shuffle select shuffle-of-shuffles
This is an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handling of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be omitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

This is a recommit with some additional checks for supported forms and
out-of-bounds mask elements, with some extra tests.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-05 17:16:18 +01:00
Nikita Popov a4772cbaf0 Revert "[SimplifyCFG] Thread branches on same condition in more cases (PR54980)"
This reverts commit 4e545bdb35.

The newly added test is the third infinite combine loop caused by
this change. In this case, it's a combination of the branch to
common dest and jump threading folds that keeps peeling off loop
iterations.

The core problem here is that we ideally would not thread over
loop backedges, both because it is potentially non-profitable
(it may break canonical loop structure) and because it may result
in these kinds of loops. Unfortunately, due to the lack of a
dominator tree in SimplifyCFG, there is no good way to prevent
this. While we have LoopHeaders, this is an optional structure and
we don't do a good job of keeping it up to date. It would be fine
for a profitability check, but is not suitable for a correctness
check.

So for now I'm just giving up here, as I don't see a good way to
robustly prevent infinite combine loops.

Fixes https://github.com/llvm/llvm-project/issues/56203.
2022-07-05 16:57:46 +02:00
Nikita Popov 935570b2ad [ConstExpr] Don't create div/rem expressions
This removes creation of udiv/sdiv/urem/srem constant expressions,
in preparation for their removal. I've added a
ConstantExpr::isDesirableBinOp() predicate to determine whether
an expression should be created for a certain operator.

With this patch, div/rem expressions can still be created through
explicit IR/bitcode; forbidding them entirely will be the next step.

Differential Revision: https://reviews.llvm.org/D128820
2022-07-05 15:54:53 +02:00
Nikita Popov dc969061c6 [SimplifyCFG] Thread all predecessors with same value at once
If there are multiple predecessors that have the same condition
value (and thus same "real destination"), these were previously
handled by copying the threaded block for each predecessor.
Instead, we can reuse one block for all of them. This makes the
behavior of SimplifyCFG's jump threading match that of the
actual JumpThreading pass.

This also avoids the infinite combine loop reported in:
https://reviews.llvm.org/D124159#3624387
2022-07-05 14:33:53 +02:00
Florian Hahn ebb78a95ce [LV] Remove stray dbgs() call after 774fc63490. 2022-07-05 12:58:18 +01:00
Chenbing Zheng b43dd2f6c4 [InstCombine] improve fold for icmp_eq_and to icmp_ult
In D95959, the improved analysis for "C >> X" broke the fold
((%x & C) == 0) --> %x u< (-C) iff (-C) is power of two.

It simplifies C, but then fails to satisfy the fold condition.
This patch tries to restore C before the fold.
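
The fold itself is easy to confirm by brute force over i8. A sketch with
hypothetical helper names:

```python
def is_power_of_two(v):
    return v != 0 and v & (v - 1) == 0


def count_verified_constants(bits=8):
    """Return how many constants C qualify, or None if the fold ever fails."""
    mask = (1 << bits) - 1
    verified = 0
    for c in range(1 << bits):
        neg_c = (-c) & mask
        if not is_power_of_two(neg_c):
            continue  # the fold only applies when -C is a power of two
        verified += 1
        for x in range(1 << bits):
            # ((x & C) == 0) must agree with the unsigned compare x u< -C
            if ((x & c) == 0) != (x < neg_c):
                return None
    return verified
```

For i8 there are eight qualifying constants, one for each power of two
that -C can take.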

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D128790
2022-07-05 17:18:23 +08:00
Chenbing Zheng b66220f25a [InstCombine] [NFC] use C.isNegatedPowerOf2() instead of (~C + 1).isPowerOf2()
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D129103
2022-07-05 17:04:59 +08:00
Florian Hahn 774fc63490 [LV] Consider minimum vscale assumption for RT check cost.
For scalable VFs, the minimum assumed vscale needs to be included in the
cost-computation, otherwise a smaller VF may be used for RT check cost
computation than was used for earlier cost computations.

Fixes a RISCV test failing with UBSan due to both scalar and vector
loops having the same cost.
2022-07-05 09:41:58 +01:00
Nikita Popov b69c75d53f Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles"
This reverts commit 19a1e20b8a.

Clang crashes while linking bullet from llvm-test-suite in
ReleaseLTO-g cmake configuration.
2022-07-05 09:31:20 +02:00
zhongyunde b2b4c8721d [InstCombine] Make use of low zero bits to determine exact int->fp cast
According to the comment https://reviews.llvm.org/D127854#inline-1226805,
we could also make use of these low zero bits: https://alive2.llvm.org/ce/z/GYxTRu
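
A sketch of the idea: an integer converts to floating point exactly when
the span from its highest to its lowest set bit fits in the mantissa, so
known-zero low bits shrink the number of bits that matter. The 24-bit
mantissa is the IEEE single-precision value; the function names are
hypothetical and the round trip through `struct` is only there to
cross-check the predicate.

```python
import struct


def significant_bits(x):
    """Width of the span from the highest to the lowest set bit of |x|."""
    m = abs(x)
    if m == 0:
        return 0
    trailing_zeros = (m & -m).bit_length() - 1
    return m.bit_length() - trailing_zeros


def fits_float32_exactly(x, mantissa_bits=24):
    return significant_bits(x) <= mantissa_bits


def round_trips_through_float32(x):
    """Convert x to single precision and back, and report whether it survived."""
    as_f32 = struct.unpack("f", struct.pack("f", float(x)))[0]
    return int(as_f32) == x
```

For example, 0xFFFFFF00 has 32 bits but eight trailing zeros, so it still
converts exactly, while (1 << 24) + 1 does not.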

Reviewed By: spatel, nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D128895
2022-07-05 09:15:12 +08:00
Sanjay Patel 142aca7741 [InstCombine] fold sub of min/max of sub with common operand
x - max(x - y, 0) --> min(x, y)
x - min(x - y, 0) --> max(x, y)
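
Both identities hold over the mathematical integers, which the sketch
below checks on a small range. The fixed-width IR fold may need
additional no-wrap conditions that are not modelled here; the function
name is hypothetical.

```python
def sub_of_min_max_identities_hold(lo=-20, hi=20):
    """Check both identities for all x, y in [lo, hi] using unbounded ints."""
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            if x - max(x - y, 0) != min(x, y):
                return False
            if x - min(x - y, 0) != max(x, y):
                return False
    return True
```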

https://alive2.llvm.org/ce/z/2YkqFe

issue #55470
2022-07-04 18:55:24 -04:00
Sanjay Patel 4276d00b12 [InstCombine] add helper function for sub-of-min/max folds; NFC
The test diffs are cosmetic -- but improvements -- because we
let instcombine handle replacement. Instead of dropping the
old value name, it propagates to the new instruction.
2022-07-04 17:43:18 -04:00
Florian Hahn 2a82c15f63 [LV] Consider runtime checks profitable if scalar cost is zero.
This fixes a UBSan failure after 644a965c1e. When using
user-provided VFs/ICs (via the force-vector-width /
force-vector-interleave options) the scalar cost is zero, which would
cause divide-by-zero.

When forcing vectorization using the options, the cost of the runtime
checks should not block vectorization.
2022-07-04 21:37:16 +01:00
Florian Hahn 9eb6572786 [LV] Add back CantReorderMemOps remark.
Add back remark unintentionally dropped by 644a965c1e.

I will add a LV test separately, so we do not have to rely on a Clang
test to catch this.
2022-07-04 17:23:47 +01:00
Nikita Popov abbd684c02 [InstCombine] Avoid ConstantExpr::get() in phi binop fold
Use ConstantFoldBinaryOpOperands() instead, in preparation for not
all binops having a supported constant expression.
2022-07-04 16:46:27 +02:00
Peter Waller c146af3f46 [LoopVectorize][NFC] Reinstate TTICapture workaround for gcc-6
Fixes #56374.
2022-07-04 14:14:15 +00:00
Florian Hahn 644a965c1e [LV] Vectorize cases with larger number of RT checks, execute only if profitable.
This patch replaces the tight hard cut-off for the number of runtime
checks with a more accurate cost-driven approach.

The new approach allows vectorization with a larger number of runtime
checks in general, but only executes the vector loop (and runtime checks) if
considered profitable at runtime. Profitable here means that the cost-model
indicates that the runtime check cost + vector loop cost < scalar loop cost.

To do that, LV computes the minimum trip count for which runtime check cost
+ vector-loop-cost < scalar loop cost.
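
A simplified sketch of that inequality. It assumes integer per-iteration
costs and ignores remainder iterations and any rounding the real cost
model applies, so it is not the formula LV uses; the function and
parameter names are hypothetical.

```python
from fractions import Fraction
import math


def min_profitable_trip_count(scalar_cost, vector_cost, vf, rt_check_cost):
    """Smallest TC with rt_check_cost + vector_cost * TC / vf < scalar_cost * TC.

    Returns None when the vector loop never beats the scalar loop.
    """
    # Cost saved per scalar iteration by running the vector loop instead.
    saving = Fraction(scalar_cost) - Fraction(vector_cost, vf)
    if saving <= 0:
        return None
    # The checks pay off once TC * saving strictly exceeds their cost.
    return math.floor(Fraction(rt_check_cost) / saving) + 1
```

With a scalar cost of 4, a vector cost of 6 at VF 4 and checks costing 20,
the break-even trip count is 9: at 8 iterations both sides cost 32.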

Note that there is still a hard cut-off to avoid excessive compile-time/code-size
increases, but it is much larger than the original limit.

The performance impact on standard test-suites like SPEC2006/SPEC2017/MultiSource
is mostly neutral, but the new approach can give substantial gains in cases where
we failed to vectorize before due to the over-aggressive cut-offs.

On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%)
and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.

```
CFP2006/447.dealII/447.dealII    -1.9%
CINT2017rate/525.x264_r/525.x264_r    -2.2%
ASC_Sequoia/IRSmk/IRSmk       -9.2%
Rodinia/srad/srad     -36.1%
```

`size` regressions on AArch64 with -O3 are

```
MultiSource/Applications/hbd/hbd                 90256.00   106768.00 18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000     240676.00   257268.00  6.9%
MultiSourc...enchmarks/mafft/pairlocalalign     472603.00   489131.00  3.5%
External/S...2017rate/525.x264_r/525.x264_r     613831.00   630343.00  2.7%
External/S...NT2006/464.h264ref/464.h264ref     818920.00   835448.00  2.0%
External/S...te/538.imagick_r/538.imagick_r    1994730.00  2027754.00  1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4    1236471.00  1253015.00  1.3%
MultiSource/Applications/oggenc/oggenc         2108147.00  2124675.00  0.8%
External/S.../CFP2006/447.dealII/447.dealII    4742999.00  4759559.00  0.3%
External/S...rate/510.parest_r/510.parest_r   14206377.00 14239433.00  0.2%
```

Reviewed By: lebedev.ri, ebrevnov, dmgreen

Differential Revision: https://reviews.llvm.org/D109368
2022-07-04 15:11:39 +01:00
David Green 2de05afc19 [SLP] Peek into loads when hitting the RecursionMaxDepth
This patch slightly extends the limit on the RecursionMaxDepth inside
the SLP vectorizer. It does it only when it hits a load (or zext/sext of
a load), which allows it to peek through in the places where it will be
the most valuable, without ballooning out the O(..) by any 2^n factors.

Differential Revision: https://reviews.llvm.org/D122148
2022-07-04 14:22:50 +01:00
Nikita Popov 93cbdaef04 [Reassociate] Avoid ConstantExpr::get()
Use ConstantFoldBinaryOpOperands() instead, to handle the case
where not all binary ops have a constant expression variant.

This is a bit awkward because we only want to pop the element from
Ops once we're sure that it has folded.
2022-07-04 15:17:22 +02:00
Nikita Popov 32a76fc292 [SCEVExpander] Avoid ConstantExpr::get() (NFCI)
Use ConstantFoldBinaryOpOperands() instead. This will be important
when not all binops have constant expression variants.
2022-07-04 14:59:00 +02:00
David Green 19a1e20b8a [VectorCombine] Improve shuffle select shuffle-of-shuffles
This is an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handling of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be omitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-04 13:38:43 +01:00
Nikita Popov 9604601c93 [SimplifyCFG] Remove redundant checks for hoisting (NFCI)
These conditions are later checked in the HoistTerminator code
path. Checking them here is somewhat confusing, because this code
only checks the first instruction in the block, which is not
necessarily the terminator.
2022-07-04 10:53:54 +02:00
Florian Hahn b4694229aa [LV] Simplify setDebugLocFromInst by using early exit (NFC).
Suggested as separate improvement in D128657.
2022-07-04 09:25:26 +01:00
Sanjay Patel f9f40aa10d [InstCombine] fold negated low-bit-mask to cmp+select
(-(X & 1)) & Y --> (X & 1) == 0 ? 0 : Y
https://alive2.llvm.org/ce/z/rhpH3i
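
The equivalence can also be confirmed by brute force over i8. A sketch
with a hypothetical function name:

```python
def negated_low_bit_mask_fold_holds(bits=8):
    """Check (-(X & 1)) & Y == ((X & 1) == 0 ? 0 : Y) for all X and Y."""
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        for y in range(1 << bits):
            # -(X & 1) is either 0 or all-ones in two's complement.
            before = ((-(x & 1)) & mask) & y
            after = 0 if (x & 1) == 0 else y
            if before != after:
                return False
    return True
```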

This is noted as a missing IR canonicalization in issue #55618.
We already managed to fix codegen to the expected form.
2022-07-03 12:25:26 -04:00
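The fold in f9f40aa10d above can be sanity-checked with a small model. This is only a sketch: it assumes 8-bit two's-complement values, and the function names are hypothetical and do not come from InstCombine. It compares the original expression with the cmp+select form for every X and Y.

```python
def negated_low_bit_mask(x: int, y: int, bits: int = 8) -> int:
    """Original form: (-(X & 1)) & Y, evaluated in `bits`-wide two's complement."""
    mask = (1 << bits) - 1
    # -(x & 1) is either 0 or -1; masking models the fixed-width wraparound.
    return (-(x & 1) & mask) & y


def cmp_select(x: int, y: int) -> int:
    """Canonical form: (X & 1) == 0 ? 0 : Y."""
    return 0 if (x & 1) == 0 else y


def forms_agree(bits: int = 8) -> bool:
    """Exhaustively compare both forms over all `bits`-wide X and Y."""
    values = range(1 << bits)
    return all(
        negated_low_bit_mask(x, y, bits) == cmp_select(x, y)
        for x in values
        for y in values
    )
```

The Alive2 link in the commit message is the authoritative proof; the exhaustive loop here only covers the assumed bit width.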
Nuno Lopes 53dc0f1078 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-03 14:34:03 +01:00
Nuno Lopes 022bd92c78 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-03 12:32:19 +01:00
Florian Hahn b0da3c6fa4 [VPlan] Move setDebugLocFromInst to VPTransformState (NFC).
The moved helpers are only used for codegen. It will allow moving the
remaining ::execute implementations out of LoopVectorize.cpp.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D128657
2022-07-02 15:18:17 +01:00
Johannes Doerfert 07766f4070 [Attributor] Move heap2stack allocas to the entry block if possible
If we are certainly not in a loop we can directly emit the heap2stack
allocas in the function entry block. This will help to get rid of them
(SROA) and avoid stacksave/restore intrinsics when the function is
inlined.
2022-07-01 21:34:12 -05:00
Nuno Lopes 7c4f45f87a Revert [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]
This reverts commits 47e6f98f84 and 3e701bcd2a
2022-07-01 23:53:41 +01:00
Nuno Lopes 47e6f98f84 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-01 23:31:31 +01:00
Sanjay Patel 9c8a39c67b [InstCombine] restrict select of bit-tests to constant shift amounts
This transform is responsible for a long-standing miscompile
as discussed in issue #47012 (was bugzilla #47668).

There was a proposal to correct it in D88432, but that was
abandoned and there hasn't been any recent activity to fix
it AFAICT.

The original patch D45108 started with a constant-shift-only
restriction and only expanded during review, so I don't think
there's much risk of perf regression on the motivating code.
2022-07-01 16:24:34 -04:00
Martin Sebor 0d68ff87d2 [InstCombine] Transform strrchr to memrchr for constant strings
Add an emitter for the memrchr common extension and simplify the strrchr
call handler to use it. This enables transforming calls with the empty
string to the test C ? S : 0.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128954
2022-07-01 11:10:00 -06:00
Nikita Popov 65d59b4265 [LoopDeletion] Fix deletion with unusual predecessor terminator (PR56266)
LoopSimplify only requires that the loop predecessor has a single
successor and is safe to hoist into -- it doesn't necessarily have
to be an unconditional BranchInst.

Adjust LoopDeletion to assert conditions closer to what it actually
needs for correctness, namely a single successor and a
side-effect-free terminator (as the terminator is getting dropped).

Fixes https://github.com/llvm/llvm-project/issues/56266.
2022-07-01 16:13:35 +02:00
Florian Hahn 0dddf04cab [LV] Don't optimize exit cond during epilogue vectorization.
At the moment, the same VPlan can be used for code generation of both the
main vector and epilogue vector loop. This can lead to wrong results, if
the plan is optimized based on the VF of the main vector loop and then
re-used for the epilogue loop.

One example where this is problematic is if the scalar loops need to
execute at least one iteration, e.g. due to interleave groups.

To prevent mis-compiles in the short-term, disable optimizing exit
conditions for VPlans when using epilogue vectorization. The proper fix
is to avoid re-using the same plan for both loops, which will require
support for cloning plans first.

Fixes #56319.
2022-07-01 13:48:38 +01:00
Nikita Popov fabe915705 [SimplifyLibCalls] Use inbounds GEP
When converting strchr(p, '\0') to p + strlen(p) we know that
strlen() must return an offset that is inbounds of the allocated
object (otherwise it would be UB), so we can use an inbounds GEP.
An equivalent argument can be made for the other cases.
2022-07-01 14:31:44 +02:00
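The inbounds argument in fabe915705 above can be illustrated with a toy model. It is a minimal sketch that assumes a NUL-terminated buffer represented as Python `bytes`, with pointers modelled as integer indices; the helper names are hypothetical. It shows that `strchr(p, '\0')` and `p + strlen(p)` yield the same position, and that this position stays inside the buffer.

```python
from typing import Optional


def strlen(buf: bytes, p: int) -> int:
    """Number of bytes from index p up to (not including) the next NUL."""
    n = 0
    while buf[p + n] != 0:
        n += 1
    return n


def strchr(buf: bytes, p: int, ch: int) -> Optional[int]:
    """Index of the first `ch` at or after p, searching up to and including the NUL."""
    i = p
    while True:
        if buf[i] == ch:
            return i
        if buf[i] == 0:
            return None
        i += 1


def strchr_nul_folded(buf: bytes, p: int) -> int:
    """Folded form of strchr(p, '\\0'): p + strlen(p)."""
    return p + strlen(buf, p)
```

Because `strlen` stops at a NUL that must exist inside the object, the computed index never leaves the buffer, which is the property the inbounds GEP relies on.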
Sanjay Patel ab372cdd6f [InstCombine] add code comment for icmp transform; NFC
This was accidentally left out of cc88445a91
2022-07-01 08:21:55 -04:00
Florian Hahn 583abd0e36 [VPlan] Move addMetadata to VPTransformState (NFC).
The moved helpers are only used for codegen. It will allow moving the
remaining ::execute implementations out of LoopVectorize.cpp.

Depends on D127966.
Depends on D127965.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127968
2022-07-01 12:03:25 +01:00
Nikita Popov 9b994593cc [SCCP] Only handle unknown lattice values in resolvedUndefsIn()
This is a minor refinement of resolvedUndefsIn(), mostly for clarity.
If the value of an instruction is undef, then that's already a legal
final result -- we can safely rauw such an instruction with undef.
We only need to mark unknown values as overdefined, as that's the
result we get for an instruction that has not been processed because
it has an undef operand.

Differential Revision: https://reviews.llvm.org/D128251
2022-07-01 09:14:37 +02:00
Chen Zheng 39fe49aa57 [Inline] don't add noalias metadata for unknown objects.
The unidentified objects recognized in `getUnderlyingObjects` may
still alias to the noalias parameter because `getUnderlyingObjects`
may not check deep enough to get the underlying object because of
`MaxLookup`. The real underlying object for the unidentified object
may still be the noalias parameter.

Originally Patched By: tingwang

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127202
2022-07-01 02:16:55 -04:00
Alexey Bataev 4be3fc35aa [SLP][NFC]Clean up operands of the removed insertelements, NFC.
Replace all operands of the insertelement instruction, replaced by
shuffles, by poisons to avoid false-positive reports about incorrect function.
2022-06-30 17:51:43 -07:00
Nuno Lopes 373571dbb4 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-06-30 23:01:43 +01:00
William Huang a9119143a2 [InstCombine] Changing constant-indexed GEP of GEP to i8* for merging
When merging GEP of GEP with constant indices, if the second GEP's offset is not divisible by the first GEP's element size, convert both types to i8* and merge.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D125934
2022-06-30 21:26:11 +00:00
Nuno Lopes 0586d1cac2 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-06-30 21:47:31 +01:00
Craig Topper e633f8cd14 [InstCombine] Fix a Wparentheses warning in an assert. NFC 2022-06-30 13:03:32 -07:00
Sanjay Patel cc88445a91 [InstCombine] canonicalize 'icmp (trunc X), C' to 'icmp (X & Mask), C'
I looked at canonicalizing in the other direction, but that causes
many potential regressions and infinite loops because we already
(possibly wrongly) canonicalize "trunc X to i1" into an and+icmp.

This has a data layout restriction to avoid creating illegal
mask instructions, but we could remove that if we can show
that the backend can undo this when needed.

The motivating example from issue #56119 is modeled by the
PhaseOrdering test.
2022-06-30 15:51:39 -04:00
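The canonicalization in cc88445a91 above rests on a simple identity, sketched here for the equality predicate only. The bit widths (12 to 5) and function names are assumptions chosen to keep the exhaustive check fast; they are not taken from the patch.

```python
def trunc(x: int, dst_bits: int) -> int:
    """Model of 'trunc X to iN': keep only the low dst_bits bits."""
    return x % (1 << dst_bits)


def icmp_eq_trunc(x: int, c: int, dst_bits: int) -> bool:
    """Original form: icmp eq (trunc X), C."""
    return trunc(x, dst_bits) == c


def icmp_eq_mask(x: int, c: int, dst_bits: int) -> bool:
    """Canonical form: icmp eq (X & Mask), C, with C zero-extended to X's width."""
    mask = (1 << dst_bits) - 1
    return (x & mask) == c


def forms_agree(src_bits: int = 12, dst_bits: int = 5) -> bool:
    """Exhaustively compare both forms for every X and every narrow constant C."""
    return all(
        icmp_eq_trunc(x, c, dst_bits) == icmp_eq_mask(x, c, dst_bits)
        for x in range(1 << src_bits)
        for c in range(1 << dst_bits)
    )
```

The model ignores the data layout restriction mentioned in the commit message, which concerns whether the wide mask instruction is legal for the target rather than whether the two forms are equivalent.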
Martin Sebor 3a743a5892 [InstCombine] Fix memrchr logic error that prevents folding
Correct a logic bug in the memrchr enhancement added in D123629 that
makes it ineffective in a subset of cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128856
2022-06-30 11:35:26 -06:00
Nikita Popov f34dcf2763 [IRBuilder] Migrate all binops to folding API
Migrate all binops to use FoldXYZ rather than CreateXYZ APIs,
which are compatible with InstSimplifyFolder and fallible constant
folding.

Rather than continuing to add one method for every single operator,
add a generic FoldBinOp (plus variants for nowrap, exact and fmf
operators), which we would need anyway for CreateBinaryOp.

This change is not NFC because IRBuilder with InstSimplifyFolder
may perform more folding. However, this patch changes SCEVExpander
to not use the folder in InsertBinOp to minimize practical impact
and keep this change as close to NFC as possible.
2022-06-30 16:41:17 +02:00
Nikita Popov 588e229bf9 [VNCoercion] Separate constant/non-constant mem intrinsic implementations (NFCI)
This means we no longer need to have the same API between IRBuilder
and IRBuilderFolder.

The constant case is substantially simpler, so implementing it
separately isn't an undue burden.
2022-06-30 15:26:06 +02:00
Nikita Popov 014c4bdb9d [VNCoercion] Use ConstantFoldLoadFromConst API (NFCI)
Nowadays we have a generic constant folding API to load a type from
an offset. It should be able to do anything that VNCoercion can do.

This avoids the weird templating between IRBuilder and ConstantFolder
in one function, which will stop working as the IRBuilderFolder
moves from CreateXYZ to FoldXYZ APIs.

Unfortunately, this doesn't eliminate this pattern from VNCoercion
entirely yet.
2022-06-30 14:52:27 +02:00
Florian Hahn 68884dde70 [LV] Move LoopVersioning creation to LVP::execute.
At the moment LoopVersioning is only created for inner-loop
vectorization. This patch moves it to LVP::execute, which means it will
also be added for epilogue vectorization. As a consequence, the proper
noalias metadata is now also added to epilogue vector loops.

LVer will be moved to VPTransformState as follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127966
2022-06-30 12:14:32 +01:00
Sanjay Patel 7c4b90a98d [InstCombine] fix overzealous assert in icmp-shr fold
The assert was added with 0399473de8 and is correct for that
pattern, but it is off-by-1 with the enhancement in d4f39d8333.

The transforms are still correct with the new pre-condition:
https://alive2.llvm.org/ce/z/6_6ghm
https://alive2.llvm.org/ce/z/_GTBUt

And as shown in the new test, the transform is expected with
'ult' - in that case, the icmp reduces to test if the shift
amount is 0.
2022-06-30 06:28:48 -04:00
Nikita Popov 1579fc62fe [Evaluator] Add missing LLVM_DEBUG()
Missed these in 41f0b6a781, resulting
in unconditional debug output.
2022-06-30 11:54:47 +02:00
Chen Zheng b05801de35 [InlineFunction] Only check pointer arguments for a call
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128529
2022-06-30 05:39:47 -04:00
Nikita Popov 41f0b6a781 [Evaluator] Use ConstantFoldInstOperands()
For instructions that don't need any special handling, use
ConstantFoldInstOperands(), rather than re-implementing individual
cases.

This is probably not NFC because it can handle cases the previous
code missed (e.g. vector operations).
2022-06-30 11:10:17 +02:00
Nikita Popov a6d4b4138f [ConstantFold] Supports compares in ConstantFoldInstOperands()
Support compares in ConstantFoldInstOperands(), instead of
forcing the use of ConstantFoldCompareInstOperands(). Also handle
insertvalue (extractvalue was already handled).

This removes a footgun, where many uses of ConstantFoldInstOperands()
need a separate check for compares beforehand. It's particularly
insidious if called on a constant expression, because it doesn't
fail in that case, but will just not do DL-dependent folding.
2022-06-30 11:05:24 +02:00
Florian Hahn 24b5f8e0d0 [VPlan] Make sure optimizeInductions removes wide ind from scalar plan.
In some cases, there may be widened users of inductions even though the
plan includes the scalar VF. In those cases, make sure we still replace
the VPWidenIntOrFpInductionRecipe with scalar steps, as otherwise we may
try to execute a VPWidenIntOrFpInductionRecipe with a scalar VF.

Alternatively the patch could also split the range if needed.

This fixes a crash exposed by D123720.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D128755
2022-06-30 09:11:48 +01:00
Nikita Popov 10c531cd5b [SCCP] Simplify CFG in SCCP as well
Currently, we only remove dead blocks and non-feasible edges in
IPSCCP, but not in SCCP. I'm not aware of any strong reason for
that difference, so this patch updates SCCP to perform the CFG
cleanup as well.

Compile-time impact seems to be pretty minimal, in the 0.05%
geomean range on CTMark.

For the test case from https://reviews.llvm.org/D126962#3611579
the result after -sccp now looks like this:

    define void @test(i1 %c) {
    entry:
      br i1 %c, label %unreachable, label %next
    next:
      unreachable
    unreachable:
      call void @bar()
      unreachable
    }

-jump-threading does nothing on this, but -simplifycfg will produce
the optimal result.

Differential Revision: https://reviews.llvm.org/D128796
2022-06-30 09:25:03 +02:00
Chuanqi Xu 0b5ead6590 [WebAssembly] Don't set musttail for coroutines when tail-call is not enabled

C++20 coroutines couldn't be compiled to WebAssembly because an
optimization named symmetric transfer requires support for musttail
calls, which WebAssembly doesn't support yet.

This patch tries to fix the problem by adding a supportsTailCalls
method to TargetTransformImpl to skip the symmetric transfer when
tail-call feature is not supported.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D128794
2022-06-30 11:15:40 +08:00
zhongyunde 404479b4b0 [InstCombine] Use known bits to determine exact int->fp cast
Reviewed By: spatel, nikic

Differential Revision: https://reviews.llvm.org/D127854
2022-06-30 09:45:11 +08:00
Florian Hahn 6d5f814357 [LoopUnrollRuntime] Invalidate SCEV for exit phi in ConnectProlog.
ConnectProlog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47.

Fix is analogous to cfc741bc0e.

Fixes #56286.
2022-06-29 20:28:43 +01:00
Florian Hahn 9a35f19e3e [UnrollRuntime] Invalidate SCEVs for modified phis in ConnectEpilog.
ConnectEpilog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47.

Fix is analogous to cfc741bc0e.

Fixes #56282.
2022-06-29 18:26:00 +01:00
Sanjay Patel d4f39d8333 [InstCombine] add fold for (ShiftC >> X) >u C
This is the 'ugt' sibling to:
0399473de8

Decrement the input compare constant (and implicitly
decrement the new compare constant):
https://alive2.llvm.org/ce/z/iELmct
2022-06-29 12:30:01 -04:00
Nikita Popov bdba8278d9 [VectorCombine] Avoid ConstantExpr::get() (NFC)
Use IRBuilder APIs instead, which will still constant fold.
2022-06-29 17:17:52 +02:00
Nikita Popov 2124b2f0e6 [JumpThreading] Avoid ConstantExpr::get() (NFCI)
This code requires the result to be an UndefValue/ConstantInt
anyway (checked by getKnownConstant), so we are only interested
in the case where this folds.
2022-06-29 16:43:05 +02:00
Nikita Popov df698a5762 [InstCombine] Avoid some calls to ConstantExpr::get() (NFCI)
Replace some calls to ConstantExpr::get() with IRBuilder APIs
(which will also constant fold if possible).
2022-06-29 16:26:02 +02:00
Nikita Popov 0af53fcb99 [SROA] Don't create constant expressions (NFC)
Use IRBuilder instead, which will fold these. Just to clarify
that this does not actually create any udiv expression.
2022-06-29 11:51:22 +02:00
Pavel Samolysov 3d9ce9e43d [ArgPromotion] Remove all the getters and ReplaceCallSite (NFC)
AARGetter is an abstraction over a source of the `AAResults` introduced
to support the legacy pass manager as well as the modern one. Since the
Argument Promotion pass doesn't support the legacy pass manager anymore,
the abstraction is not required and `AAResults` may be used directly.

The instance of the `FunctionAnalysisManager` is passed through the
functions to get all the required analyses just wherever they are
required and do not use the awkward getter callbacks.

The `ReplaceCallSite` parameter was required for the legacy pass manager
only and isn't used anymore, so the parameter has been eliminated.

Differential Revision: https://reviews.llvm.org/D128727
2022-06-29 10:45:11 +03:00
Pavel Samolysov 8958057fb1 [ArgPromotion] Move isDenselyPacked static member (NFC)
The `isDenselyPacked` static member of the `ArgumentPromotionPass` class
is not used in the class itself anymore. The single known user of the
function is in the `AttributorAttributes.cpp` file, so the function has
been moved into the file.

Differential Revision: https://reviews.llvm.org/D128725
2022-06-29 10:45:10 +03:00
Martin Sebor 8827679826 [InstCombine] Fold strncmp of constant arrays and variable size
Extend the solution accepted in D127766 to strncmp and simplify
strncmp(A, B, N) calls with constant A and B and variable N to
the equivalent of

  N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)

where Pos is the offset of either the first mismatch between A
and B or the terminating null character if both A and B are equal
strings.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D128089
2022-06-28 15:59:14 -06:00
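The strncmp fold in 8827679826 above can be modelled as follows. This is a sketch under stated assumptions: A and B are NUL-terminated `bytes` objects, results are normalized to -1/0/+1, and all function names are hypothetical. The "A < B" test from the commit message is evaluated on the characters at Pos, which decides the ordering of the two strings.

```python
def strncmp_ref(a: bytes, b: bytes, n: int) -> int:
    """Reference strncmp on NUL-terminated byte strings, normalized to -1/0/+1."""
    for i in range(n):
        ca, cb = a[i], b[i]
        if ca != cb:
            return -1 if ca < cb else 1
        if ca == 0:
            return 0
    return 0


def first_mismatch_or_nul(a: bytes, b: bytes) -> int:
    """Pos: offset of the first mismatch, or of the NUL if the strings are equal."""
    pos = 0
    while a[pos] == b[pos] and a[pos] != 0:
        pos += 1
    return pos


def strncmp_folded(a: bytes, b: bytes, n: int) -> int:
    """Folded form: N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)."""
    pos = first_mismatch_or_nul(a, b)
    if n <= pos:
        return 0
    ca, cb = a[pos], b[pos]
    return -1 if ca < cb else (1 if cb < ca else 0)
```

Since A and B are constants, Pos and the ordering are known at compile time, so only the comparison against the variable N remains at run time.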
Martin Sebor e263a7670e [InstCombine] Look through more casts when folding memchr and memcmp
Enhance getConstantDataArrayInfo to let the memchr and memcmp library
call folders look through arbitrarily long sequences of bitcast and
GEP instructions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128364
2022-06-28 15:58:42 -06:00
Alexey Bataev bf4dcbd2df [SLP]Fix PR56251: Do not remove the reordering from the root node, being used as an operand.
If the root order itself does not require reordering, we can just
remove its reorder mask safely (e.g., if the root node is a vector of
phis). But if this node is used as an operand in the graph, we cannot
delete the reordering, need to keep it. Otherwise the graph nodes are
not synchronized with the operands. It may cause an extra gather
instruction(s) or a compiler crash.
Also, we need to be very careful when selecting the gather nodes for
reordering, since there might be several gather nodes with the same scalars
and we could end up reordering the same node many times instead of
different nodes.

Differential Revision: https://reviews.llvm.org/D128680
2022-06-28 13:42:05 -07:00
Leonard Chan 9553d69580 [NFC][HWASan] Refactor hwasan pass
This moves some code for getting PC and SP into their own functions. Since SP
is retrieved both in the prologue and when getting the stack tag, we can cache the
SP if we get it once in the prologue. This caching will really only be relevant
in D128387 where StackBaseTag may not be set in the prologue if __hwasan_tls
is not used.

Differential Revision: https://reviews.llvm.org/D128551
2022-06-28 12:09:20 -07:00
Pavel Samolysov 170c4d21bd [ArgPromotion] Unify byval promotion with non-byval
It makes sense to handle byval promotion in the same way as non-byval
but also allowing `store` instructions. However, these should
use the same checks as the `load` instructions do, i.e. be part of the
`ArgsToPromote` collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted `alloca` instructions. To
optimize these new `alloca`s out, the `PromoteMemToReg` function from
`Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after
promotion.

In order to let the `PromoteMemToReg` promote as many `alloca`s as
possible, there should be no `GEP`s from the `alloca`s. To
eliminate the `GEP`s, its own `alloca` is generated for every argument
part because a single `alloca` for the whole argument (that
significantly simplifies the code of the pass though) unfortunately
cannot be used.

The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676

Differential Revision: https://reviews.llvm.org/D125485
2022-06-28 15:19:58 +03:00
Mikhail Goncharov c6c124ca80 Fixed unused variable warning. 2022-06-28 11:44:16 +02:00
Florian Hahn 03975b7f0e [VPlan] Move recipe implementations to separate file (NFC).
This patch moves the code for recipe implementations to a separate file.

The benefits are:
 * Keep VPlan.cpp smaller => faster compile-time during parallel builds.
 * Keep code for logical units together

As a follow-up I am also planning on moving all ::execute
implementations from LoopVectorize.cpp over to the new file, which
should help to reduce the size of the file a bit.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127965
2022-06-28 10:34:30 +01:00
Nikita Popov 5548e807b5 [IR] Remove support for extractvalue constant expression
This removes the extractvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
extractvalue is already not supported in bitcode, so we do not need
to worry about bitcode auto-upgrade.

Uses of ConstantExpr::getExtractValue() should be replaced with
IRBuilder::CreateExtractValue() (if the fact that the result is
constant is not important) or ConstantFoldExtractValueInstruction()
(if it is). Though for this particular case, it is also possible
and usually preferable to use getAggregateElement() instead.

The C API function LLVMConstExtractValue() is removed, as the
underlying constant expression no longer exists. Instead,
LLVMBuildExtractValue() should be used (which will constant fold
or create an instruction). Depending on the use-case,
LLVMGetAggregateElement() may also be used instead.

Differential Revision: https://reviews.llvm.org/D125795
2022-06-28 10:40:17 +02:00
Guillaume Chatelet 3c126d5fe4 [Alignment] Replace commonAlignment with std::min
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.

Differential Revision: https://reviews.llvm.org/D128345
2022-06-28 07:15:02 +00:00
wlei 7e86b13c63 [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, the promotion and merging of contexts was based on the SampleContext (the array of frames), which costs a lot of memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This can save a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the FunctionSamples for one function in order to do the merging and promoting. Previously, it searched each function's context and traversed the trie to get the node of the context. Now that the context is no longer stored inside the profile, we instead use an auxiliary map, `ProfileToNodeMap`, which is initialized with the FunctionSamples-to-TrieNode relations and kept up to date while nodes are promoted and merged.

Moreover, I was expecting the results before and after to remain the same, but I found that the order of FuncToCtxtProfiles matters and affects the results. This can happen in the recursive context case, but the difference should be small. Now that we don't have the context, I just use a vector for the order; the result is still deterministic.

Measured on one huge (12GB) profile from one of our internal services. The profile similarity difference is 99.999%, the running time is improved by 3X (debug mode), and memory usage is reduced from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031
2022-06-27 23:22:21 -07:00
wlei aa58b7b1e3 [CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie
Follow-up patch to https://reviews.llvm.org/D125246, support `computeSummaryAndThreshold` based on context trie.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127026
2022-06-27 23:22:21 -07:00
Congzhe Cao b941857b40 [LoopInterchange] New cost model for loop interchange
This is another attempt to land this patch.

The patch proposes to use a new cost model for loop interchange,
which is obtained from loop cache analysis.

Given a loopnest, what loop cache analysis returns is a vector of
loops [loop0, loop1, loop2, ...] where loop0 should be placed as
the outermost loop, loop1 should be placed one more level inside, and
loop2 one more level inside, etc. What loop cache analysis does is not
only more comprehensive than the current cost model, it is also a "one-shot"
query which means that we only need to query it once during the entire
loop interchange pass, which is better than the current cost model where
we query it every time we check whether it is profitable to interchange
two loops. Thus complexity is reduced, especially after D120386 where we
do more interchanges to get the globally optimal loop access pattern.

Updates made to test cases are mostly minor changes and some
corrections. One change that applies to all tests is that we added an option
`-cache-line-size=64` to the RUN lines. This is to ensure that loop
cache analysis receives a valid cache line size for correct
analysis. Test coverage for loop interchange is not reduced.

Currently we did not completely remove the legacy cost model, but
keep it as fall-back in case the new cost model did not run successfully.
This is because currently we have some limitations in delinearization, which
sometimes makes loop cache analysis bail out. The longer term goal is to
enhance delinearization and eventually remove the legacy cost model
completely.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124926
2022-06-28 00:08:37 -04:00
Vitaly Buka 6824eee942 [asan] Add missing dependency on Demangle
Follow up to D127911.
2022-06-27 15:10:02 -07:00
Mitch Phillips dacfa24f75 Delete 'llvm.asan.globals' for global metadata.
Now that we have the sanitizer metadata that is actually on the global
variable, and now that we use debuginfo in order to do symbolization of
globals, we can delete the 'llvm.asan.globals' IR synthesis.

This patch deletes the 'location' part of the __asan_global that's
embedded in the binary as well, because it's unnecessary. This saves
about 1.7% of the optimised non-debug with-asserts clang binary.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D127911
2022-06-27 14:40:40 -07:00
Philip Reames 20dd3297b1 [LV] Allow scalable vectorization with vscale = 1
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542
2022-06-27 13:38:57 -07:00
Yuanfang Chen e2e9e708e5 [Coroutine] Remove the '!func_sanitize' metadata for split functions
There is no proper RTTI for these split functions. So just delete the
metadata.

Fixes https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D116130
2022-06-27 12:09:13 -07:00
Yuanfang Chen 6678f8e505 [ubsan] Using metadata instead of prologue data for function sanitizer
Information in the function `Prologue Data` is intentionally opaque.
When a function with `Prologue Data` is duplicated, the self (global
value) references inside `Prologue Data` still point to the
original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`.

This patch detaches the information from function `Prologue Data`
and attaches it to a function metadata node.

This and D116130 fix https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D115844
2022-06-27 12:09:13 -07:00
Joseph Huber c7243f21d3 [OpenMP] Only strip runtime attributes if needed
Summary:
Currently in OpenMPOpt we strip `noinline` attributes from runtime
functions. This is here because the device bitcode library that we link
has problems with needed definitions getting prematurely optimized out.
This is only necessary for OpenMP offloading to GPUs so we should narrow
the scope for where we spend time doing this. In the future this
shouldn't be necessary as we move to using a linked library rather than
pulling in a bitcode library in Clang.
2022-06-27 13:35:41 -04:00
Nikita Popov f65c88c42f [GlobalOpt] Fix memset handling in global ctor evaluation (PR55859)
The global ctor evaluator currently handles memset by checking whether the
memset memory is already zero, and skips it in that case. However,
it only actually checks the first byte of the memory being set.

This patch extends the code to check all bytes being set. This is
done byte-by-byte to avoid converting undef values to zeros in
larger reads. However, the handling is still not completely correct,
because there might still be padding bytes (though probably this
doesn't matter much in practice, as I'd expect global variable
padding to be zero-initialized in practice).

Mostly fixes https://github.com/llvm/llvm-project/issues/55859.

Differential Revision: https://reviews.llvm.org/D128532
2022-06-27 16:50:49 +02:00
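The difference between the old and new memset checks in f65c88c42f above can be sketched with a toy memory model. The assumptions are loud ones: memory is a Python list of byte values, `None` stands in for an undef byte, the range is in bounds, and the function names are hypothetical rather than taken from the Evaluator.

```python
from typing import List, Optional

Byte = Optional[int]  # None models an undef byte (an assumption of this sketch)


def memset_zero_is_noop_first_byte(mem: List[Byte], offset: int, length: int) -> bool:
    """Old behaviour as described: only the first byte of the range is inspected."""
    return length == 0 or mem[offset] == 0


def memset_zero_is_noop_all_bytes(mem: List[Byte], offset: int, length: int) -> bool:
    """New behaviour: every byte in the range must already be a known zero.

    Checking byte by byte means an undef byte (None) is never treated as zero.
    """
    return all(mem[offset + i] == 0 for i in range(length))
```

For a range such as `[0, 7]` the first-byte check wrongly reports that the memset can be skipped, while the all-bytes check does not. Padding bytes, which the commit message notes are still not handled completely, are outside this model.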
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundamental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Nikita Popov cde402778a [FunctionAttrs] Add missing pass dependency
This pass depends on AAResults. This fixes the ocaml IPO binding
tests.
2022-06-27 10:15:06 +02:00
Nikita Popov 217e85761c [ArgPromotion] Remove legacy PM support
Support for the legacy pass manager in ArgPromotion causes
complications in D125485. As the legacy pass manager for middle-end
optimizations is unsupported, drop ArgPromotion from the legacy
pipeline, rather than introducing additional complexity to deal
with it.

Differential Revision: https://reviews.llvm.org/D128536
2022-06-27 09:42:17 +02:00
Chuanqi Xu 24e53b01d5 Revert "[Coroutines] Only do symmetric transfer if optimization is on"
This reverts commit 7782e080e8. According
to the discussion of WG21, symmetric transfer is a desired feature.
2022-06-27 10:54:56 +08:00
Kazu Hirata d08f34b592 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-26 18:31:51 -07:00
Kazu Hirata a81b64a1fb [llvm] Use Optional::has_value instead of Optional::hasValue (NFC)
This patch replaces x.hasValue() with x.has_value() where x is not
contextually convertible to bool.
2022-06-26 16:10:42 -07:00
Nuno Lopes 6ef9a2ad01 [LICM] Use poison to replace unreachable values instead of undef [NFC] 2022-06-26 14:56:35 +01:00
Nuno Lopes 3fa2411dc5 [LoopSimplifyCFG] use poison when replacing dead instructions instead of undef [NFC] 2022-06-26 14:15:55 +01:00
Nuno Lopes d46fa1fc58 [ArgumentPromotion] use poison when replacing dead instructions instead of undef [NFC] 2022-06-26 13:44:05 +01:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Pavel Samolysov 6e3d4712b9 [DeadArgElim] Replace insert with emplace (NFC) 2022-06-25 10:31:27 +03:00
Mitch Phillips f57066401e [HWASan] Use new IR attribute for communicating unsanitized globals.
Globals that shouldn't be sanitized are currently communicated to HWASan
through the use of the llvm.asan.globals IR metadata. Now that we have
an on-GV attribute, use it.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D127543
2022-06-24 12:04:11 -07:00
Mingming Liu e0d069598b [Inline] Annotate inline pass name with link phase information for analysis.
The annotation is flag gated; flag is turned off by default.

Differential Revision: https://reviews.llvm.org/D125495
2022-06-24 10:06:43 -07:00
Alexey Bataev 2faacf61a5 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-24 09:28:01 -07:00
Arthur Eubanks e422c0d3b2 [GlobalOpt] Perform store->dominated load forwarding for stored once globals
The initial land incorrectly optimized forwarding non-Constants in non-nosync/norecurse functions. Bail on non-Constants since norecurse should cause global -> alloca promotion anyway.

The initial land also incorrectly assumed that StoredOnceStore was the only store to the global, but it actually means that only one value other than the global initializer is stored. Add a check that there's only one store.

Compile time tracker:
https://llvm-compile-time-tracker.com/compare.php?from=c80b88ee29f34078d2149de94e27600093e6c7c0&to=ef2c2b7772424b6861a75e794f3c31b45167304a&stat=instructions

Reviewed By: nikic, asbirlea, jdoerfert

Differential Revision: https://reviews.llvm.org/D128128
2022-06-24 09:09:26 -07:00
Florian Hahn cb69ba4faa
[LV] Create RT checks once VF/IC are selected, track scalar cost.
This patch updates LV to generate runtime checks after the VF & IC are selected. It
allows deciding whether to vectorize with runtime checks or not based on
their cost compared to the vector loop.

It also updates VectorizationFactor to include the scalar cost.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D75981
2022-06-24 17:42:11 +02:00
Nikita Popov 871197d0a3 [MemoryBuiltins] Accept any value in getInitialValueOfAllocation() (NFC)
Drop the requirement that getInitialValueOfAllocation() must be
passed an allocator function, shifting the responsibility for
checking that into the function (which it does anyway). The
motivation is to avoid some calls to isAllocationFn(), which has
somewhat ill-defined semantics (given the number of
allocator-related attributes we have floating around...)

(For this function, all we eventually need is an allockind of
zeroed or uninitialized.)

Differential Revision: https://reviews.llvm.org/D127274
2022-06-24 16:08:07 +02:00
Nikita Popov e523baa664 [InlineFunction] Slightly clarify noalias scope calculation (NFC)
Rename CanDeriveViaCapture -> RequiresNoCaptureBefore, drop
unnecessary const cast, reformat some code to avoid an ugly
super-indented comment.
2022-06-24 12:31:46 +02:00
Florian Hahn b18141a8f2
[VPlan] Set VFs included in plan before last set of VPTransforms (NFC).
This allows VPlanTransforms to query the VFs included in the plan in the
future.
2022-06-24 10:16:56 +02:00
Florian Hahn 92f87787b3
Recommit "[ConstraintElimination] Transfer info from ULT to signed system."
This reverts commit 94ed2caf70.

The issue with non-determinism in the test has been fixed in
d9526e8a52.
2022-06-24 09:27:14 +02:00
Evgenii Stepanov 878309cc54 Revert "[LoopInterchange] New cost model for loop interchange"
llvm/lib/Analysis/LoopCacheAnalysis.cpp:702:30: runtime error: signed
integer overflow: 6148914691236517209 * 100 cannot be represented in
type 'long'

https://lab.llvm.org/buildbot/#/builders/5/builds/25185

This reverts commit 1b24fe34b0.
2022-06-23 16:10:53 -07:00
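The overflow reported in 878309cc54 can be reproduced as plain arithmetic. The following is a minimal sketch of a pre-multiplication check, assuming a 64-bit signed type and non-negative operands; `mulWouldOverflow` is a hypothetical helper for illustration, not the fix that was applied to LoopCacheAnalysis.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if A * B would exceed INT64_MAX.
// Assumes A >= 0 and B > 0, which matches the values in the report.
bool mulWouldOverflow(int64_t A, int64_t B) {
  // A * B > MAX  <=>  A > MAX / B  (integer division, B > 0).
  return A > std::numeric_limits<int64_t>::max() / B;
}
```

With the values from the sanitizer report, `mulWouldOverflow(6148914691236517209LL, 100)` is true, since INT64_MAX / 100 is 92233720368547758.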
Congzhe Cao 1b24fe34b0 [LoopInterchange] New cost model for loop interchange
This is the second attempt to land this patch.

The patch proposed to use a new cost model for loop interchange,
which is obtained from loop cache analysis.

Given a loopnest, what loop cache analysis returns is a vector of
loops [loop0, loop1, loop2, ...] where loop0 should be placed as the
outermost loop, loop1 should be placed one more level inside, and loop2
one more level inside, etc. What loop cache analysis does is not only more
comprehensive than the current cost model, it is also a "one-shot" query
which means that we only need to query it once during the entire loop
interchange pass, which is better than the current cost model where we
query it every time we check whether it is profitable to interchange two
loops. Thus complexity is reduced, especially after D120386 where we do
more interchanges to get the globally optimal loop access pattern.

Updates made to test cases are mostly minor changes and some corrections.
One change that applies to all tests is that we added an option
`-cache-line-size=64` to the RUN lines. This is to ensure that loop cache
analysis receives a valid number of cache line size for correct analysis.
Test coverage for loop interchange is not reduced.

Currently we did not completely remove the legacy cost model, but keep it
as fall-back in case the new cost model did not run successfully. This is
because currently we have some limitations in delinearization, which sometimes
makes loop cache analysis bail out. The longer term goal is to enhance
delinearization and eventually remove the legacy cost model completely.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124926
2022-06-23 16:34:57 -04:00
Philip Reames 46ea4b5ea1 [LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter
If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it.  (Well, we could, but we haven't implemented that case.)  This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.
2022-06-23 12:41:09 -07:00
Alexey Bataev 3b6edef15d [SLP]Fix a crash when reorder masked gather nodes with reused scalars.
If the masked gather nodes must be reordered, we can just reorder
scalars, just like for gather nodes. But if the node contains reused
scalars, it must be handled the same way as a regular vectorizable node,
since we need to reorder the reuse mask, not the scalars directly.

Differential Revision: https://reviews.llvm.org/D128360
2022-06-23 11:32:30 -07:00
Florian Hahn d9526e8a52
[ConstraintElimination] Use stable_sort to sort worklist.
If there are multiple constraints in the same block, at the moment the
order they are processed may be different depending on the sort
implementation.

Use stable_sort to ensure consistent ordering.
2022-06-23 19:22:15 +02:00
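The ordering guarantee that d9526e8a52 relies on can be illustrated in isolation. This is a sketch under assumptions: `WorkItem`, `BlockOrder`, and `Id` are made-up names, not the worklist type used in ConstraintElimination. Items comparing equal on the sort key keep their original relative order with `std::stable_sort`, whereas `std::sort` leaves that order unspecified.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical worklist entry: BlockOrder is the sort key, Id records the
// position in which the entry was originally added.
struct WorkItem {
  int BlockOrder;
  int Id;
};

// Sorts by BlockOrder only and returns the resulting sequence of Ids.
std::vector<int> sortedIds(std::vector<WorkItem> Items) {
  std::stable_sort(Items.begin(), Items.end(),
                   [](const WorkItem &L, const WorkItem &R) {
                     return L.BlockOrder < R.BlockOrder;
                   });
  std::vector<int> Ids;
  for (const WorkItem &I : Items)
    Ids.push_back(I.Id);
  return Ids;
}
```

Entries that share a `BlockOrder` come out in insertion order on every standard library implementation, which is what makes the processing order reproducible.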
Florian Hahn 94ed2caf70
Revert "[ConstraintElimination] Transfer info from ULT to signed system."
This reverts commit 316e106f49.

This breaks a bot with expensive checks.
2022-06-23 17:27:33 +02:00
Florian Hahn 316e106f49
[ConstraintElimination] Transfer info from ULT to signed system.
If A u< B holds, then A s>= 0 && A s< B holds if B s>= 0.

https://alive2.llvm.org/ce/z/RrNxHh
2022-06-23 17:17:01 +02:00
Florian Hahn 9a33f3975e
[ConstraintElimination] Transfer info from SLT to unsigned system.
If A s< B holds, then A u< B also holds, if A s>= 0.

https://alive2.llvm.org/ce/z/J4JZuN
2022-06-23 15:57:59 +02:00
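The transfer rules in 316e106f49 and 9a33f3975e can be sanity-checked by brute force on a small bit width. The sketch below enumerates all pairs of 8-bit values, reading the SLT conclusion as A u< B; the function names are illustrative only, and the Alive2 links in the commits remain the authoritative proofs.

```cpp
#include <cassert>

// Interpret an 8-bit pattern (0..255) as a two's-complement signed value.
static int asSigned(int Bits) { return Bits < 128 ? Bits : Bits - 256; }

// ULT -> signed: A u< B and B s>= 0 imply A s>= 0 and A s< B.
bool ultTransfersToSigned() {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B) {
      bool Premise = A < B && asSigned(B) >= 0;
      bool Conclusion = asSigned(A) >= 0 && asSigned(A) < asSigned(B);
      if (Premise && !Conclusion)
        return false;
    }
  return true;
}

// SLT -> unsigned: A s< B and A s>= 0 imply A u< B.
bool sltTransfersToUnsigned() {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B) {
      bool Premise = asSigned(A) < asSigned(B) && asSigned(A) >= 0;
      bool Conclusion = A < B;
      if (Premise && !Conclusion)
        return false;
    }
  return true;
}
```

Both functions return true for 8-bit values; the same argument carries over to wider types because only the sign bit distinguishes the two interpretations.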
chenglin.bi 30e49a3794 [InstCombine] Optimise shift+and+boolean conversion pattern to simple comparison
if (`C1` is pow2) & (`(C2 & ~(C1-1)) + C1` is pow2):
    ((C1 << X) & C2) == 0 -> X >= (Log2(C2+C1) - Log2(C1));
https://alive2.llvm.org/ce/z/EJAl1R
    ((C1 << X) & C2) != 0 -> X  < (Log2(C2+C1) - Log2(C1));
https://alive2.llvm.org/ce/z/3bVRVz

And remove dead code.

Fix: https://github.com/llvm/llvm-project/issues/56124

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126591
2022-06-23 21:53:07 +08:00
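The fold in 30e49a3794 can be spot-checked numerically. This is a rough sketch, assuming 32-bit unsigned values, shift amounts in [0, 32), and constants for which C2 + C1 is itself a power of two so that Log2 is exact; `isPow2`, `log2Floor`, `preconditionHolds`, and `foldMatches` are hypothetical helpers, not InstCombine code.

```cpp
#include <cassert>
#include <cstdint>

static bool isPow2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Floor of log2(V) for V > 0.
static unsigned log2Floor(uint32_t V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// The precondition from the commit message.
bool preconditionHolds(uint32_t C1, uint32_t C2) {
  return isPow2(C1) && isPow2((C2 & ~(C1 - 1)) + C1);
}

// Compares ((C1 << X) & C2) == 0 against X >= Log2(C2+C1) - Log2(C1)
// for every in-range shift amount X.
bool foldMatches(uint32_t C1, uint32_t C2) {
  unsigned Threshold = log2Floor(C2 + C1) - log2Floor(C1);
  for (unsigned X = 0; X < 32; ++X) {
    bool Original = ((C1 << X) & C2) == 0;
    bool Folded = X >= Threshold;
    if (Original != Folded)
      return false;
  }
  return true;
}
```

For example, with C1 = 2 and C2 = 6 the threshold is 3 - 1 = 2, and `(2 << X) & 6` is zero exactly when X >= 2. The `!= 0` form is the negation of the same comparison.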
Florian Hahn 569d84fe99
[VPlan] Remove dead recipes across whole plan.
This extends removeDeadRecipe to remove recipes across the whole plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127580
2022-06-23 13:36:02 +02:00
Florian Hahn 24a98881cd
[ConstraintElimination] Transfer info from SGT to unsigned system.
If A >s B then A >=u 0, if B >=s -1.

https://alive2.llvm.org/ce/z/cncGKi
2022-06-23 11:04:51 +02:00
Fangrui Song 1ffd2d99c2 Revert D115462 "[SLP]Improve shuffles cost estimation where possible."
This reverts commit cac60940b7.

Caused -Os -fsanitize=memory -march=haswell miscompile to pytorch/cpuinfo.
See my latest comment (may update) on D115462.
2022-06-22 23:16:31 -07:00
Fangrui Song a411bc11d6 Revert "[SLP]Fix a crash when insert subvector is out of range."
This reverts commit f1ee2738b3.

Revert due to the revert of a dependent commit `[SLP]Improve shuffles cost estimation where possible.`
2022-06-22 23:16:25 -07:00
Serguei Katkov 5e1ccdf960 [RS4GC] Handle freeze case for vector
Finding the BDV for a vector value does not handle the freeze instruction.
Add handling for it, as is done for the scalar case.

Reviewed By: apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128254
2022-06-23 11:58:41 +07:00
Mingming Liu bc856eb3fc [SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information.
Differential Revision: https://reviews.llvm.org/D126833
2022-06-22 17:00:53 -07:00
Florian Mayer 9320a32bb9 [MTE] [HWASan] Use LoopInfo for reachability queries.
The reachability queries default to "reachable" after exploring too many
basic blocks. LoopInfo helps it skip over the whole loop.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D127917
2022-06-22 15:28:49 -07:00
Adrian Tong 4e555a3df4 Fix a misspell. NFC 2022-06-22 21:23:21 +00:00
Brendon Cahoon f1b05a0a2b [StructurizeCFG] Improve basic block ordering
StructurizeCFG linearizes the successors of branching basic block
by adding Flow blocks to record the true/false path for branches
and back edges. This patch reduces the number of Phi values needed
to capture the control flow path by improving the basic block
ordering.

Previously, StructurizeCFG adds loop exit blocks outside of the
loop. StructurizeCFG sets a boolean value to indicate the path
taken, and all exit block live values extend to after the loop.
For loops with a large number of exit blocks, this creates a
huge number of values that are maintained, which increases
compilation time and register pressure. This is a problem
especially with ASAN, which adds early exits to blocks with
unreachable instructions for each instrumented check in the loop.

In specific cases, this patch reduces the number of values needed
after the loop by moving the exit block into the loop. This is
done for blocks that have a single predecessor and single successor
by moving the block to appear just after the predecessor.

Differential Revision: https://reviews.llvm.org/D123231
2022-06-22 16:10:41 -05:00
Brendon Cahoon e13248ab0e [UnifyLoopExits] Reduce number of guard blocks
UnifyLoopExits creates a single exit, a control flow hub, for
loops with multiple exits. There is an input to the block for
each loop exiting block and an output from the block for each
loop exit block. Multiple checks, or guard blocks, are needed
to branch to the correct exit block.

For large loops with lots of exit blocks, all the extra guard
blocks cause problems for StructurizeCFG and subsequent passes.
This patch reduces the number of guard blocks needed when the
exit blocks branch to a common block (e.g., an unreachable
block). The guard blocks are reduced by changing the inputs
and outputs of the control flow hub. The inputs are the exit
blocks and the outputs are the common block.

Reducing the guard blocks enables StructurizeCFG to reorder the
basic blocks in the CFG to reduce the values that exit a loop
with multiple exits. This reduces the compile-time of
StructurizeCFG and also reduces register pressure.

Differential Revision: https://reviews.llvm.org/D123230
2022-06-22 15:44:23 -05:00
Evgenii Stepanov 5011b4ca0e Revert "[Attributor] Ensure to use the proper liveness AA"
Reason: memory leaks

This reverts commit 083010312a.
2022-06-22 13:40:45 -07:00
Florian Mayer 476ced4b89 [MTE] [HWASan] Support diamond lifetimes.
We were overly conservative and required a ret statement to be dominated
completely by a single lifetime.end marker. This is quite restrictive
and leads to two problems:

* limits coverage of use-after-scope, as we degenerate to
  use-after-return;
* increases stack usage in programs, as we have to remove all lifetime
  markers if we degenerate to use-after-return, which prevents
  reuse of stack slots by the stack coloring algorithm.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D127905
2022-06-22 11:16:34 -07:00
Florian Mayer acc9721e38 [NFC] [HWASan] Remove indirection for getting analyses.
This was necessary for code reuse between the old and new passmanager.
With the old pass-manager gone, this is no longer necessary.

Reviewed By: eugenis, myhsu

Differential Revision: https://reviews.llvm.org/D127913
2022-06-22 10:53:20 -07:00
Mingming Liu 67dc8021a1 [Support] Change TrackingStatistic and NoopStatistic to use uint64_t instead of unsigned.
The increase in the binary size of `clang` is trivial; namely, the numerical value doesn't
change when measured in MiB, and the `.data` section increases from 139Ki to
173Ki.

Differential Revision: https://reviews.llvm.org/D128070
2022-06-22 10:11:40 -07:00
Max Kazantsev cff4f04e2e [LSR] Don't allow zero quotient as scale reg. PR56160
Scale reg should never be zero, so when the quotient is zero, we
cannot assign it there. Limit this transform to avoid this situation.

Differential Revision: https://reviews.llvm.org/D128339
Reviewed By: eopXD
2022-06-22 23:33:57 +07:00
Guillaume Chatelet 57ffff6db0 Revert "[NFC] Remove dead code"
This reverts commit 8ba2cbff70.
2022-06-22 14:55:47 +00:00
Guillaume Chatelet 8ba2cbff70 [NFC] Remove dead code 2022-06-22 13:33:58 +00:00
Florian Hahn 098b0b18a7
[ConstraintElimination] Transfer info from SGE to unsigned system.
This patch adds a new transferToOtherSystem helper that tries to
transfer information from signed predicates to the unsigned system and
vice versa.

The initial version adds A >=u B for A >=s B && B >=s 0

https://alive2.llvm.org/ce/z/8b6F9i
2022-06-22 15:27:59 +02:00
Nikita Popov 1f88d80408 [SCCP] Don't mark edges feasible when resolving undefs
As branch on undef is immediate undefined behavior, there is no need
to mark one of the edges as feasible. We can leave all the edges
non-feasible. In IPSCCP, we can replace the branch with an unreachable
terminator.

Differential Revision: https://reviews.llvm.org/D126962
2022-06-22 10:28:27 +02:00
Florian Hahn ac62b8f704
[ConstraintElimination] Update addFact to take Predicate and ops (NFC).
This allows adding facts without necessarily having a corresponding
CmpInst.
2022-06-22 08:36:41 +02:00
Pavel Samolysov f44bf3805a [DeadArgElim] Reformat the pass in accordance with the code style
The code has been reformatted in accordance with the code style. Some
function comments were extended to the Doxygen ones and reworded a bit
to eliminate the duplication of the function's/class' name in the
comment.

Differential Revision: https://reviews.llvm.org/D128168
2022-06-22 09:13:00 +03:00
chenglin.bi 810b5c471f [NewGVN] add context instruction for SimplifyQuery
NewGVN may find an operator from another context. ValueTracking currently has no way to run completely without a context instruction,
so it uses the operator itself as the context instruction.
If the operator is in another branch that will never be executed but has an assume, value tracking may use that assume to perform an incorrect simplification.

It would be better to make these simplification queries not use context at all, but that would require some API changes.
For now we just use the original instruction as the context instruction to fix the issue.

Fix #56039

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127942
2022-06-22 12:25:24 +08:00
Serguei Katkov 8f891b7c39 [LoopVectorize] Uninitialized phi node leads to a crash in SSAUpdater.
createInductionResumeValues creates a phi node placeholder
without filling incoming values. Then it generates the incoming values.

It includes triggering of SCEV expander which may invoke SSAUpdater.
SSAUpdater has an optimization that detects the number of predecessors
based on incoming values if there is a phi node.
If the phi node is not filled with incoming values, the number of predecessors
is detected as 0, which leads to a segmentation fault.

In other words SSAUpdater expects that phi is in good shape while
LoopVectorizer breaks this requirement.

The fix is to prepare all incoming values first and then build the phi node.

Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128033
2022-06-22 10:49:27 +07:00
Johannes Doerfert b7cc3b10c5 [Attributor][FIX] Avoid empty bin in AAPointerInfo
This avoids creating empty bins in AAPointerInfo which can lead to
segfaults. Also ensure we do not try to translate from callee to caller
except if we really take the argument state and move it to the call site
argument state.

Fixes: https://github.com/llvm/llvm-project/issues/55726
2022-06-21 21:30:57 -05:00
Johannes Doerfert 083010312a [Attributor] Ensure to use the proper liveness AA
When determining liveness via Attributor::isAssumedDead(...) we might
end up without a liveness AA or with one pointing into another function.
Neither is helpful and we will avoid both from now on.

Reapplied after fixing the ASAN error which caused the revert:
db68a25ca9
2022-06-21 21:28:26 -05:00
Vasileios Porpodas 7a9ad25769 Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6d6268dcbf.

Review: https://reviews.llvm.org/D125712
2022-06-21 18:35:29 -07:00
Vasileios Porpodas 6d6268dcbf Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6f88acf410.
2022-06-21 17:07:21 -07:00
Vasileios Porpodas 6f88acf410 [SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles
like fadd,fsub because this may block them being matched into a single vector
instruction in x86.
We do this by checking if a TreeEntry is such a pattern and adding it to the
list of TreeEntries with orders that need to be considered.

Differential Revision: https://reviews.llvm.org/D125712
2022-06-21 16:44:48 -07:00
Florian Hahn 88ce403c6a
[LV] Add new block to place recurrence splice, if needed.
In some cases, a recurrence splice instruction needs to be inserted
between two regions, for example if the regions get re-arranged during
sinking.

Fixes #56146.
2022-06-21 21:54:37 +02:00
Heejin Ahn 27e4afcea7 [DSE] Don't remove nounwind invokes
For non-mem-intrinsic and non-lifetime `CallBase`s, the current
`isRemovable` function only checks if the `CallBase` 1. has no uses 2.
will return 3. does not throw:
80fb782336/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp (L1017)

But we should also exclude invokes even in case they don't throw,
because they are terminators and thus cannot be removed. While it
doesn't seem to make much sense for `invoke`s to have a `nounwind`
target, this kind of code can be generated and is also valid bitcode.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128224
2022-06-21 11:54:09 -07:00
Martin Sebor b19194c032 [InstCombine] handle subobjects of constant aggregates
Remove the known limitation of the library function call folders to only
work with top-level arrays of characters (as per the TODO comment in
the code) and allow them to also fold calls involving subobjects of
constant aggregates such as member arrays.
2022-06-21 11:55:14 -06:00
Alexey Bataev d4ee43153d [SLP][NFC]Fix a warning in a comparison, NFC.
Fixed signedness warning.
2022-06-21 10:19:47 -07:00
serge-sans-paille aaf1630ac3 [Scalarizer] No need to gather a scattered extracted element
ExtractElement does not produce a vector out of a vector, so there's no need to
call a gather once done.

Fix #54469

Credits to npopov@redhat.com for the original approach.

Differential Revision: https://reviews.llvm.org/D126012
2022-06-21 18:43:54 +02:00
Arthur Eubanks b5db65e0da Reland [GlobalOpt] Preserve CFG analyses
The only place we modify the CFG is when calling
removeUnreachableBlocks(), so insert a callback there which invalidates
analyses for that function (or recomputes DT in the legacy PM).

We may delete functions, make sure to clear analyses for those
functions. (this was missed in the original revision)

Small compile time wins across the board:
https://llvm-compile-time-tracker.com/compare.php?from=f444ea8ce0aaaa5ec1a4129809389da15cc41396&to=698f41f4fc26cbf1006ed5d88e9d658edfc5b749&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128145
2022-06-21 09:19:59 -07:00
Alexey Bataev f1ee2738b3 [SLP]Fix a crash when insert subvector is out of range.
If OffsetBeg + InsertVecSz is greater than VecSz, we need to estimate
the cost as a shuffle of 2 vectors, not as an insert of a subvector. Otherwise,
the inserted subvector is out of range and the compiler may crash.

Differential Revision: https://reviews.llvm.org/D128071
2022-06-21 07:16:35 -07:00
Florian Hahn 4ea6891f95
[ConstraintElimination] Remove unneeded StackEntry::Condition (NFC).
The field was only used for debug printing. Print constraint from the
system instead.
2022-06-21 15:57:29 +02:00
Florian Hahn 2a9313ee0b
[ConstraintElimination] Move logic to check condition to helper (NFC). 2022-06-21 11:50:33 +02:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Florian Hahn 6dd772d348
[ConstraintElimination] Move logic to get a constraint to helper (NFC). 2022-06-20 21:34:07 +02:00
Kazu Hirata ad7ce1e769 Don't use Optional::hasValue (NFC) 2022-06-20 11:49:10 -07:00
Kazu Hirata 5413bf1bac Don't use Optional::hasValue (NFC) 2022-06-20 11:33:56 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Arthur Eubanks 13ff7d6f39 Revert "[GlobalOpt] Perform store->dominated load forwarding for stored once globals"
This reverts commit 6f348b146b.

Am seeing internal test failures plus a linux kernel breakage reported due to this.
2022-06-20 10:26:47 -07:00
Arthur Eubanks 1cd2c72bef Revert "[GlobalOpt] Preserve CFG analyses"
This reverts commit cc65f3e167.

Causes crashes: https://github.com/llvm/llvm-project/issues/56131
2022-06-20 10:25:10 -07:00
Guillaume Chatelet 589c8d6fb9 [NFC] Simplify alignment code in MemorySanitizer 2022-06-20 15:15:53 +00:00
Guillaume Chatelet 7296811910 [NFC] Simplify alignment code in CoroFrame 2022-06-20 15:15:52 +00:00