Commit Graph

28315 Commits

Author SHA1 Message Date
Johannes Doerfert 41bd26dff9 [Attributor] Delete dead stores
D106185 allows us to determine if a store is needed easily. Using that
knowledge we can start to delete dead stores.

In AAIsDead we now track more state as an instruction can be dead (= the
old optimisitc state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we would pretend it was dead
instead of only removablewe we would ignore it when we determine what
values a load can see, so that is not what we want.

Differential Revision: https://reviews.llvm.org/D106188
2021-07-26 23:33:36 -05:00
Johannes Doerfert adddd3dbda [Attributor] Introduce getPotentialCopiesOfStoredValue and use it
This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.

There are two users for the function:
  1) `Attributor::checkForAllUses` which will now ignore the value use
     in a store if all "potential copies" can be identified and instead
     be visited. This allows various AAs, including AAPointerInfo
     itself, to look through memory.
  2) `AANoCapture` which uses a custom use tracking through the
     CaptureTracker interface and therefore needs to be thought
     explicitly.

Differential Revision: https://reviews.llvm.org/D106185
2021-07-26 23:33:36 -05:00
Jun Ma 958dddf7df [NFC][InstCombine] Fix typo 2021-07-27 11:33:10 +08:00
Shilei Tian e97e0a4fad [AbstractAttributor] Fold __kmpc_parallel_level if possible
Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
Note that `__kmpc_parallel_level` doesn't take activeness into consideration,
based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead
of 0, 129, 130, etc. that also indicate activeness.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106154
2021-07-26 22:46:19 -04:00
Johannes Doerfert be2b569646 [OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass
While rewriteDeviceCodeStateMachine should probably be folded into
buildCustomStateMachine, we at least need the optimization to happen.
This was not reliably the case in the CGSCC pass but in the Module pass
it seems to work reliably.

This also ports a test to the new kernel encoding (target_init/deinit),
and makes sure we cannot run the kernel in SPMD mode.

Differential Revision: https://reviews.llvm.org/D106345
2021-07-26 21:26:07 -05:00
Johannes Doerfert e6f3e648c9 [Attributor][FIX] Do not return CHANGED unconditionally
This caused us to rerun AAMemoryBehaviorFloating::updateImpl over and
over again. Unfortunately it turned out to be hard to reproduce the
behavior in a reasonable way.
2021-07-26 21:22:02 -05:00
Johannes Doerfert 8befd05aad [Attributor][FIX] Track change status for AAIsDead properly
If we add a new live edge we need to indicate a change or otherwise the
new live block is not shown to users. Similarly, new known dead ends and
a changed `ToBeExploredFrom` set need to cause us to return CHANGED.
2021-07-26 21:21:59 -05:00
Roman Lebedev 1901c98dd8
[SimplifyCFG] SwitchToLookupTable(): don't increase ret count
The very next SimplifyCFG pass invocation will tail-merge these two ret's
anyways, there is not much point in creating more work for ourselves.
2021-07-26 23:29:55 +03:00
Roman Lebedev 08efc2e68d
[SimplifyCFG] Drop support for simplifying cond branch to two (different) ret's
Nowadays, simplifycfg pass already tail-merges all the ret blocks together
before doing anything, and it should not increase the count of ret's,
so this is dead code.
2021-07-26 23:29:52 +03:00
Roman Lebedev 7c5f104e45
[SimplifyCFG] Drop support for duplicating ret's into uncond predecessors
This functionality existed only under a default-off flag,
and simplifycfg nowadays prefers to not increase the count of ret's.
2021-07-26 23:29:21 +03:00
Sander de Smalen 13ccb09725 [LV] Don't let ForceTargetInstructionCost override Invalid cost.
Invalid costs can be used to avoid vectorization with a given VF, which is
used for scalable vectors to avoid things that the code-generator cannot
handle. If we override the cost using the -force-target-instruction-cost
option of the LV, we would override this mechanism, rendering the flag useless.

This change ensures the cost is only overriden when the original cost that
was calculated is valid. That allows the flag to be used in combination
with the -scalable-vectorization option.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106677
2021-07-26 20:27:49 +01:00
Reid Kleckner d56e698552 [SimplifyCFG] Remove stale comment after d7378259aa, NFC 2021-07-26 12:25:29 -07:00
Joseph Huber e757a3b05f [OpenMP][NFC] Remove unncessary capture in RAII struct
Summary:
There was an unnecessary variable assigned to the information cache when we
only need it in the constructor to extract the function declaration.
2021-07-26 15:05:55 -04:00
Eli Friedman 5c486ce04d [LLVM IR] Allow volatile stores to trap.
Proposed alternative to D105338.

This is ugly, but short-term I think it's the best way forward: first,
let's formalize the hacks into a coherent model. Then we can consider
extensions of that model (we could have different flavors of volatile
with different rules).

Differential Revision: https://reviews.llvm.org/D106309
2021-07-26 10:51:00 -07:00
Florian Hahn 6d753b0751
[LAA] Remove RuntimeCheckingPtrGroup::RtCheck member (NFC).
This patch removes RtCheck from RuntimeCheckingPtrGroup to make it
possible to construct RuntimeCheckingPtrGroup objects without a
RuntimePointerChecking object. This should make it easier to
re-use the code to generate runtime checks, e.g. in D102834.

RtCheck was only used to access the pointer info for a given index.
Instead, the start and end expressions can be passed directly.

For code-gen, we also need to know the address space to use. This can
also be explicitly passed at construction.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105481
2021-07-26 17:38:10 +01:00
Sander de Smalen b9051ba848 [LV] Remove assert that VF cannot be scalable in setCostBasedWideningDecision.
Scalarization for scalable vectors is not (yet) supported, so the
LV discards a VF when scalarization is chosen as the widening
decision. It should therefore not assert that the VF is not scalable
when it computes the decision to scalarize.

The code can get here when both the interleave-cost, gather/scatter cost
and scalarization-cost are all illegal. This may e.g. happen for SVE
when the VF=1, to avoid generating `<vscale x 1 x eltty>` types that
the code-generator cannot yet handle.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106656
2021-07-26 17:11:45 +01:00
Nikita Popov f921bf6049 [MergeICmps] Collect block instructions once (NFC)
Collect the relevant instructions for a given BCECmpBlock once
on construction, rather than repeating this logic in multiple
places.
2021-07-26 18:07:20 +02:00
Nikita Popov c691651c53 [MergeICmps] Try to fix MSVC build failure
Apparently this fails to line up the types -- try to sidestep the
issue entirely by writing the code in a more reasonable way: Walk
over the operands and perform a set lookup, rather than walking
over the set and performing an operand scan.
2021-07-26 17:31:27 +02:00
Sanjay Patel 87d604ffe4 [SimplifyLibCalls] avoid crash on pointer math
We could try harder to screen out libcalls by
function signature (and that would be a much larger
change than for sprintf alone), but that might make
the transition to type-less pointers more difficult.

https://llvm.org/PR51200
2021-07-26 11:08:45 -04:00
Sanjay Patel d8260269c3 [SimplifyLibCalls] reduce code duplication; NFC 2021-07-26 11:08:45 -04:00
Nikita Popov 0d3807b365 [MergeICmps] Separate out BCECmp and use Optional (NFC)
Separate out the BCECmp part from BCECmpBlock, which just stores
the comparison atoms without the branch instruction. At the same
time switch the code to return Optional<> rather than objects in
invalid state and partially constructed objects.
2021-07-26 17:06:43 +02:00
Sander de Smalen 981e9dce54 [LV] Don't assume isScalarAfterVectorization if one of the uses needs widening.
This fixes an issue that was found in D105199, where a GEP instruction
is used both as the address of a store, as well as the value of a store.
For the former, the value is scalar after vectorization, but the latter
(as value) requires widening.

Other code in that function seems to prevent similar cases from happening,
but it seems this case was missed.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106164
2021-07-26 16:01:55 +01:00
Florian Hahn 7a1e73f0b9
Recommit "[VPlan] Add recipe for first-order rec phis, make splicing explicit."
This reverts the revert commit b1777b04dc.

The patch originally got reverted due to a crash:
https://bugs.chromium.org/p/chromium/issues/detail?id=1232798#c2

The underlying issue was that we were not using the stored values from
the modified memory recipes, but the out-of-date values directly from
the IR (accessed via the VPlan). This should be fixed in d995d6376. A
reduced version of the reproducer has been added in 93664503be.
2021-07-26 15:50:30 +01:00
Nikita Popov 33146857e9 [IR] Consider non-willreturn as side effect (PR50511)
This adjusts mayHaveSideEffect() to return true for !willReturn()
instructions. Just like other side-effects, non-willreturn calls
(aka "divergence") cannot be removed and cannot be reordered relative
to other side effects. This fixes a number of bugs where
non-willreturn calls are either incorrectly dropped or moved. In
particular, it also fixes the last open problem in
https://bugs.llvm.org/show_bug.cgi?id=50511.

I performed a cursory review of all current mayHaveSideEffect()
uses, which convinced me that these are indeed the desired default
semantics. Places that do not want to consider non-willreturn as a
sideeffect generally do not want mayHaveSideEffect() semantics at
all. I identified two such cases, which are addressed by D106591
and D106742. Finally, there is a use in SCEV for which we don't
really have an appropriate API right now -- what it wants is
basically "would this be considered forward progress". I've just
spelled out the previous semantics there.

Differential Revision: https://reviews.llvm.org/D106749
2021-07-26 16:35:14 +02:00
Alexey Bataev 6ca48efcf6 [SLP]Fix costs calculations.
Need to fix several cost-related problems. The final type may be defined
incorrectly because of to early definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.

Differential Revision: https://reviews.llvm.org/D106578
2021-07-26 07:14:03 -07:00
Nikita Popov ffb3277b00 [SimplifyCFG] Improve store speculation check
isSafeToSpeculateStore() looks for a preceding store to the same
location to make sure that introducing a new store of the same
value is safe. It currently bails on intervening mayHaveSideEffect()
instructions. However, I believe just checking mayWriteToMemory()
is sufficient there -- we just need to make sure that we know which
value was stored, we don't care if we can unwind in the meantime.

While looking into this, I started having some doubts about the
correctness of the transform with regard to thread safety. While
we don't try to hoist non-simple stores, I believe we also need
to make sure that the preceding store is simple as well. Otherwise
we could introduce a spurious non-atomic write after an atomic write
-- under our memory model this would result in a subsequent undef
atomic read, even if the second write stores the same value as the
first.

Example: https://alive2.llvm.org/ce/z/q_3YAL

Differential Revision: https://reviews.llvm.org/D106742
2021-07-26 15:01:00 +02:00
Kerry McLaughlin e484e1ae03 [SVE] Fix casts to <FixedVectorType> in truncateToMinimalBitwidths
Fixes more casts to `<FixedVectorType>` for the cases where the
instruction is a Insert/ExtractElementInst.

For fixed-width, this part of truncateToMinimalBitWidths is tested by
AArch64/type-shrinkage-insertelt.ll. I attempted to write a test case for this part
of truncateToMinimalBitWidths which uses scalable vectors, but was unable to add
one. The tests in type-shrinkage-insertelt.ll rely on scalarization to create extract
element instructions for instance, which is not possible for scalable vectors.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106163
2021-07-26 13:44:51 +01:00
Alexey Bataev d7cb2a0796 Revert "[SLP]Fix costs calculations."
This reverts commit a053afed49 to fix
buildbots.
2021-07-26 05:42:34 -07:00
Alexey Bataev a053afed49 [SLP]Fix costs calculations.
Need to fix several cost-related problems. The final type may be defined
incorrectly because of to early definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.

Differential Revision: https://reviews.llvm.org/D106578
2021-07-26 04:37:22 -07:00
Florian Hahn d995d63767
[VPlan] Use stored value from recipes for interleave groups.
Instead of getting the VPValue for the stored IR values through the
current plan, use the stored value of the recipes directly.

This way, the correct VPValues are used if the store recipes have been
modified in the VPlan and the IR value is not correct any longer. This
can happen, e.g. due to D105008.
2021-07-26 12:05:23 +01:00
Dylan Fleming 20b0fa91c9 [SVE] Add support for folding for select + masked loads
Add folds to instcombine to support the removal of select instruction when the masked_load is guaranteed to zero the same lanes, i.e. select(mask, mload(,,mask,0), 0) -> mload(,,mask,0).

Patch originally authored by @paulwalker-arm

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106376
2021-07-26 11:58:41 +01:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
Roman Lebedev c2dacb1cd3
[SimplifyCFG] Fold branch to common dest: if branch is unpredictable, prefer to speculate
This is consistent with the two other usages of prof md in this pass.
2021-07-26 02:57:19 +03:00
Roman Lebedev 59a5964e03
[SimplifyCFG] Don't speculatively execute BB[s] if they are predictably not taken
Same as D106650, but for `FoldTwoEntryPHINode()`

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D106717
2021-07-26 02:55:15 +03:00
Roman Lebedev e58ce35f7b
[SimplifyCFG] Don't speculatively execute BB if it's predictably not taken
If the branch isn't `unpredictable`, and it is predicted to *not* branch
to the block we are considering speculatively executing,
then it seems counter-productive to execute the code that is predicted not to be executed.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D106650
2021-07-26 02:55:14 +03:00
Nico Weber b1777b04dc Revert "[VPlan] Add recipe for first-order rec phis, make splicing explicit."
Makes clang crash: https://reviews.llvm.org/D105008#2903350
This reverts commit d2a73fb44e.

Also revert a minor formatting follow-up:
This reverts commit 82834a6732.
2021-07-25 17:39:28 -04:00
Joseph Huber 58725c12bb [OpenMP] Introduce RAII to protect certain RTL calls from DCE
This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, because it has internal linkage it can safely be removed. Later when we
try to get an instance to that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106707
2021-07-25 14:15:47 -04:00
Nikita Popov 087a8eea35 [Attributes] Clean up handling of UB implying attributes (NFC)
Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.

Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.
2021-07-25 18:21:13 +02:00
Krishna Kariya 7bd361200a [InstCombine] Fix PR47960 - Incorrect transformation of fabs with nnan flag
Bug Fix for PR: https://llvm.org/PR47960

This patch makes sure that the fast math flag used in the 'select'
instruction is the same as the 'fabs' instruction after the transformation.

Differential Revision: https://reviews.llvm.org/D101727
2021-07-25 10:43:33 -04:00
hyeongyu kim aca5aeb752 [InstCombine] Add freezeAllUsesOfArgument to visitFreeze
In D106041, a freeze was added before the branch condition to solve the miscompilation problem of SimpleLoopUnswitch.
However, I found that the added freeze disturbed other optimizations in the following situations.
```
arg.fr = freeze(arg)
use(arg.fr)
...
use(arg)
```
It is a problem that occurred when arg and arg.fr were recognized as different values.
Therefore, changing to use arg.fr instead of arg throughout the function eliminates the above problem.
Thus, I add a function that changes all uses of arg to freeze(arg) to visitFreeze of InstCombine.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106233
2021-07-24 18:08:58 +09:00
Kuter Dinel 96709823ec [AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997
2021-07-24 06:07:15 +03:00
Kuter Dinel 0cd964ff25 [Attributor][FIX] checkForAllInstructions, correctly handle declarations
checkForAllInstructions was not handling declarations correctly.
It should have been returning false when it gets called on a declaration

The patch also fixes a test case for AAFunctionReachability for it to be able
to pass after the changes to the checkForAllinstructions.

Differential Revision: https://reviews.llvm.org/D106625
2021-07-24 02:21:29 +03:00
Roman Lebedev 943f85123b
[NFC][SimplifyCFG] Make 'conditional block' handling more straight-forward
This will simplify making use of profile weights
to not perform the speculation when obviously unprofitable.
2021-07-24 00:18:27 +03:00
Roman Lebedev 418dba0606
[NFC][SimplifyCFG] FoldTwoEntryPHINode(): make better use of GetIfCondition() returning dom block 2021-07-24 00:18:26 +03:00
Roman Lebedev 2aa2fdeed9
[NFC][BasicBlockUtils] Refactor GetIfCondition() to return the branch, not it's condition
Otherwise e.g. the FoldTwoEntryPHINode() has to do a lot of legwork
to re-deduce what is the dominant block (i.e. for which block
is this branch the terminator).
2021-07-24 00:18:26 +03:00
Nikita Popov f502683750 [MergeICmps] Relax sinking check
The check for sinking instructions past the load + cmp sequence
currently checks for side-effects, which includes writing to memory
and unwinding. However, I don't believe we care about sinking the
instructions past an unwind (as they don't have any side-effects
themselves).

Differential Revision: https://reviews.llvm.org/D106591
2021-07-23 22:16:11 +02:00
Shilei Tian ae69f46867 [AbstractAttributor] Refine logic to indicate pessimistic fixed point when folding `__kmpc_is_spmd_exec_mode`
Since we are using assumed information now, the logic should be refined to avoid
unncessary assertion.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106630
2021-07-23 13:36:47 -04:00
Giorgis Georgakoudis f97de4cb0b [OpenMPOpt] Move dedup runtime calls after init for target regions
Deduplication in OpenMPOpt finds redundant OpenMP runtime calls and replaces them with a single call placed in the earliest safe location in the IR. When deduplication happens in a target region this patch makes sure replacement calls are put after target_init.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106556
2021-07-23 05:54:01 -07:00
Dawid Jurczak bc536c7101 Revert "[DSE] Transform memset + malloc --> calloc (PR25892)"
This reverts commit 43234b1595.

Reason: We should detect that we are implementing 'calloc' and bail out.
2021-07-23 11:51:59 +02:00
Vitaly Buka fef86a380a [hwasan] Fix uninitialized DisableOptimization 2021-07-23 02:25:33 -07:00
Serge Pavlov 1c64b5dc5e [ConstantFolding] Fold constrained arithmetic intrinsics
Constfold constrained variants of operations fadd, fsub, fmul, fdiv,
frem, fma and fmuladd.

The change also sets up some means to support for removal of unused
constrained intrinsics. They are declared as accessing memory to model
interaction with floating point environment, so they were not removed,
as they have side effect. Now constrained intrinsics that have
"fpexcept.ignore" as exception behavior are removed if they have no uses.
As for intrinsics that have exception behavior other than "fpexcept.ignore",
they can be removed if it is known that they do not raise floating point
exceptions. It happens when doing constant folding, attributes of such
intrinsic are changed so that the intrinsic is not claimed as accessing
memory.

Differential Revision: https://reviews.llvm.org/D102673
2021-07-23 14:39:51 +07:00
Johannes Doerfert 6ca969353c [Attributor] If provided, only look at simplification callbacks not IR
A simplification callback can mean that the IR value is modified beyond
the apparent IR semantics. That is, a `i1 true` could be replaced by an
`i1 false` based on high-level domain-specific information. If a user
provides a simplification callback we will not look at the IR but
instead give up if the callback returns a nullptr.
2021-07-22 23:57:37 -05:00
Giorgis Georgakoudis eaab880e45 [Attributor][Fix] Add overrides for AA2HS analysis 2021-07-22 18:20:14 -07:00
Giorgis Georgakoudis f8c40ed8f8 [OpenMP] Use AAHeapToStack/AAHeapToShared analysis in SPMDization
SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105634
2021-07-22 18:08:37 -07:00
Hongtao Yu ab5ac659c8 [CSSPGO] Fix a typo in SampleContextTracker
Fixing a typo in SampleContextTracker to use debug name when debug linkage name is no present. This should only affect C programs.

Saw 0.6% perf win on Cinder which is mostly C code.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D106599
2021-07-22 16:44:50 -07:00
Florian Mayer 96c63492cb [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 16:20:27 -07:00
Roman Lebedev d7378259aa
[SimplifyCFG] SimplifyCondBranchToTwoReturns(): really only deal with different ret blocks
This function is called when some predecessor of an empty return block
ends with a conditional branch, with both successors being empty ret blocks.

Now, because of the way SimplifyCFG works, it might happen to simplify
one of the blocks in a way that makes a conditional branch
into an unconditional one, since it's destinations are now identical,
but it might not have actually simplified said conditional branch
into an unconditional one yet.

So, we have to check that ourselves first,
especially now that SimplifyCFG aggressively tail-merges
all ret and resume blocks.

Even if it was an unconditional branch already,
`SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()`
by default.
2021-07-23 00:36:59 +03:00
Roman Lebedev 7ef6f01909
[SimplifyCFG] FoldTwoEntryPHINode(): bailout on inverted logical and/or (PR51149)
The logical (select) form of and/or will now be a source of problems.
We don't really account for it's inverted form, yet it exists,
and presumably we should treat it just like non-inverted form:
https://alive2.llvm.org/ce/z/BU9AXk

https://bugs.llvm.org/show_bug.cgi?id=51149 reports a reportedly-serious
perf regression that will hopefully be mitigated by this.
2021-07-22 22:19:34 +03:00
Fangrui Song 3b181568db [Matrix] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build after D106457. NFC 2021-07-22 11:33:02 -07:00
Adam Nemet ce5b1320a7 [Matrix] Fix miscompile for NT matmul if the transpose has other use
We should only add the fake lowering entry for the matrix remark if the
transpose is not lowered on its own.  `MapVector::insert` is used to insert
the entry during proper lowering which does not overwrite the fake entry in
the map.

We actually had test coverage for this but the reference output code was
wrong; it was storing undef rather than the transposed column.

Also add an assert that would have caught this.

Differential Revision: https://reviews.llvm.org/D106457
2021-07-22 10:45:56 -07:00
Shilei Tian 1a7f779022 [OpenMPOpt] Add support for BooleanStateWithSetVector
D101977 added `BooleanStateWithPtrSetVector` to store pointers to a set meanwhile
tracking boolean state. One of the limitation is that it can only store pointer.
We might want it to store other types of values, such as integer for parallel
level. This patch generalizes the idea and create `BooleanStateWithSetVector`.
`BooleanStateWithPtrSetVector` therefore becomes a type alias of `BooleanStateWithSetVector`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106149
2021-07-22 13:12:29 -04:00
Kazu Hirata f6413d8aaa [Transforms] Remove getOrCreateInitFunction (NFC)
The last use was removed on Jan 16, 2019 in commit
81101de585.
2021-07-22 06:30:39 -07:00
Caroline Concatto 5a4de84d55 [LoopVectorize] Fix crash for predicated instruction with scalable VF
This patch avoids computing discounts for predicated instructions  when the
VF is scalable.
There is no support for vectorization of loops with division because the
vectorizer cannot guarantee that zero divisions will not happen.

This loop now does not use VF scalable

```
for (long long i = 0; i < n; i++)
    if (cond[i])
      a[i] /= b[i];
```

Differential Revision: https://reviews.llvm.org/D101916
2021-07-22 12:48:27 +01:00
Florian Mayer 789a4a2e5c Revert "[hwasan] Use stack safety analysis."
This reverts commit bde9415fef.
2021-07-22 12:16:16 +01:00
Dawid Jurczak 11338e998d [LoopIdiom] Transform memmove-like loop into memmove (PR46179)
The purpose of patch is to learn Loop idiom recognition pass how to recognize simple memmove patterns
in similar way like GCC: https://godbolt.org/z/fh95e83od
LoopIdiomRecognize already has machinery for memset and memcpy recognition, patch tries to extend exisiting capabilities with minimal effort.

Differential Revision: https://reviews.llvm.org/D104464
2021-07-22 13:05:43 +02:00
Florian Mayer bde9415fef [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 12:04:54 +01:00
Simon Pilgrim 1c9bec727a [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450
2021-07-22 10:58:51 +01:00
Johannes Doerfert 0c0eb76782 [Attributor][FIX] Improve call graph updating
If we remove a non-intrinsic instruction we need to tell the (old) call
graph about it. This caused problems with some features down the line as
they allowed to removed calls more aggressively.
2021-07-22 00:07:56 -05:00
Johannes Doerfert 94d3b59c56 [Attributor][FIX] Do not introduce multiple instances of SSA values
If we have a recursive function we could create multiple instantiations
of an SSA value, one per recursive invocation of the function. This is a
problem as we use SSA value equality in various places. The basic idea
follows from this test:

```
static int r(int c, int *a) {
  int X;
  return c ? r(false, &X) : a == &X;
}

int test(int c) {
  return r(c, undef);
}
```

If we look through the argument `a` we will end up with `X`. Using SSA
value equality we will fold `a == &X` to true and return true even
though it should have been false because `a` and `&X` are from different
instantiations of the function.

Various tests for this  have been placed in value-simplify-instances.ll
and this commit fixes them all by avoiding to produce simplified values
that could be non-unique at runtime. Thus, the result of a simplify
value call will always be unique at runtime or the original value, both
do not allow to accidentally compare two instances of a value with each
other and conclude they are equal statically (pointer equivalence) while
they are unequal at runtime.
2021-07-22 00:07:55 -05:00
Johannes Doerfert c819266ecc [Attributor] Improve the Attributor::getAssumedConstant interface
Similar to Attributor::getAssumedSimplified we need to allow IRPs
directly to get the right simplification callback (and context).
2021-07-22 00:07:55 -05:00
Johannes Doerfert c4b1fe05dd [OpenMP][FIX] Use name + type checks not only name checks for calls
A call that is analyzed in an optimization needs to be verified against
the name and type of the runtime function to avoid that we look at
arguments that do not exist (anymore). This can happen if the signature
was rewritten. Since we will not set RFI.Declaration if the type doesn't
match we can use it (if it's not null) to determine if the signature is
as expected.

Differential Revision: https://reviews.llvm.org/D106341
2021-07-21 22:51:05 -05:00
Johannes Doerfert c7781a0978 [Attributor][NFC] Clang format 2021-07-21 22:51:05 -05:00
Joseph Huber 16206d17cd [OpenMP] Strip NoInline from known OpenMP runtime functions
This patch strips the NoInline attribute from known OpenMP runtime functions.
This is done so that we can denote certain runtime functions as NoInline to
ensure their call sites are intact so they can be checked by OpenMPOpt. We
don't wan't this noinline attribute to remain for any functions after OpenMPOpt
has been run however.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106482
2021-07-21 21:18:26 -04:00
Joseph Huber 196fe994b8 [OpenMP] Fold `__kmpc_is_generic_main_thread_id` if possible
This patch adds the ability to fold `__kmpc_is_generic_main_thread_id` if we
know for a fact that it is executed by the initial thread using
AAExecutionDomain. This combined with folding `__kmpc_is_spmd_exec_mode` will
allow us to fully fold `__kmpc_is_generic_main_thread`.

Depends on D106438 D106437

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106439
2021-07-21 21:18:22 -04:00
Joseph Huber 4a66860424 [OpenMP] Add an option to disable function internalization
Function internalization can sometimes occur in situations where we want to
keep the call sites intact. This patch adds an option to disable function
internalization and prevents the device runtime from being internalized while
creating the bitcode library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106438
2021-07-21 21:18:18 -04:00
Joseph Huber 7d57639264 [OpenMP] Add new execution mode for SPMD execution with Generic semantics
Qualified kernels can be transformed from generic-mode to SPMD mode using an
optimization in OpenMPOpt. This patch introduces a new execution mode to
indicate kernels that have been transformed from generic-mode to SPMD-mode.
These kernels have SPMD-mode execution, but need generic-mode semantics for
scheduling the blocks and threads. Without this far too few blocks will be
scheduled for a generic region as SPMD mode expects the trip count to be
divided by the number of threads.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D106460
2021-07-21 20:57:28 -04:00
Fangrui Song 7b78956224 [sanitizer] Place module_ctor/module_dtor in llvm.used
This removes an abuse of ELF linker behaviors while keeping Mach-O/COFF linker
behaviors unchanged.

ELF: when module_ctor is in a comdat, this patch removes reliance on a linker
abuse (an SHT_INIT_ARRAY in a section group retains the whole group) by using
SHF_GNU_RETAIN. No linker behavior difference when module_ctor is not in a comdat.

Mach-O: module_ctor gets `N_NO_DEAD_STRIP`. No linker behavior difference
because module_ctor is already referenced by a `S_MOD_INIT_FUNC_POINTERS`
section (GC root).

PE/COFF: no-op. SanitizerCoverage already appends module_ctor to `llvm.used`.
Other sanitizers: llvm.used for local linkage is not implemented in
`TargetLoweringObjectFileCOFF::emitLinkerDirectives` (once implemented or
switched to a non-local linkage, COFF can use module_ctor in comdat (i.e.
generalize ELF-specific rL301586)).

There is no object file size difference.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106246
2021-07-21 14:03:26 -07:00
Nikita Popov aa5adc0c1c [SimplifyCFG] Fix if conversion with opaque pointers
We need to make sure that the value types are the same. Otherwise
we both may not have the necessary dereferenceability implication,
nor can we directly form the desired select pattern.

Without opaque pointers this is enforced implicitly through the
pointer comparison.
2021-07-21 22:24:07 +02:00
Sanjay Patel f14495dc75 [SROA] avoid crash on memset with constant expression length
https://llvm.org/PR50888
2021-07-21 15:20:28 -04:00
Giorgis Georgakoudis 3f71b425b2 [Attributor] Preserve BBs and instructions added in AA manifests
Manifesting AbstractAttributes may add new BBs in the IR. This patch provides an interface to register those BBs in the Attributor so that those BBs and containing instructions are not deleted as dead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106383
2021-07-21 11:27:00 -07:00
Giorgis Georgakoudis b0e06e1fc0 [Attributor][NFC] Modify isAssumedHeapToStack for const argument
There is no need for a non-const argument interface and the const argument modification covers existing and upcoming use cases.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106418
2021-07-21 10:28:21 -07:00
Arthur Eubanks 8bc298d041 [NewPM][Inliner] Check if deleted function is in current SCC
In weird cases, the inliner will inline internal recursive functions,
sometimes causing them to have no more uses, in which case the
inliner will mark the function to be deleted. The function is
actually deleted after the call to
updateCGAndAnalysisManagerForCGSCCPass(). In
updateCGAndAnalysisManagerForCGSCCPass(), UR.UpdatedC may be set to
the SCC containing the function to be deleted. Then the inliner calls
CG.removeDeadFunction() which can cause that SCC to be deleted, even
though it's still stored in UR.UpdatedC.

We could potentially check in the wrappers/pass managers if UR.UpdatedC
is in UR.InvalidatedSCCs before doing anything with it, but it's safer
to do this as close to possible to the call to CG.removeDeadFunction()
to avoid issues with allocating a new SCC in the same address as
the deleted one.

It's hard to find a small test case since we need to have recursive
internal functions be reachable from non-internal functions, yet they
need to become non-recursive and not referenced by other functions when
inlined.

Similar to https://reviews.llvm.org/D106306.

Fixes PR50788.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D106405
2021-07-21 08:47:45 -07:00
Kazu Hirata ba2dd12d4f [InstCombine] Remove CreateOverflowTuple (NFC)
The last use was removed On Jun 3, 2020 in commit
2a6c871596.
2021-07-21 07:07:53 -07:00
David Green 72dc5cab4f [LV] Make use of PatternMatchers in getReductionPatternCost. NFC
Pulled out of D106166, this modifies getReductionPatternCost to use
PatternMatchers, hopefully simplifying the code a little.
2021-07-21 11:34:30 +01:00
Rosie Sumpter 44c9adb414 [LoopFlatten][LoopInfo] Use Loop to identify latch compare instruction
Make getLatchCmpInst non-static and use it in LoopFlatten as a more
robust way of identifying the compare.

Differential Revision: https://reviews.llvm.org/D106256
2021-07-21 10:14:18 +01:00
Vitaly Buka a4904ebb88 [NFC][hwasan] Remove "pragma GCC poison"
With ifdefs they make code less readable.
2021-07-20 19:10:05 -07:00
Vitaly Buka cd4d244757 [NFC][hwasan] Simplify expression 2021-07-20 19:10:05 -07:00
Sami Tolvanen e901e581ef Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 700d07f8ce.

Reverting due to a ThinLTO+CFI breakage on -msvc targets.
2021-07-20 13:59:46 -07:00
Fangrui Song 3924877932 [IR] Rename `comdat noduplicates` to `comdat nodeduplicate`
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.

In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics.  The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106319
2021-07-20 12:47:10 -07:00
Nikita Popov 0c794abff1 [ThinTLOBitcodeWriter] Fix unused variable warning (NFC) 2021-07-20 20:19:47 +02:00
Nikita Popov ea014c5bbf [Inline] Fix noalias addition on simplified instructions (PR50589)
When adding noalias/alias.scope metadata, we analyze the instructions
of the original callee, and then place metadata on the corresponding
inlined instructions in the caller as provided by VMap. However, this
assumes that this actually a clone of the instruction, rather than
the result of simplification. If simplification occurred, the
instruction that VMap points to may not have any relationship as far
as ModRef behavior is concerned.

Fix this by tracking simplified instructions during cloning and then
only processing instructions that have not been simplified. This is
done with an additional map form original to cloned instruction,
into which we only insert if no simplification is performed. The
mapping in VMap can then be compared to this map. If they're the
same, the instruction hasn't been simplified. (I originally wanted
to only track a set of simplified instructions, but that wouldn't
work if the instruction only gets simplified afterwards, e.g. based
on rewritten phis.)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50589.

Differential Revision: https://reviews.llvm.org/D106242
2021-07-20 19:52:41 +02:00
Sami Tolvanen 700d07f8ce ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Relands commmit 8e3b5cb39e with arch
specific tests fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-20 10:30:02 -07:00
Giorgis Georgakoudis e8439ec893 [OpenMP] Set RequiresFullRuntime false in SPMDization
SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105556
2021-07-20 09:54:51 -07:00
Shimin Cui 0b043bb39b This patch extends the OptimizeGlobalAddressOfMalloc to handle the null check of global pointer variables. It is disabled with https://reviews.llvm.org/rGb7cd291c1542aee12c9e9fde6c411314a163a8ea. This PR is to reenable it while fixing the original problem reported. The fix is to set the store value correctly when creating store for the new created global init bool symbol.
Reviewed By: efriedma

Differential Revision:  https://reviews.llvm.org/D102711
2021-07-20 12:27:26 -04:00
David Green 4272e64acd [LV] Change interface of getReductionPatternCost to return Optional
Currently the Instruction cost of getReductionPatternCost returns an
Invalid cost to specify "did not find the pattern". This changes that to
return an Optional with None specifying not found, allowing Invalid to
mean an infinite cost as is used elsewhere.

Differential Revision: https://reviews.llvm.org/D106140
2021-07-20 16:44:50 +01:00
Kazu Hirata 4a30a5c8d9 [SampleProfile] Remove ProfileIsValid (NFC)
The last use was removed on Jan 22, 2021 in commit
c9cd9a0066.
2021-07-20 08:07:04 -07:00
Caroline Concatto cf78995c4a [NFC][LoopVectorizer] Remove VF.isScalable() assertion from collectInstsToScalarize and getInstructionCost
This patch removes the assertion when VF is scalable and replaces
getKnownMinValue() by getFixedValue(),  so it still guards the code against
scalable vector types.
The assertions were used to guarantee that getknownMinValue were not used for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D106359
2021-07-20 15:56:30 +01:00
Johannes Doerfert d62bbbebbf [Attributor] Initialize effectively unused value to appease UBSAN 2021-07-20 09:18:51 -05:00
Florian Hahn 82834a6732
[VPlan] Fix formatting glitch from d2a73fb44e. 2021-07-20 16:16:30 +02:00
Florian Hahn d2a73fb44e
[VPlan] Add recipe for first-order rec phis, make splicing explicit.
This patch adds a VPFirstOrderRecurrencePHIRecipe, to further untangle
VPWidenPHIRecipe into distinct recipes for distinct use cases/lowering.
See D104989 for a new recipe for reduction phis.

This patch also introduces a new `FirstOrderRecurrenceSplice`
VPInstruction opcode, which is used to make the forming of the vector
recurrence value explicit in VPlan. This more accurately models def-uses
in VPlan and also simplifies code-generation. Now, the vector recurrence
values are created at the right place during VPlan-codegeneration,
rather than during post-VPlan fixups.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D105008
2021-07-20 16:14:17 +02:00
Dawid Jurczak 43234b1595 [DSE] Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-20 11:39:05 +02:00
Florian Mayer 5f08219322 Revert "[hwasan] Use stack safety analysis."
This reverts commit e9c63ed10b.
2021-07-20 10:36:46 +01:00
Florian Mayer e9c63ed10b [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-20 10:06:35 +01:00
Johannes Doerfert c66cbee140 [Attributor] Use set vector instead of vector to prevent duplicates 2021-07-20 01:39:34 -05:00
Johannes Doerfert 5957cf9f11 [Attributor] Simplify to values in the genericValueTraversal
We already simplified to a constant, given the new interface we can also
simplify to a generic value.
2021-07-20 01:39:34 -05:00
Johannes Doerfert 5eba7846a5 [Attributor] Use checkForAllUses instead of custom use tracking
AAMemoryBehaviorFloating used a custom use tracking mechanism even
though checkForAllUses exists and is already more powerful. Further,
AAMemoryBehaviorFloating uses AANoCapture to guarantee that there are no
aliases and following the uses is sufficient. This is an OK assumption
if checkForAllUses is used but custom tracking is easily out of sync
with AANoCapture and problems follow.
2021-07-20 01:39:33 -05:00
Johannes Doerfert b96ea6b1fd [Attributor] Ensure to simplify operands in AAValueConstantRange
As with other patches before, the simplification callback interface
requires us to go through the Attributor::getAssumedSimplified API first
before we recurs.

It is unclear if the problem can be explicitly tested with our current
infrastructure.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 5fbb51d8d5 [Attributor] Extend the AAValueSimplify compare simplification logic
We first simplify the operands of a compare and then reason on the
simplified versions, e.g., with AANonNull.

This does improve the simplification capabilities but also fixes a
potential problem that has not yet been observed by simplifying the
operands first.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 9c00aabd60 [Attributor][NFCI] Expose `getAssumedUnderlyingObjects` API 2021-07-20 00:35:13 -05:00
Johannes Doerfert 5e169818fb [Attributor][NFC] Fix function name spelling 2021-07-20 00:35:13 -05:00
Johannes Doerfert 44a9ee170c [Attributor][FIX] Do not simplify byval arguments
A byval argument is a different value in the caller and callee, we
cannot propagate the information as part of AAValueSimplify. Users that
want to deal with byval arguments need to specifically perform the
argument -> call site step. We do not do this for now.
2021-07-19 22:48:51 -05:00
Johannes Doerfert c2281f1565 [Attributor] Introduce AAPointerInfo
This patch introduces AAPointerInfo which tracks the uses of a pointer
and places them in "bins" based on their offset from the base and access
size.

As with other AAs, any pointer can be tracked but it is up to the user
to make sense of the results. The user in this patch is AAValueSimplify
and AAPotentialValues which both utilize AAPointerInfo to determine the
value of a load. For now, this is restricted to loads of allocas and
internal globals. Through the use of AAPointerInfo and the "bins" we can
track struct members separately. The users also know that storing only
zeros (at unknown indices) will result in loading only 0 (from unknown
indices). Other than that, the users are flow and context insensitive
(for now).

To deal with the "bins" more easily, AAPointerInfo provides a
forallInterfearingAccesses that applies a callback on all accesses
that might interfere with a given load or store.

Differential Revision: https://reviews.llvm.org/D104432
2021-07-19 22:48:35 -05:00
Johannes Doerfert 28c78a9e12 [Attributor] Simplify loads
As a first step to simplify loads we only handle `null` and `undef`
underlying objects, as well as objects that have the load as a single user.
Loads of those values can be replaced by the initializer, if any.
Proper reasoning is introduced in a follow up patch

Differential Revision: https://reviews.llvm.org/D103862
2021-07-19 22:47:29 -05:00
Johannes Doerfert 97387fdf6d [OpenMP] Fix carefully track SPMDCompatibilityTracker
We did not properly use SPMDCompatibilityTracker in various places.
This patch makes sure we look at the validity properly and also fix
the state if we can.

Differential Revision: https://reviews.llvm.org/D106085
2021-07-19 22:47:03 -05:00
Artem Belevich 1a43ee65d1 Revert "[MemCpyOpt] Enable memcpy optimizations unconditionally."
This reverts commit 2c98298a75 which breaks
sanitizers.
2021-07-19 14:27:41 -07:00
Petr Hosek 54902e00d1 [InstrProfiling] Use weak alias for bias variable
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted in which case the symbol selected
by the linker is going to depend on the order of inputs which can be
fragile.

This change replaces the use of weak definition inside the runtime with
a weak alias. We place the compiler generated symbol inside a COMDAT
group so dead definition can be garbage collected by the linker.

We also disable the use of runtime counter relocation on Darwin since
Mach-O doesn't support weak external references, but Darwin already uses
a different continous mode that relies on overmapping so runtime counter
relocation isn't needed there.

Differential Revision: https://reviews.llvm.org/D105176
2021-07-19 12:23:51 -07:00
Artem Belevich b988d69ea2 [infer-address-spaces] Handle complex non-pointer constexpr arguments.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51099

Differential Revision: https://reviews.llvm.org/D106098
2021-07-19 12:15:52 -07:00
Artem Belevich 2c98298a75 [MemCpyOpt] Enable memcpy optimizations unconditionally.
The patch does not depend on the availability of the library functions for
memcpy/memset as it operates on LLVM intrinsics.  The optimizations are useful
on the targets that have these functions disabled (e.g. NVPTX & AMDGPU).

Differential Revision: https://reviews.llvm.org/D104801
2021-07-19 11:58:02 -07:00
maekawatoshiki 74f0f9a455 [LICM] Create LoopNest Invariant Code Motion (LNICM) pass
This patch adds a new pass called LNICM which is a LoopNest version of LICM and a test case to show how LNICM works.
Basically, LNICM only hoists invariants out of loop nest (not a loop) to keep/make perfect loop nest. This enables later optimizations that require perfect loop nest.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D104180
2021-07-20 00:31:18 +09:00
Alexey Bataev d8d8b4574a [SLP]Fix possible crash on unreachable incoming values sorting.
The incoming values for PHI nodes may come from unreachable BasicBlocks,
need to handle this case.

Differential Revision: https://reviews.llvm.org/D106264
2021-07-19 04:54:53 -07:00
Mindong Chen e908e063d1 [LoopUtils] Fix incorrect RT check bounds of loop-invariant mem accesses
This fixes the lower and upper bound calculation of a
RuntimeCheckingPtrGroup when it has more than one loop
invariant pointers. Resolves PR50686.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D104148
2021-07-19 19:38:24 +08:00
Florian Mayer 807d50100c Revert "[hwasan] Use stack safety analysis."
This reverts commit 12268fe14a.
2021-07-19 12:08:32 +01:00
Florian Mayer 12268fe14a [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-19 11:54:44 +01:00
Rosie Sumpter 34d6820551 [LoopFlatten] Use Loop to identify loop induction phi. NFC
Replace code which identifies induction phi with helper function
getInductionVariable to improve robustness.

Differential Revision: https://reviews.llvm.org/D106045
2021-07-19 09:06:57 +01:00
Krishna Kariya da92e86263 [InstCombine] Fold IntToPtr/PtrToInt to bitcast
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support
to specific cases where this optimization works. This patch is the
first one on this line.

Consider the example:

    %i = ptrtoint i8* %X to i64
    %p = inttoptr i64 %i to i16*
    %cmp = icmp eq i8* %load, %p

In this specific case, the inttoptr/ptrtoint optimization is correct
as it only compares the pointer values. In this patch, we fold
inttoptr/ptrtoint to a bitcast (if src and dest types are different).

Differential Revision: https://reviews.llvm.org/D105088
2021-07-18 23:13:25 +02:00
Sanjay Patel c0f2c4ce10 [SimplifyCFG] remove unnecessary state variable; NFC
Keeping a marker for Changed might have made sense before
this code was refactored, but we never touch that variable
after initialization now.
2021-07-18 13:42:22 -04:00
Nikita Popov 59c33a0bc8 [Cloning] Remove unused parameter from CloneAndPruneFunctionInto() (NFC) 2021-07-18 18:38:06 +02:00
Sanjay Patel 0e15de2d0c [InstCombine] fold reassociative FP add into start value of fadd reduction
This pattern is visible in unrolled and vectorized loops.
Although the backend seems to be able to reassociate to
ideal form in the examples I looked at, we might as well
do that in IR for efficiency.
2021-07-18 06:26:20 -04:00
Shilei Tian d3454ee8d2 [AbstractAttributor] Fix two issues in folding __kmpc_is_spmd_exec_mode
This patch fixed two issues found when folding `__kmpc_is_spmd_exec_mode`:
1. When the reaching kernels are empty, it should not fold to generic mode.
2. When creating AA for the caller when updating information, the dependency
   should be required.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D106209
2021-07-17 13:13:44 -04:00
Wenlei He f9f3c34e0f [CSSPGO] Turn on iterative-BFI for CSSPGO
Iterative-BFI produces better count quality and performance when evaluated on internal benchmarks. Turning it on by default now for CSSPGO. We can consider turn it on by default for AutoFDO as well in the future.

Differential Revision: https://reviews.llvm.org/D106202
2021-07-16 17:35:49 -07:00
Sami Tolvanen 0ad1d9fdf2 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 8e3b5cb39e.

Reverting to investigate test failures.
2021-07-16 14:47:33 -07:00
Sami Tolvanen 8e3b5cb39e ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-16 14:33:34 -07:00
Alexey Bataev da3dbfcacf [SLP]Improve calculations of the cost for reused/reordered scalars.
Part of D105020. Also, fixed FIXMEs that need to use wider vector type
when trying to calculate the cost of reused scalars. This may cause
regressions unless D100486 is landed to improve the cost estimations
for long vectors shuffling.

Differential Revision: https://reviews.llvm.org/D106060
2021-07-16 13:40:15 -07:00
Alexey Bataev 1b18e9ab67 [PATCH] D105827: [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind cost is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-16 12:59:08 -07:00
Joseph Huber b910a109f8 [OpenMP][NFC] Update the comment header for optimizations. 2021-07-16 14:13:13 -04:00
Joseph Huber 2c31d5ebfb [OpenMP] Add IDs to OpenMP remarks
This patch adds unique idenfitiers to the existing OpenMP remarks. This makes
it easier to identify the corresponding documentation for each remark that will
be hosted in the OpenMP webpage.

Depends on D105898

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105939
2021-07-16 14:07:03 -04:00
Joseph Huber eef6601b0f [OpenMP] Rework OpenMP remarks
This patch rewrites and reworks a few of the existing remarks to make the mmore
concise and consistent prior to writing the documentation for them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105898
2021-07-16 14:07:00 -04:00
Congzhe Cao a7b7d22d6e [LoopInterchange] Check lcssa phis in the inner latch in scenarios of multi-level nested loops
We already know that we need to check whether lcssa
phis are supported in inner loop exit block or in
outer loop exit block, and we have logic to check
them already. Presumably the inner loop latch does
not have lcssa phis and there is no code that deals
with lcssa phis in the inner loop latch. However,
that assumption is not true, when we have loops
with more than two-level nesting. This patch adds
checks for lcssa phis in the inner latch.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102300
2021-07-16 11:59:20 -04:00
Kerry McLaughlin 49d73130ca [LV] Avoid scalable vectorization for loops containing alloca
This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable, as otherwise loops which contain
these instructions will crash when attempting to scalarize the alloca.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105824
2021-07-16 11:47:13 +01:00
Sander de Smalen 239d01fa88 Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs."
The original patch was:
  https://reviews.llvm.org/D105806

There were some issues with undeterministic behaviour of the sorting
function, which led to scalable-call.ll passing and/or failing. This
patch fixes the issue by numbering all instructions in the array first,
and using that number as the order, which should provide a consistent
ordering.

This reverts commit a607f64118.
2021-07-16 10:52:01 +01:00
Max Kazantsev f98ed74f69 [LSR] Handle case 1*reg => reg. PR50918
This patch addresses assertion failure in case when the only found formula for LSR
is `1*reg => reg` which was supposed to be an impossible situation, however there
is a test that shows it is possible.

In this case, we can use scale register with scale of 1 as the missing base register.

Reviewed By: huihuiz, reames
Differential Revision: https://reviews.llvm.org/D105009
2021-07-16 11:33:59 +07:00
Shilei Tian c23da666b5 [Attributor] Add support for compound assignment for ChangeStatus
A common use of `ChangeStatus` is as follows:
```
ChangeStatus Changed = ChangeStatus::UNCHANGED;
Changed |= foo();
```
where `foo` returns `ChangeStatus` as well. Currently `ChangeStatus` doesn't
support compound assignment, we have to write as
```
Changed = Changed | foo();
```
which is not that convenient.

This patch add the support for compound assignment for `ChangeStatus`. Compound
assignment is usually implemented as a member function, and binary arithmetic
operator is therefore implemented using compound assignment. However, unlike
regular C++ class, enum class doesn't support member functions. As a result, they
can only be implemented in the way shown in the patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106109
2021-07-15 23:51:46 -04:00
Vitaly Buka bba8a76b87 [NFC][hwasan] Remove default arguments in internal class 2021-07-15 15:28:02 -07:00
Shilei Tian ca662297d5 [AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target region executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compliation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-15 18:23:23 -04:00
Sanjay Patel 81ce3aa30c [SLP] avoid leaking poison in reduction of safe boolean logic ops
This bug was introduced with D105730 / 25ee55c0ba .

If we are not converting all of the operations of a reduction
into a vector op, we need to preserve the existing select form
of the remaining ops. Otherwise, we are potentially leaking
poison where it did not in the original code.

Alive2 agrees that the version that freezes some inputs
and then falls back to scalar is correct:
https://alive2.llvm.org/ce/z/erF4K2
2021-07-15 17:33:06 -04:00
Simon Pilgrim 0a614ca225 Fix "unknown pragma 'GCC'" MSVC warning. NFCI. 2021-07-15 18:50:19 +01:00
Arthur Eubanks 99cb2507f3 Revert "[SLP]Workaround for InsertSubVector cost."
This reverts commit 2eb50baf05.

Causes hangs, see comments on D105827.
2021-07-15 10:19:41 -07:00
Nikita Popov c191035f42 [IR] Add elementtype attribute
This implements the elementtype attribute specified in D105407. It
just adds the attribute and the specified verifier rules, but
doesn't yet make use of it anywhere.

Differential Revision: https://reviews.llvm.org/D106008
2021-07-15 18:04:26 +02:00
Arthur Eubanks 04b75c05b0 [InstCombine] Look through invariant group intrinsics when removing malloc
Fixes some regressions with -fstrict-vtable-pointers in llvm-test-suite.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D106017
2021-07-15 09:02:40 -07:00
Philip Reames 95346ba877 [LV] Enable vectorization of multiple exit loops w/computable exit counts
This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzeable and to dominate the latch.

The majority of work to support this was done in a set of previous patches. In particular,, 72314466 avoids having multiple edges from the middle block to the exits, and 4b33b2387 which added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit.

Differential Revision: https://reviews.llvm.org/D105817
2021-07-15 08:53:51 -07:00
Shilei Tian a70ef3f568 Revert "[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible"
This reverts commit 1100e4aafe.
2021-07-15 11:19:28 -04:00
Sander de Smalen a607f64118 Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs."
This reverts commit efaf3099c8.
This reverts commit dc7bdc1e71.

Reverting patches due to buildbot failures.
2021-07-15 15:21:57 +01:00
Roman Lebedev 3e6c383dc6
[SimplifyCFG] Rerun PHI deduplication after common code sinkinkg (PR51092)
`SinkCommonCodeFromPredecessors()` doesn't itself ensure that duplicate PHI nodes aren't created.
I suppose, we could teach it to do that on-the-fly (& account for the already-existing PHI nodes,
& adjust costmodel), the diff will be bigger than this.

The alternative is to schedule a new EarlyCSE pass invocation somewhere later in the pipeline.
Clearly, we don't have any EarlyCSE runs in module optimization passline, so this pattern isn't cleaned up...
That would perhaps better, but it will again have some compile time impact.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106010
2021-07-15 16:34:34 +03:00
Sander de Smalen dc7bdc1e71 [LV] Fix determinism for failing scalable-call.ll test.
The sort function for emitting an OptRemark was not deterministic,
which caused scalable-call.ll to fail on some buildbots. This patch
fixes that.

This patch also fixes an issue where `Instruction::comesBefore()`
is called when two Instructions are in different basic blocks,
which would otherwise cause an assertion failure.
2021-07-15 13:16:59 +01:00
Stephen Tozer 47633af9d4 Reapply "[DebugInfo] Enable variadic debug value salvaging"
Reapplied after previous build failures were fixed in 14b62f7e2.

This reverts commit 540b4a5fb3.
2021-07-15 12:54:51 +01:00
Simon Pilgrim 944f39f38d [InstCombine] Strip inbounds from (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) fold
As discussed on rGd561b6fbdbe6, we can't guarantee that the new gep is inbounds
2021-07-15 12:19:10 +01:00
Ilya Leoshkevich 3845f2cd94 [TSan] Use zeroext for function parameters
SystemZ ABI requires zero-extending function parameters to 64-bit. The
compiler is free to optimize the code around this assumption, e.g.
failing to zero-extend __tsan_atomic32_load()'s morder may cause
crashes in to_mo() switch table lookup.

Fix by adding zeroext attributes to TSan's FunctionCallees, similar to
how it was done in commit 3bc439bdff ("[MSan] Add instrumentation for
SystemZ"). This is a no-op on arches that don't need it.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D105629
2021-07-15 12:18:47 +02:00
Florian Mayer 0ed1747a92 [NFC] [hwasan] Split argument logic into functions.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105971
2021-07-15 10:45:43 +01:00
Chuanqi Xu 8a1727ba51 [Coroutines] Run coroutine passes by default
This patch make coroutine passes run by default in LLVM pipeline. Now
the clang and opt could handle IR inputs containing coroutine intrinsics
without special options.
It should be fine. On the one hand, the coroutine passes seems to be stable
since there are already many projects using coroutine feature.
On the other hand, the coroutine passes should do nothing for IR who doesn't
contain coroutine intrinsic.

Test Plan: check-llvm

Reviewed by: lxfind, aeubanks

Differential Revision: https://reviews.llvm.org/D105877
2021-07-15 14:33:40 +08:00
Kuter Dinel ade190c5ea [Attributor] AACallEdges, Add a way to ask nonasm unknown callees
This patch adds a feature to AACallEdges AbstractAttribute that allows
users to ask if there is a unknown callee that isn't a inline assembly.
This feature is needed by some of it's users.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105992
2021-07-15 06:10:42 +03:00
Jon Roelofs d143103068 [GlobalOpt] Fix a miscompile when evaluating struct initializers.
The bug was that evaluateBitcastFromPtr attempts a narrowing to a struct's 0th
element of a store that covers other elements. While this is okay on the load
side, applying it to stores causes us to miss the writes to the additionally
covered elements.

rdar://79503568

Differential revision: https://reviews.llvm.org/D105838
2021-07-14 15:37:01 -07:00
Arthur Eubanks 5366de7375 [SimpleLoopUnswitch] Don't non-trivially unswitch loops with catchswitch exits
SplitBlock() can't handle catchswitch.

Fixes PR50973.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D105672
2021-07-14 14:07:28 -07:00
Alexey Bataev ba2690b17b [SLP][NFC]Fix variables names, NFC. 2021-07-14 12:43:45 -07:00
Simon Pilgrim 4fd0addb68 [SLP] Fix case of variable name. NFCI. 2021-07-14 20:20:04 +01:00
Sanjay Patel ca6e117d86 [InstCombine] reorder icmp with offset folds for better results
This set of folds was added recently with:
c7b658aeb5
0c400e8953
40b752d28d

...and I noted that this wasn't likely to fire in code derived
from C/C++ source because of nsw in particular. But I didn't
notice that I had placed the code above the no-wrap block
of transforms.

This is likely the cause of regressions noted from the previous
commit because -- as shown in the test diffs -- we may have
transformed into a compare with an arbitrary constant rather
than a simpler signbit test.
2021-07-14 12:12:05 -04:00
Sander de Smalen efaf3099c8 [LV] Print remark when loop cannot be vectorized due to invalid costs.
This patch emits remarks for instructions that have invalid costs for
a given set of vectorization factors. Some example output:

  t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
      dst[i] = sinf(src[i]);
                    ^
  t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
      dst[i] = sinf(src[i]);
               ^
  t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
      dst[i] = sinf(src[i]);
             ^

Reviewed By: fhahn, kmclaughlin

Differential Revision: https://reviews.llvm.org/D105806
2021-07-14 17:11:33 +01:00
Alexey Bataev 2eb50baf05 [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind cost is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-14 07:54:24 -07:00
Sanjay Patel 25ee55c0ba [SLP] match logical and/or as reduction candidates
This has been a work-in-progress for a long time...we finally have all of
the pieces in place to handle vectorization of compare code as shown in:
https://llvm.org/PR41312

To do this (see PhaseOrdering tests), we converted SimplifyCFG and
InstCombine to the poison-safe (select) forms of the logic ops, so now we
need to have SLP recognize those patterns and insert a freeze op to make
a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah

We get the minimal patterns with this patch, but the PhaseOrdering tests
show that we still need adjustments to get the ideal IR in some or all of
the motivating cases.

Differential Revision: https://reviews.llvm.org/D105730
2021-07-14 09:02:31 -04:00
Simon Pilgrim d561b6fbdb [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 12:21:01 +01:00
Chuanqi Xu 12d04ce956 [NFC] [Coroutines] Remove unused CoroFree 2021-07-14 19:13:12 +08:00
Simon Pilgrim 0722f3d0fa Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)"
Missed some BPF test changes that need addressing
2021-07-14 11:48:37 +01:00
Stephen Tozer 810e4c3c66 [DebugInfo] Correctly update dbg.values with duplicated location ops
This patch fixes code that incorrectly handled dbg.values with duplicate
location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in
question were caused by either applying an update to dbg.value multiple
times when the update is only valid once, or by updating the
DIExpression for only the first instance of a value that appears
multiple times.

Differential Revision: https://reviews.llvm.org/D105831
2021-07-14 11:17:24 +01:00
Simon Pilgrim b803294cf7 [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 10:49:12 +01:00
Shilei Tian 1100e4aafe [AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target region executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compliation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-13 22:28:35 -04:00
Arthur Eubanks 0024ec59a0 [NewPM][SimpleLoopUnswitch] Add option to not trivially unswitch
To help with debugging non-trivial unswitching issues.

Don't care about the legacy pass, nobody is using it.

If a pass's string params are empty (e.g. "simple-loop-unswitch"), don't
default to the empty constructor for the pass params. We should still
let the parser take care of it in case the parser has its own defaults.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D105933
2021-07-13 16:09:42 -07:00
Eli Friedman 5d1ba53404 [LoopReroll] Add an extra defensive check to avoid SCEV assertion.
Make sure getMinusSCEV() didn't return a pointer.  The following check
would never succeed if it was a pointer, anyway, but calling
getMulExpr() on a pointer SCEV now asserts.
2021-07-13 12:17:09 -07:00
Arthur Eubanks 489742991f [NFC] Inline variable to prevent unused variable warning 2021-07-13 09:57:59 -07:00
Arthur Eubanks ab5693aa4a [OpaquePtr] Use byval type more 2021-07-13 09:34:34 -07:00
Arthur Eubanks 113a807977 [OpaquePtr] Get load/store type without PointerType::getElementType() 2021-07-13 09:34:34 -07:00
Arthur Eubanks 693bc04bf6 [OpaquePtr] Use GlobalValue::getValueType() more 2021-07-13 09:34:34 -07:00
Simon Pilgrim b2f6cf1479 [InstCombine] Fold lshr/ashr(or(neg(x),x),bw-1) --> zext/sext(icmp_ne(x,0)) (PR50816)
Handle the missing fold reported in PR50816, which is a variant of the existing ashr(sub_nsw(X,Y),bw-1) --> sext(icmp_sgt(X,Y)) fold.

We also handle the lshr(or(neg(x),x),bw-1) --> zext(icmp_ne(x,0)) equivalent - https://alive2.llvm.org/ce/z/SnZmSj

We still allow multi uses of the neg(x) - as this is likely to let us further simplify other uses of the neg - but not multi uses of the or() which would increase instruction count.

Differential Revision: https://reviews.llvm.org/D105764
2021-07-13 14:44:54 +01:00
Jeroen Dobbelaere 1d8030053d [NFC] Do not track calls to inlined intrinsics in IFI.
Just like intrinsics are not tracked for IFI.InlinedCalls, they should not be tracked for IFI.InlinedCallSites.

In the current top-of-tree this change is a NFC, but the full restrict patches (D68484) potentially trigger an read-after-free
if intrinsics are also added to the InlindeCallSites, due to a late optimization potentially removing some of the inlined intrinsics.

Also see https://lists.llvm.org/pipermail/llvm-dev/2021-July/151722.html for a discussion about the problem.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105805
2021-07-13 10:18:23 +02:00
hyeongyu kim e338d08ae6 [SimplifyCFG] Fix SimplifyBranchOnICmpChain to be undef/poison safe.
This patch fixes the problem of SimplifyBranchOnICmpChain that occurs
when extra values are Undef or poison.

Suppose the %mode is 51 and the %Cond is poison, and let's look at the
case below.
```
	%A = icmp ne i32 %mode, 0
	%B = icmp ne i32 %mode, 51
	%C = select i1 %A, i1 %B, i1 false
	%D = select i1 %C, i1 %Cond, i1 false
	br i1 %D, label %T, label %F
=>
	br i1 %Cond, label %switch.early.test, label %F
switch.early.test:
	switch i32 %mode, label %T [
		i32 51, label %F
		i32 0, label %F
	]
```
incorrectness: https://alive2.llvm.org/ce/z/BWScX

Code before transformation will not raise UB because %C and %D is false,
and it will not use %Cond. But after transformation, %Cond is being used
immediately, and it will raise UB.

This problem can be solved by adding freeze instruction.

correctness: https://alive2.llvm.org/ce/z/x9x4oY

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104569
2021-07-13 15:35:18 +09:00
Arthur Eubanks 0e6424acbd [OpaquePointers][ThreadSanitizer] Cleanup calls to PointerType::getElementType()
Reviewed By: #opaque-pointers, dblaikie

Differential Revision: https://reviews.llvm.org/D105710
2021-07-12 20:46:08 -07:00
Nikita Popov 7ed3e87825 [Attributes] Determine attribute properties from TableGen data
Continuing from D105763, this allows placing certain properties
about attributes in the TableGen definition. In particular, we
store whether an attribute applies to fn/param/ret (or a combination
thereof). This information is used by the Verifier, as well as the
ForceFunctionAttrs pass. I also plan to use this in LLParser,
which also duplicates info on which attributes are valid where.

This keeps metadata about attributes in one place, and makes it
more likely that it stays in sync, rather than in various
functions spread across the codebase.

Differential Revision: https://reviews.llvm.org/D105780
2021-07-12 22:13:38 +02:00
Nikita Popov 6ac32872ee [Attributes] Replace doesAttrKindHaveArgument() (NFC)
This is now the same as isIntAttrKind(), so use that instead, as
it does not require manual maintenance. The naming is also more
accurate in that both int and type attributes have an argument,
but this method was only targeting int attributes.

I initially wanted to tighten the AttrBuilder assertion, but we
have some in-tree uses that would violate it.
2021-07-12 21:57:26 +02:00
Sanjay Patel a488c7879e [InstCombine] reduce signbit test of logic ops to cmp with zero
This is the pattern from the description of:
https://llvm.org/PR50816

There might be a way to generalize this to a smaller or more
generic pattern, but I have not found it yet.

https://alive2.llvm.org/ce/z/ShzJoF

define i1 @src(i8 %x) {
  %add = add i8 %x, -1
  %xor = xor i8 %x, -1
  %and = and i8 %add, %xor
  %r = icmp slt i8 %and, 0
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %r = icmp eq i8 %x, 0
  ret i1 %r
}
2021-07-12 09:01:26 -04:00
Yevgeny Rouban 88024a724c [RS4GC] Use one DVCache for both inlineGetBaseAndOffset() and insertParsePoints()
This new test demonstrates a case where a base ptr is generated
twice for the same value: the first one is generated while
the gc.get.pointer.base() is inlined, the second is generated
for the statepoint. This happens because the methods
inlineGetBaseAndOffset() and insertParsePoints() do not share
their defining value cache used by the findBasePointer() method.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D103240
2021-07-12 18:13:00 +07:00
Sander de Smalen d2e4ccc790 [LV] Ignore candidate VFs with invalid costs.
This follows on from discussion on the mailing-list:
  https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html

to interpret an Invalid cost as 'infinitely expensive', as this
simplifies some of the legalization issues with scalable vectors.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105473
2021-07-12 09:58:22 +01:00
Liqiang Tao 6ebeb7f8ba [llvm][Inliner] Templatize PriorityInlineOrder
The patch templatize PriorityInlinerOrder so that it can accept any type priority metric.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D104972
2021-07-12 10:38:22 +08:00
Johannes Doerfert 792aac9897 [Attributor][NFCI] Add UsedAssumedInformation to more interfaces
As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
2021-07-11 19:18:03 -05:00
Eli Friedman 6144085c29 [IndVars] Don't widen pointers in WidenIV::getWideRecurrence
It's not a reasonable transform, and calling getSignExtendExpr() on a
pointer hits an assertion.
2021-07-11 17:04:50 -07:00
Florian Hahn c6e4c1fbd8
[VPlan] Remove default arg from getVPValue (NFC).
The const version of VPValue::getVPValue still had a default value for
the value index. Remove the default value and use getVPSingleValue
instead, which is the proper function.
2021-07-11 22:03:09 +02:00
Craig Topper e38b7e8948 [DivRemPairs] Add an initial case for hoisting to a common predecessor.
This patch adds support for hoisting the division and maybe the
remainder for control flow graphs like this.

```
PredBB
  |  \
  |  Rem
  |  /
 Div
```

If we have DivRem we'll hoist both to PredBB. If not we'll just
hoist Div and expand Rem using the Div.

This improves our codegen for something like this

```
__uint128_t udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t *remainder) {
    if (remainder != 0)
        *remainder = dividend % divisor;
    return dividend / divisor;
}
```

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D87555
2021-07-11 10:03:07 -07:00
hyeongyu kim 1a5f4cbe1b [InstCombine] Add optimization to prevent poison from being propagated.
In D104569, Freeze was inserted just before br to solve the `branching on undef` miscompilation problem.
But value analysis was being disturbed by added freeze.

```
v = load ptr
cond = freeze(icmp (and v, const), const')
br cond, ...
```
The case in which value analysis disturbed is as above.
By changing freeze to add immediately after load, value analysis will be successful again.

```
v = load ptr
freeze(icmp (and v, const), const')
=>
v = load ptr
v' = freeze v
icmp (and v', const), const'
```
In this patch, I propose the above optimization.
With this patch, the poison will not spread as the freeze is performed early.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D105392
2021-07-11 12:40:43 +09:00
Johannes Doerfert 2e7e2994a9 [Attributor][FIX] Destroy bump allocator objects to avoid leaks
AllocationInfo and DeallocationInfo objects themselves are allocated
with the Attributor bump allocator and do not need to be deallocated.
That said, the sets in AllocationInfo and DeallocationInfo need to be
destroyed to avoid memory leaks.
2021-07-10 18:53:37 -05:00
Johannes Doerfert 514c033db1 [OpenMP] Detect SPMD compatible kernels and execute them as such
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.

The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D102307
2021-07-10 18:44:25 -05:00
Johannes Doerfert 8cb7d71355 [OpenMP][FIX] Add missing `)` to remark 2021-07-10 18:40:32 -05:00
Johannes Doerfert d9659bf6a0 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 17:57:08 -05:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Johannes Doerfert 4761d29633 [Attributor][FIX] Sanitize queries to LVI and ScalarEvolution
When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
2021-07-10 16:45:19 -05:00
Johannes Doerfert c1d53a316d [Attributor] Look through selects in genericValueTraversal
If we can simplify the select condition we can avoid one value in the
traversal.

Differential Revision: https://reviews.llvm.org/D103861
2021-07-10 16:44:05 -05:00
Johannes Doerfert c1c1fe9385 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 16:32:24 -05:00
Johannes Doerfert dbb3a65f5b [Attributor][FIX] Do not replace a value with a non-dominating instruction
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
2021-07-10 16:09:30 -05:00
Johannes Doerfert 5ef18e2421 [Attributor] Use AAValueSimplify to simplify returned values
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.

AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.

Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.

Differential Revision: https://reviews.llvm.org/D103860
2021-07-10 15:52:36 -05:00
Johannes Doerfert 0aab13aaf9 [Attributor] Introduce an optimistic getUnderlyingObjects helper
As the `llvm::getUnderlyingObjects` helper, the optimistic version
collects objects that might be the base of a given pointer. In contrast
to the llvm variant, the optimistic one will use assumed information,
e.g., about select conditions or dead blocks, to provide a more precise
result.

Differential Revision: https://reviews.llvm.org/D103859
2021-07-10 15:47:30 -05:00
Johannes Doerfert 5b12cf3e65 [Attributor][FIX] Traverse uses even if a value is assumed constant
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".

We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.

Differential Revision: https://reviews.llvm.org/D103858
2021-07-10 15:47:20 -05:00
Nico Weber d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert 269416d419 [Attributor][NFCI] Add UsedAssumedInformation to more interfaces
As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
2021-07-10 12:32:51 -05:00
Johannes Doerfert d39179d7fa [OpenMP] Detect SPMD compatible kernels and execute them as such
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.

The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D102307
2021-07-10 12:32:51 -05:00
Johannes Doerfert 966342790e [Attributor][FIX] Sanitize queries to LVI and ScalarEvolution
When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
2021-07-10 12:32:51 -05:00
Johannes Doerfert ae08df87df [Attributor][FIX] Do not replace a value with a non-dominating instruction
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
2021-07-10 12:32:50 -05:00
Johannes Doerfert f0628c6ff7 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 12:32:50 -05:00
Johannes Doerfert 1d5711c3ee [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 12:32:50 -05:00
Johannes Doerfert 5003ba2542 [Attributor] Look through selects in genericValueTraversal
If we can simplify the select condition we can avoid one value in the
traversal.

Differential Revision: https://reviews.llvm.org/D103861
2021-07-10 12:32:50 -05:00
Johannes Doerfert 1eb31d6de3 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 12:32:50 -05:00
Johannes Doerfert 374e573cfc [Attributor] Use AAValueSimplify to simplify returned values
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.

AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.

Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.

Differential Revision: https://reviews.llvm.org/D103860
2021-07-10 12:32:50 -05:00
Johannes Doerfert 93a279a67d [Attributor] Introduce an optimistic getUnderlyingObjects helper
As the `llvm::getUnderlyingObjects` helper, the optimistic version
collects objects that might be the base of a given pointer. In contrast
to the llvm variant, the optimistic one will use assumed information,
e.g., about select conditions or dead blocks, to provide a more precise
result.

Differential Revision: https://reviews.llvm.org/D103859
2021-07-10 12:32:49 -05:00
Johannes Doerfert be5d46e9bb [Attributor][FIX] Traverse uses even if a value is assumed constant
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".

We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.

Differential Revision: https://reviews.llvm.org/D103858
2021-07-10 12:32:49 -05:00
Sander de Smalen 239fcda268 [LV] NFCI: Do cost comparison on InstructionCost directly.
Instead of performing the isMoreProfitable() operation on
InstructionCost::CostTy the operation is performed on InstructionCost
directly, so that it can handle the case where one of the costs is
Invalid.

This patch also changes the CostTy to be int64_t, so that the type is
wide enough to deal with multiplications with e.g. `unsigned MaxTripCount`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105113
2021-07-10 11:57:16 +01:00
Eli Friedman 9c4baf5101 [ScalarEvolution] Strictly enforce pointer/int type rules.
Rules:

1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
   pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
   an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
   pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
   pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
   pointer.
7. Otherwise, the SCEV expression is invalid.

I'm not sure how useful rule 6 is in practice.  If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.

This is basically mop-up at this point; all the changes with significant
functional effects have landed.  Some of the remaining changes could be
split off, but I don't see much point.

Differential Revision: https://reviews.llvm.org/D105510
2021-07-09 17:29:26 -07:00
Valery N Dmitriev 8e9216fe87 [SLP] Do not make an attempt to match reduction on already erased instruction.
Differential Revision: https://reviews.llvm.org/D105752
2021-07-09 17:13:15 -07:00
Kazu Hirata 49d66d9f9f [AFDO] Merge function attributes after inlining
This patch teaches the sample profile loader to merge function
attributes after inlining functions.

Without this patch, the compiler could inline a function requiring the
512-bit vector width into its caller without merging function
attributes, triggering a failure during instruction selection.

Differential Revision: https://reviews.llvm.org/D105729
2021-07-09 16:47:12 -07:00
Sanjay Patel c2b7f09d8c [SLP] make invalid operand explicit for extra arg in reduction matching; NFC
This makes it clearer when we have encountered the extra arg.
Also, we may need to adjust the way the operand iteration
works when handling logical and/or.
2021-07-09 15:32:12 -04:00
Varun Gandhi 92dcb1d2db [Clang] Introduce Swift async calling convention.
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D95561
2021-07-09 11:50:10 -07:00
Arthur Eubanks 9a7afae492 [OpaquePtr][InferAddrSpace] Use PointerType::getWithSamePointeeType() 2021-07-09 10:29:08 -07:00
Arthur Eubanks 4e6013250d [NFC][OpaquePtr] Use GlobalValue::getValueType() more
Instead of getType()->getElementType().
2021-07-09 09:55:41 -07:00
Sanjay Patel 486992f958 [SLP] improve code comments; NFC
This likely started out only supporint binops,
but now we handle min/max using cmp+sel, and
we may extend to handle bool logic in the form
of select.
2021-07-09 12:49:54 -04:00
Sanjay Patel 544f2711bb [SLP] make checks for cmp+select min/max more explicit
This is NFC-intended currently (so no test diffs). The motivation
is to eventually allow matching for poison-safe logical-and and
logical-or (these are in the form of a select-of-bools).
( https://llvm.org/PR41312 )

Those patterns will not have all of the same constraints as min/max
in the form of cmp+sel. We may also end up removing the cmp+sel
min/max matching entirely (if we canonicalize to intrinsics), so
this will make that step easier.
2021-07-09 12:43:43 -04:00
Arthur Eubanks e4f66a1055 [OpaquePointers][CallPromotion] Don't look at pointee type for byval
byval's type parameter is now always required.
2021-07-09 09:34:05 -07:00
Nico Weber 97c675d3d4 Revert "Revert "Temporarily do not drop volatile stores before unreachable""
This reverts commit 52aeacfbf5.
There isn't full agreement on a path forward yet, but there is agreement that
this shouldn't land as-is.  See discussion on https://reviews.llvm.org/D105338

Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality"
This reverts commit f4877c78c0.

And all the related changes to tests:
This reverts commit 9a0152799f.
This reverts commit 3f7c9cc274.
This reverts commit 329f8197ef.
This reverts commit aa9f58cc2c.
This reverts commit 2df37d5ddd.
This reverts commit a72a441812.
2021-07-09 11:44:34 -04:00
Kevin P. Neal 52900486a1 [FPEnv][InstSimplify] Constrained FP support for NaN
Currently InstructionSimplify.cpp knows how to simplify floating point
instructions that have a NaN operand. It does not know how to handle the
matching constrained FP intrinsic.

This patch teaches it how to simplify so long as the exception handling
is not "fpexcept.strict".

Differential Revision: https://reviews.llvm.org/D103169
2021-07-09 11:26:28 -04:00
Roman Lebedev 4e332cd41a
Revert "Transform memset + malloc --> calloc (PR25892)"
It broke `check-msan`, see e.g. https://lab.llvm.org/buildbot/#/builders/18/builds/1934

This reverts commit 375694a07b.
2021-07-09 16:26:48 +03:00
Roman Lebedev 52aeacfbf5
Revert "Temporarily do not drop volatile stores before unreachable"
This reverts commit 4e413e1621,
which landed almost 10 months ago under premise that the original behavior
didn't match reality and was breaking users, even though it was correct as per
the LangRef. But the LangRef change still hasn't appeared, which might suggest
that the affected parties aren't really worried about this problem.

Please refer to discussion in:
* https://reviews.llvm.org/D87399 (`Revert "[InstCombine] erase instructions leading up to unreachable"`)
* https://reviews.llvm.org/D53184 (`[LangRef] Clarify semantics of volatile operations.`)
* https://reviews.llvm.org/D87149 (`[InstCombine] erase instructions leading up to unreachable`)

clang has `-Wnull-dereference` which will diagnose the obvious cases
of null dereference, it was adjusted in f4877c78c0,
but it will only catch the cases where the pointer is a null literal,
it will not catch the cases where an arbitrary store is expected to trap.

Differential Revision: https://reviews.llvm.org/D105338
2021-07-09 14:16:54 +03:00
Max Kazantsev 9c5e65691e [LoopDeletion] Handle switch in proving that loop exits on first iteration
Added check for switch-terminated blocks in loops.
Now if a block is terminated with a switch, we try to find out which of the
cases is taken on 1st iteration and mark corresponding edge from the block
to the case successor as live.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D105688
Reviewed By: nikic, mkazantsev
2021-07-09 18:03:34 +07:00
David Green 38c9a4068d [TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between paiwise and
non-pairwise reductions.

This also removes some code that was detecting reductions trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484
2021-07-09 11:51:16 +01:00
Dawid Jurczak 375694a07b Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-09 11:01:12 +02:00
Bjorn Pettersson 472462c472 [NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg'
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.

This patch also replaces all occurrances of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.

I for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105627
2021-07-09 09:47:03 +02:00
Reshabh Sharma 2e194dec60 [ASan][AMDGPU] Make shadow offset match X86 on Linux
This patch explicitly sets the shadow offset for
AMDGPU to match that of X86 on Linux.

Reviewed By: vitalybuka

https://reviews.llvm.org/D105282
2021-07-09 07:48:03 +05:30
Alexey Bataev 8af69975af [InstCombine][NFC]Use only `replaceInstUsesWith`, NFC. 2021-07-08 13:58:30 -07:00
David Blaikie 1def2579e1 PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
2021-07-08 13:37:57 -07:00
Vitaly Buka 915e07605c [msan] Handle funnel shifts
Fixes https://bugs.llvm.org/show_bug.cgi?id=50840

Differential Revision: https://reviews.llvm.org/D105387
2021-07-08 12:49:49 -07:00
Alexey Bataev c574d2fbac [SLP]Improve vectorization of stores.
Patch tries to improve the vectorization of stores. Originally, we just
check the type and the base pointer of the store.
Patch adds some extra checks to avoid non-profitable vectorization
cases. It includes analysis of the scalar values to be stored and
triggers the vectorization attempt only if the scalar values have
same/alt opcode and are from same basic block, i.e. we don't end up
immediately with the gather node, which is not profitable.
This also improves compile time by filtering out non-profitable cases.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D104122
2021-07-08 12:35:39 -07:00
Alexey Bataev 0d74fd3fdf [SLP][COST][X86]Improve cost model for masked gather.
Revived D101297 in its original form + added some changes in X86
legalization cehcking for masked gathers.

This solution is the most stable and the most correct one. We have to
check the legality before trying to build the masked gather in SLP.
Without this check we have incorrect cost (for SLP) in case if the masked gather
is not legal/slower than the gather. And we're missing some
vectorization opportunities.

This can be fixed in the cost model, but in this case we need to add
special checks for the cost of GEPs for ScatterVectorize node, add
special check for small trees, etc., i.e. there are a lot of corner
cases here and there, which insrease code base and make it harder to
maintain the code.

> Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed.

The question from D101297. Actually, no, it can't. Actually, simple
gather may give us better result, especially after we started
vectorization of insertelements. Plus, like I said before, the cost for
non-legal masked gathers leads to missed vectorization opportunities.

Differential Revision: https://reviews.llvm.org/D105042
2021-07-08 11:53:30 -07:00
Alexey Bataev b5113bff46 [Instcombine]Transform reduction+(sext/zext(<n x i1>) to <n x im>) to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Some of the SPEC tests end up with reduction+(sext/zext(<n x i1>) to <n x im>) pattern, which can be transformed to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Also, reduction+(<n x i1>) can be transformed to ctpop(bitcast <n x i1> to in) & 1 != 0.

Differential Revision: https://reviews.llvm.org/D105587
2021-07-08 07:56:41 -07:00
Michael Liao 4e5d9c8803 [Internalize] Preserve variables externally initialized.
- ``externally_initialized`` variables would be initialized or modified
  elsewhere. Particularly, CUDA or HIP may have host code to initialize
  or modify ``externally_initialized`` device variables, which may not
  be explicitly referenced on the device side but may still be used
  through the host side interfaces. Not preserving them triggers the
  elimination of them in the GlobalDCE and breaks the user code.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D105135
2021-07-08 10:48:19 -04:00
Nikita Popov 84c15bc018 [SCEVExpander] Support opaque pointers
This adds support for opaque pointers to expandAddToGEP() by always
generating an i8 GEP for opaque pointers. After looking at some other
cases (constexpr GEP folding, SROA GEP generation), I've come around
to the idea that we should use i8 GEPs for opaque pointers, because
the alternative would be to guess a GEP type from surrounding code,
which will not be reliable. Ultimately, i8 GEPs is where we want to
end up anyway, and opaque pointers just make that the natural choice.

There are a couple of other places in SCEVExpander that check pointer
element types, I plan to update those when I run across usable test
coverage that doesn't assert elsewhere.

Differential Revision: https://reviews.llvm.org/D105398
2021-07-07 20:47:59 +02:00
Sanjay Patel 97c473ad39 [SLP] rename variable to not be misleading; NFC
The reduction matching was probably only dealing with binops
when it was written, but we have now generalized it to handle
select and intrinsics too, so assert on that too.
2021-07-07 14:40:21 -04:00
Arnold Schwaighofer 2937f8d148 [coro async] Cap the alignment of spilled values (vs. allocas) at the max frame alignment
Before this patch we would normally use the ABI alignment which can be
to high for the context alginment.

For spilled values we don't need ABI alignment, since the frame entry's
address is not escaped.

rdar://79664965

Differential Revision: https://reviews.llvm.org/D105288
2021-07-07 08:06:25 -07:00
Philip Reames 723144665b [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4)
Resubmit after the following changes:

* Fix a latent bug related to unrolling with required epilogue (see e49d65f). I believe this is the cause of the prior PPC buildbot failure.
* Disable non-latch exits for epilogue vectorization to be safe (9ffa90d)
* Split out assert movement (600624a) to reduce churn if this gets reverted again.

Previous commit message (try 3)

Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-07-07 07:44:35 -07:00
Dylan Fleming 7215dcfe36 [SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths
Depends on D104239

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105341
2021-07-07 15:30:10 +01:00
Arnold Schwaighofer 033de11150 [coro async] Move code to proper switch
While upstreaming patches this code somehow was applied to the wrong switch statement.

Differential Revision: https://reviews.llvm.org/D105504
2021-07-07 06:19:08 -07:00
Max Kazantsev 19885c7adf [NFC] Remove duplicate function calls
Removed repeated call of L->getHeader(). Now using previously stored return value.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D105535
Reviewed By: mkazantsev
2021-07-07 17:02:36 +07:00
Dylan Fleming 7586b47fb6 [SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104239
2021-07-07 09:58:05 +01:00
Johannes Doerfert 168a9234d7 [Attributor][FIX] Replace uses first, then values
Before we replaced value by registering all their uses. However, as we
replace a value old uses become stale. We now replace values explicitly
and keep track of "new values" when doing so to avoid replacing only
uses in stale/old values but not their replacements.
2021-07-06 22:43:51 -05:00
Johannes Doerfert aa3768278d [Attributor] Introduce a helper function to deal with undef + none
We often need to deal with the value lattice that contains none and
undef as special values. A simple helper makes this much nicer.

Differential Revision: https://reviews.llvm.org/D103857
2021-07-06 22:41:21 -05:00
Johannes Doerfert fc82409b5c [Attributor] Simplify operands inside of simplification AAs first
When we do simplification via AAPotentialValues or AAValueConstantRange
we need to simplify the operands of an instruction we deconstruct first.
This does not only improve the result, see for example range.ll, but is
required as we allow outside AAs to provide simplification rules via
callbacks. If we do ignore the simplification rules and base other
simplifications on the IR instead we can create an inconsistent state.
2021-07-06 22:41:18 -05:00
Eli Friedman 7ac1c7bead Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.

Recommitting with fix to MemoryDepChecker::isDependent.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 12:16:05 -07:00
Eli Friedman a6d081b2cb Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers."
This reverts commit 74d6ce5d5f.

Seeing crashes on buildbots in MemoryDepChecker::isDependent.
2021-07-06 11:17:13 -07:00
Philip Reames 9ffa90d6c2 [LV] Disable epilogue vectorization for non-latch exits
When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed.  Better late (4 months), than never right?

I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not modivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error.  Thus, let's play it safe for now.
2021-07-06 10:57:10 -07:00
Philip Reames 600624a103 [LoopVersion] Move an assert [nfc-ish] 2021-07-06 10:57:10 -07:00
Eli Friedman 74d6ce5d5f [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 10:54:41 -07:00
Arnold Schwaighofer 846a530e7d Fix coro lowering of single predecessor phis
Code assumes that uses of single predecessor phis are not live accross
suspend points. Cleanup any single predecessor phis preceeding the code
making this assumption.

rdar://76020301

Differential Revision: https://reviews.llvm.org/D105488
2021-07-06 10:22:25 -07:00
Alexey Bataev 4e1a0684f1 [SLP]Fix non-determinism in PHI sorting.
Compare type IDs and DFS numbering for basic block instead of addresses
to fix non-determinism.

Differential Revision: https://reviews.llvm.org/D105031
2021-07-06 08:45:45 -07:00
Arnold Schwaighofer 130ea3ceb4 Use swift mangling for resume functions
The resume partial functions generated for swift suspend points will now
use a Swift mangling suffix.

Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_")
and suspend resume partial functions will use the suffix 'TY'[0-9]+'_'
(e.g "...TY1_").

Reviewed By: nate_chandler

Differential Revision: https://reviews.llvm.org/D104144
2021-07-06 08:27:46 -07:00
Florian Hahn ef0d147cdc
Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups.
This reverts commit 706bbfb35b.

The committed version moves the definition of VPReductionPHIRecipe out
of an ifdef only intended for ::print helpers. This should resolve the
build failures that caused the revert
2021-07-06 14:15:42 +01:00
Kerry McLaughlin a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  int foo(__int128_t* ptr, int N)
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Florian Hahn 706bbfb35b
Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups
This reverts commit 3fed6d443f,
bbcbf21ae6 and
6c3451cd76.

The changes causing build failures with certain configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio

    lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction*, llvm::ArrayRef<llvm::VPValue*>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]':
    LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe'
    collect2: error: ld returned 1 exit status
2021-07-06 12:10:03 +01:00
Florian Hahn 3fed6d443f
[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual.
VPReductionRecipe overrides those implementations. Mark them as virtual
in the VPWidenPHIRecipe to unbreak build in certain configurations.
2021-07-06 12:00:41 +01:00
Florian Hahn bbcbf21ae6
[VPlan] Add destructor to VPReductionRecipe to unbreak build.
Attempt to unbreak
https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio
2021-07-06 11:41:20 +01:00
Florian Hahn 6c3451cd76
[VPlan] Add VPReductionPHIRecipe (NFC).
This patch is a first step towards splitting up VPWidenPHIRecipe into
separate recipes for the 3 distinct cases they model:

    1. reduction phis,
    2. first-order recurrence phis,
    3. pointer induction phis.

This allows untangling the code generation and allows us to reduce the
reliance on LoopVectorizationCostModel during VPlan code generation.

Discussed/suggested in D100102, D100113, D104197.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104989
2021-07-06 11:25:28 +01:00
Kerry McLaughlin 17b701c43c [LV] Collect a list of all element types found in the loop (NFC)
Splits `getSmallestAndWidestTypes` into two functions, one of which now collects
a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not
have to iterate over all instructions in the loop again in other places, such as in D102253
which disables scalable vectorization of a loop if any of the instructions use invalid types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105437
2021-07-06 10:37:41 +01:00
Akira Hatanaka 28fe9afdba [ObjC][ARC] Prevent moving objc_retain calls past objc_release calls
that release the retained object

This patch fixes what looks like a longstanding bug in ARC optimizer
where it reverses the order of objc_retain calls and objc_release calls
that retain and release the same object.

The code in ARC optimizer that is responsible for code motion takes the
following steps:

1. Traverse the CFG bottom-up and determine how far up objc_release
   calls can be moved. Determine the insertion points for the
   objc_release calls, but don't actually move them.
2. Traverse the CFG top-down and determine how far down objc_retain
   calls can be moved. Determine the insertion points for the
   objc_retain calls, but don't actually move them.
3. Try to move the objc_retain and objc_release calls if they can't be
   removed.

The problem is that the insertion points for the objc_retain calls are
determined in step 2 without taking into consideration the insertion
points for objc_release calls determined in step 1, so the order of an
objc_retain call and an objc_release call can be reversed, which is
incorrect, even though each step is correct in isolation.

To fix this bug, this patch teaches the top-down traversal step to take
into consideration the insertion points for objc_release calls
determined in the bottom-up traversal step. Code motion for an
objc_retain call is disabled if there is a possibility that it can be
moved past an objc_release call that releases the retained object.

rdar://79292791

Differential Revision: https://reviews.llvm.org/D104953
2021-07-05 12:16:15 -07:00
Sanjay Patel 40b752d28d [InstCombine] fold icmp slt/sgt of offset value with constant
This follows up patches for the unsigned siblings:
0c400e8953
c7b658aeb5

We are translating an offset signed compare to its
unsigned equivalent when one end of the range is
at the limit (zero or unsigned max).

(X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1)
(X + C2) <s C --> X >u (C ^ SMAX) (if C == C2)

This probably does not show up much in IR derived
from C/C++ source because that would likely have
'nsw', and we have folds for that already.

As with the previous unsigned transforms, the folds
could be generalized to handle non-constant patterns:

https://alive2.llvm.org/ce/z/Y8Xrrm

  ; sgt
  define i1 @src(i8 %a, i8 %c) {
    %c2 = add i8 %c, 1
    %t = add i8 %a, %c2
    %ov = icmp sgt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_off = sub i8 127, %c ; SMAX
    %ov = icmp ult i8 %a, %c_off
    ret i1 %ov
  }

https://alive2.llvm.org/ce/z/c8uhnk

  ; slt
  define i1 @src(i8 %a, i8 %c) {
    %t = add i8 %a, %c
    %ov = icmp slt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_offnot = xor i8 %c, 127 ; SMAX
    %ov = icmp ugt i8 %a, %c_offnot
    ret i1 %ov
  }
2021-07-05 10:08:31 -04:00
Caroline Concatto b868a2d2c6 [SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector.
The function vectorizeChainsInBlock does not support scalable vector,
because function like canReuseExtract and isCommutative in the code
path assert with scalable vectors.

This patch avoids vectorizing blocks that have extract instructions with scalable
vector..

Differential Revision: https://reviews.llvm.org/D104809
2021-07-05 12:43:41 +01:00
Stephen Tozer 14b62f7e2f [DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.

This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.

Differential Revision: https://reviews.llvm.org/D105129
2021-07-05 10:35:19 +01:00
Nikita Popov a213f735d8 [IR] Deprecate GetElementPtrInst::CreateInBounds without element type
This API is not compatible with opaque pointers, the method
accepting an explicit pointer element type should be used instead.

Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
2021-07-04 16:49:30 +02:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Nikita Popov fabc17192e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Roman Lebedev fc150cecd7
[SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.

Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D105374
2021-07-03 10:45:44 +03:00
Fangrui Song 252a1eecc0 [ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions
D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for
isDeclarationForLinker `GlobalValue`s. It missed a case for imported
declarations (`doImportAsDefinition` is false while `isPerformingImport` is
true). This can lead to a linker error for a default visibility symbol in
`ld.lld -shared`.

When `ClearDSOLocalOnDeclarations` is true, we check
`isPerformingImport() && !doImportAsDefinition(&GV)` along with
`GV.isDeclarationForLinker()`. The new condition checks an imported declaration.

This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104986
2021-07-02 17:08:25 -07:00
Roman Lebedev 53fef0b293
[NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs
Mimics similar change for InstCombine:
  ce192ced2b / D104602

All these uses are in blocks that aren't reachable from function's entry,
and said blocks are removed by SimplifyCFG itself,
so we can't really test this change.
2021-07-02 22:11:52 +03:00
Heejin Ahn 51fecd17bb [InstCombine] Don't combine PHI before catchswitch
This tries to bail out if the PHI is in a `catchswitch` BB in
InstCombine. A PHI cannot be combined into a non-PHI instruction if it
is in a `catchswitch` BB, because `catchswitch` BB cannot have any
non-PHI instruction other than `catchswitch` itself.

The given test case started crashing after D98058.

Reviewed By: lebedev.ri, rnk

Differential Revision: https://reviews.llvm.org/D105309
2021-07-02 12:10:24 -07:00
Roman Lebedev da81ec6158
[SimplifyCFG] Volatile memory operations do not trap
Somewhat related to D105338.
While it is up for discussion whether or not volatile store traps,
so far there has been no complaints that volatile load/cmpxchg/atomicrmw also may trap.
And even if simplifycfg currently concervatively believes that to be the case,
instcombine does not: https://godbolt.org/z/5vhv4K5b8

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D105343
2021-07-02 21:47:44 +03:00
Alexey Bataev 7f7e4aed21 [SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by
V.Dmitriev.

Reduces number of arguments
2021-07-02 10:30:13 -07:00
Jon Roelofs 37b6e03c18 [Intrinsics] Make MemCpyInlineInst a MemCpyInst
This opens up more optimization opportunities in passes that already handle MemCpyInst's.

Differential revision: https://reviews.llvm.org/D105247
2021-07-02 10:25:24 -07:00
Roman Lebedev 13e35ac124
[NFC][InstCombine] visitUnreachableInst(): enhance comments somewhat 2021-07-02 17:30:01 +03:00
Roman Lebedev dadedc99e9
[InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable
In the original review D87149 it was mentioned that this approach was tried,
and it lead to infinite combine loops, but i'm not seeing anything like that now,
neither in the `check-llvm`, nor on some codebases i tried.

This is a recommit of d9d65527c2,
which i immediately reverted because i have messed up something
during branch switch, and 597ccc92ce
accidentally ended up being pushed, which was very much not the intention.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D105339
2021-07-02 17:20:21 +03:00
Roman Lebedev 24d271bb18
Revert "https://godbolt.org/z/5vhv4K5b8"
This reverts commit 597ccc92ce.
2021-07-02 17:17:55 +03:00
Roman Lebedev 93a1642763
Revert "[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable"
This reverts commit d9d65527c2.
2021-07-02 17:17:47 +03:00
Roman Lebedev d9d65527c2
[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable
In the original review D87149 it was mentioned that this approach was tried,
and it lead to infinite combine loops, but i'm not seeing anything like that now,
neither in the `check-llvm`, nor on some codebases i tried.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D105339
2021-07-02 17:17:03 +03:00
Roman Lebedev 597ccc92ce
https://godbolt.org/z/5vhv4K5b8 2021-07-02 17:16:19 +03:00
Nico Weber a92964779c Revert "[InstrProfiling] Use external weak reference for bias variable"
This reverts commit 33a7b4d9d8.
Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176
2021-07-02 09:05:12 -04:00
Florian Hahn a3ca578eb9
[Matrix] Fix crash during fusion if the same load is re-used.
This patch fixes a crash when the same load is used for both operands of
a fuseable multiply.
2021-07-02 14:00:17 +01:00
Alexey Bataev 28ac873bcb [SLP]Fix gathering of the scalars by not ignoring UndefValues.
The compiler should not ignore UndefValue when gathering the scalars,
otherwise the resulting code may be less defined than the original one.
Also, grouped scalars to insert them at first to reduce the analysis in
further passes.

Differential Revision: https://reviews.llvm.org/D105275
2021-07-02 04:46:48 -07:00
Florian Hahn 7655061cc6
[Matrix] Hoist address computation before multiply to enable fusion.
If the store address does not dominate the matrix multiply, try to hoist
address computation instructions without side-effects and/or memory
reads before the multiply, to allow fusion.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D105193
2021-07-02 09:52:11 +01:00
Evgeniy Brevnov 9568811cb8 [NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure
With 'for' loop there is is a single place where 'Current' is adjusted. It helps to avoid copy paste and makes a bit easy to understand overall loop controll flow.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D101044
2021-07-02 10:00:47 +07:00
Craig Topper 066524ea54 [ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0.
Previously we used the vector type, but we're loading/storing
invididual elements so I think only element alignment should matter.

Noticed while looking at the code for something else so I don't
have a test case.

Differential Revision: https://reviews.llvm.org/D105220
2021-07-01 19:08:47 -07:00
Petr Hosek 33a7b4d9d8 [InstrProfiling] Use external weak reference for bias variable
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted which leads to an issue where the
linker might choose either of the weak symbols potentially disabling the
runtime counter relocation.

This change replaces the use of weak definition inside the runtime with
an external weak reference to address the issue. We also place the
compiler generated symbol inside a COMDAT group so dead definition can
be garbage collected by the linker.

Differential Revision: https://reviews.llvm.org/D105176
2021-07-01 15:25:31 -07:00
Philip Reames 955f125899 [instcombine] Fold overflow check using overflow intrinsic to comparison
This follows up to D104665 (which added umulo handling alongside the existing uaddo case), and generalizes for the remaining overflow intrinsics.

I went to add analogous handling to LVI, and discovered that LVI already had a more general implementation. Instead, we can port was LVI does to instcombine. (For context, LVI uses makeExactNoWrapRegion to constrain the value 'x' in blocks reached after a branch on the condition `op.with.overflow(x, C).overflow`.)

Differential Revision: https://reviews.llvm.org/D104932
2021-07-01 09:41:55 -07:00