Commit Graph

28315 Commits

Author SHA1 Message Date
Dawid Jurczak 43234b1595 [DSE] Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It is a follow-up to https://reviews.llvm.org/D101440.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-20 11:39:05 +02:00
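A minimal IR sketch of the malloc + memset --> calloc rewrite described in the commit above (function and value names are illustrative, not taken from the patch):
```
declare i8* @malloc(i64)
declare i8* @calloc(i64, i64)
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)

define i8* @alloc_zeroed(i64 %n) {
  %p = call i8* @malloc(i64 %n)
  call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 %n, i1 false)
  ret i8* %p
}

; after DSE, the malloc + memset pair can be rewritten into a single
; zero-initializing allocation:
;   %p = call i8* @calloc(i64 1, i64 %n)
```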
Florian Mayer 5f08219322 Revert "[hwasan] Use stack safety analysis."
This reverts commit e9c63ed10b.
2021-07-20 10:36:46 +01:00
Florian Mayer e9c63ed10b [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-20 10:06:35 +01:00
Johannes Doerfert c66cbee140 [Attributor] Use set vector instead of vector to prevent duplicates 2021-07-20 01:39:34 -05:00
Johannes Doerfert 5957cf9f11 [Attributor] Simplify to values in the genericValueTraversal
We already simplified to a constant; given the new interface we can also
simplify to a generic value.
2021-07-20 01:39:34 -05:00
Johannes Doerfert 5eba7846a5 [Attributor] Use checkForAllUses instead of custom use tracking
AAMemoryBehaviorFloating used a custom use tracking mechanism even
though checkForAllUses exists and is already more powerful. Further,
AAMemoryBehaviorFloating uses AANoCapture to guarantee that there are no
aliases and following the uses is sufficient. This is an OK assumption
if checkForAllUses is used but custom tracking is easily out of sync
with AANoCapture and problems follow.
2021-07-20 01:39:33 -05:00
Johannes Doerfert b96ea6b1fd [Attributor] Ensure to simplify operands in AAValueConstantRange
As with other patches before, the simplification callback interface
requires us to go through the Attributor::getAssumedSimplified API first
before we recurse.

It is unclear if the problem can be explicitly tested with our current
infrastructure.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 5fbb51d8d5 [Attributor] Extend the AAValueSimplify compare simplification logic
We first simplify the operands of a compare and then reason on the
simplified versions, e.g., with AANonNull.

This does improve the simplification capabilities but also fixes a
potential problem that has not yet been observed by simplifying the
operands first.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 9c00aabd60 [Attributor][NFCI] Expose `getAssumedUnderlyingObjects` API 2021-07-20 00:35:13 -05:00
Johannes Doerfert 5e169818fb [Attributor][NFC] Fix function name spelling 2021-07-20 00:35:13 -05:00
Johannes Doerfert 44a9ee170c [Attributor][FIX] Do not simplify byval arguments
A byval argument is a different value in the caller and callee, we
cannot propagate the information as part of AAValueSimplify. Users that
want to deal with byval arguments need to specifically perform the
argument -> call site step. We do not do this for now.
2021-07-19 22:48:51 -05:00
Johannes Doerfert c2281f1565 [Attributor] Introduce AAPointerInfo
This patch introduces AAPointerInfo which tracks the uses of a pointer
and places them in "bins" based on their offset from the base and access
size.

As with other AAs, any pointer can be tracked but it is up to the user
to make sense of the results. The users in this patch are AAValueSimplify
and AAPotentialValues, which both utilize AAPointerInfo to determine the
value of a load. For now, this is restricted to loads of allocas and
internal globals. Through the use of AAPointerInfo and the "bins" we can
track struct members separately. The users also know that storing only
zeros (at unknown indices) will result in loading only 0 (from unknown
indices). Other than that, the users are flow and context insensitive
(for now).

To deal with the "bins" more easily, AAPointerInfo provides a
forallInterferingAccesses helper that applies a callback on all accesses
that might interfere with a given load or store.

Differential Revision: https://reviews.llvm.org/D104432
2021-07-19 22:48:35 -05:00
Johannes Doerfert 28c78a9e12 [Attributor] Simplify loads
As a first step to simplify loads we only handle `null` and `undef`
underlying objects, as well as objects that have the load as a single user.
Loads of those values can be replaced by the initializer, if any.
Proper reasoning is introduced in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D103862
2021-07-19 22:47:29 -05:00
Johannes Doerfert 97387fdf6d [OpenMP] Fix: carefully track SPMDCompatibilityTracker
We did not properly use SPMDCompatibilityTracker in various places.
This patch makes sure we look at the validity properly and also fixes
the state if we can.

Differential Revision: https://reviews.llvm.org/D106085
2021-07-19 22:47:03 -05:00
Artem Belevich 1a43ee65d1 Revert "[MemCpyOpt] Enable memcpy optimizations unconditionally."
This reverts commit 2c98298a75 which breaks
sanitizers.
2021-07-19 14:27:41 -07:00
Petr Hosek 54902e00d1 [InstrProfiling] Use weak alias for bias variable
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in a weak symbol being emitted, in which case the symbol selected
by the linker is going to depend on the order of inputs, which can be
fragile.

This change replaces the use of a weak definition inside the runtime with
a weak alias. We place the compiler-generated symbol inside a COMDAT
group so the dead definition can be garbage collected by the linker.

We also disable the use of runtime counter relocation on Darwin since
Mach-O doesn't support weak external references, but Darwin already uses
a different continuous mode that relies on overmapping, so runtime counter
relocation isn't needed there.

Differential Revision: https://reviews.llvm.org/D105176
2021-07-19 12:23:51 -07:00
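A rough IR sketch of the weak-alias arrangement described in the commit above; the symbol names here are hypothetical, not the runtime's actual ones:
```
; compiler-generated definition, placed in a COMDAT group so an unused
; definition can be discarded by the linker (hypothetical name):
$__prof_bias = comdat any
@__prof_bias = linkonce_odr hidden global i64 0, comdat

; runtime side (sketch): a concrete default definition plus a weak alias,
; instead of a weak definition, so the compiler-generated symbol wins:
;   @__prof_bias_default = internal global i64 0
;   @__prof_bias = weak hidden alias i64, i64* @__prof_bias_default
```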
Artem Belevich b988d69ea2 [infer-address-spaces] Handle complex non-pointer constexpr arguments.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51099

Differential Revision: https://reviews.llvm.org/D106098
2021-07-19 12:15:52 -07:00
Artem Belevich 2c98298a75 [MemCpyOpt] Enable memcpy optimizations unconditionally.
The patch does not depend on the availability of the library functions for
memcpy/memset as it operates on LLVM intrinsics.  The optimizations are useful
on the targets that have these functions disabled (e.g. NVPTX & AMDGPU).

Differential Revision: https://reviews.llvm.org/D104801
2021-07-19 11:58:02 -07:00
maekawatoshiki 74f0f9a455 [LICM] Create LoopNest Invariant Code Motion (LNICM) pass
This patch adds a new pass called LNICM, which is a LoopNest version of LICM, and a test case to show how LNICM works.
Basically, LNICM only hoists invariants out of a loop nest (not a single loop) to keep/make a perfect loop nest. This enables later optimizations that require a perfect loop nest.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D104180
2021-07-20 00:31:18 +09:00
Alexey Bataev d8d8b4574a [SLP]Fix possible crash on unreachable incoming values sorting.
The incoming values for PHI nodes may come from unreachable BasicBlocks;
we need to handle this case.

Differential Revision: https://reviews.llvm.org/D106264
2021-07-19 04:54:53 -07:00
Mindong Chen e908e063d1 [LoopUtils] Fix incorrect RT check bounds of loop-invariant mem accesses
This fixes the lower and upper bound calculation of a
RuntimeCheckingPtrGroup when it has more than one loop-invariant
pointer. Resolves PR50686.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D104148
2021-07-19 19:38:24 +08:00
Florian Mayer 807d50100c Revert "[hwasan] Use stack safety analysis."
This reverts commit 12268fe14a.
2021-07-19 12:08:32 +01:00
Florian Mayer 12268fe14a [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-19 11:54:44 +01:00
Rosie Sumpter 34d6820551 [LoopFlatten] Use Loop to identify loop induction phi. NFC
Replace the code that identifies the induction phi with the helper
function getInductionVariable to improve robustness.

Differential Revision: https://reviews.llvm.org/D106045
2021-07-19 09:06:57 +01:00
Krishna Kariya da92e86263 [InstCombine] Fold IntToPtr/PtrToInt to bitcast
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support
for specific cases where this optimization works. This patch is the
first one along this line.

Consider the example:

    %i = ptrtoint i8* %X to i64
    %p = inttoptr i64 %i to i16*
    %cmp = icmp eq i8* %load, %p

In this specific case, the inttoptr/ptrtoint optimization is correct
as it only compares the pointer values. In this patch, we fold
inttoptr/ptrtoint to a bitcast (if src and dest types are different).

Differential Revision: https://reviews.llvm.org/D105088
2021-07-18 23:13:25 +02:00
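With the fold described in the commit above, the round trip in the example collapses to a bitcast (a sketch reusing the commit's own value names):
```
    %i = ptrtoint i8* %X to i64
    %p = inttoptr i64 %i to i16*
; -->
    %p = bitcast i8* %X to i16*
```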
Sanjay Patel c0f2c4ce10 [SimplifyCFG] remove unnecessary state variable; NFC
Keeping a marker for Changed might have made sense before
this code was refactored, but we never touch that variable
after initialization now.
2021-07-18 13:42:22 -04:00
Nikita Popov 59c33a0bc8 [Cloning] Remove unused parameter from CloneAndPruneFunctionInto() (NFC) 2021-07-18 18:38:06 +02:00
Sanjay Patel 0e15de2d0c [InstCombine] fold reassociative FP add into start value of fadd reduction
This pattern is visible in unrolled and vectorized loops.
Although the backend seems to be able to reassociate to
ideal form in the examples I looked at, we might as well
do that in IR for efficiency.
2021-07-18 06:26:20 -04:00
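A sketch of the pattern in IR, assuming the reassoc fast-math flag is present (names are illustrative, not from the patch):
```
declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)

define float @src(float %start, <4 x float> %v) {
  %red = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)
  %add = fadd reassoc float %start, %red
  ret float %add
}

; the scalar fadd can be folded into the reduction's start value:
;   %add = call reassoc float @llvm.vector.reduce.fadd.v4f32(float %start, <4 x float> %v)
```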
Shilei Tian d3454ee8d2 [AbstractAttributor] Fix two issues in folding __kmpc_is_spmd_exec_mode
This patch fixed two issues found when folding `__kmpc_is_spmd_exec_mode`:
1. When the reaching kernels are empty, it should not fold to generic mode.
2. When creating AA for the caller when updating information, the dependency
   should be required.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D106209
2021-07-17 13:13:44 -04:00
Wenlei He f9f3c34e0f [CSSPGO] Turn on iterative-BFI for CSSPGO
Iterative-BFI produces better count quality and performance when evaluated on internal benchmarks. Turning it on by default now for CSSPGO. We can consider turning it on by default for AutoFDO as well in the future.

Differential Revision: https://reviews.llvm.org/D106202
2021-07-16 17:35:49 -07:00
Sami Tolvanen 0ad1d9fdf2 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 8e3b5cb39e.

Reverting to investigate test failures.
2021-07-16 14:47:33 -07:00
Sami Tolvanen 8e3b5cb39e ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-16 14:33:34 -07:00
Alexey Bataev da3dbfcacf [SLP]Improve calculations of the cost for reused/reordered scalars.
Part of D105020. Also, fixed FIXMEs that need to use a wider vector type
when trying to calculate the cost of reused scalars. This may cause
regressions unless D100486 is landed to improve the cost estimations
for long vectors shuffling.

Differential Revision: https://reviews.llvm.org/D106060
2021-07-16 13:40:15 -07:00
Alexey Bataev 1b18e9ab67 [PATCH] D105827: [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-16 12:59:08 -07:00
Joseph Huber b910a109f8 [OpenMP][NFC] Update the comment header for optimizations. 2021-07-16 14:13:13 -04:00
Joseph Huber 2c31d5ebfb [OpenMP] Add IDs to OpenMP remarks
This patch adds unique identifiers to the existing OpenMP remarks. This makes
it easier to identify the corresponding documentation for each remark that will
be hosted in the OpenMP webpage.

Depends on D105898

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105939
2021-07-16 14:07:03 -04:00
Joseph Huber eef6601b0f [OpenMP] Rework OpenMP remarks
This patch rewrites and reworks a few of the existing remarks to make them more
concise and consistent prior to writing the documentation for them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105898
2021-07-16 14:07:00 -04:00
Congzhe Cao a7b7d22d6e [LoopInterchange] Check lcssa phis in the inner latch in scenarios of multi-level nested loops
We already know that we need to check whether lcssa
phis are supported in the inner loop exit block or in
the outer loop exit block, and we have logic to check
them already. Presumably the inner loop latch does
not have lcssa phis and there is no code that deals
with lcssa phis in the inner loop latch. However,
that assumption is not true, when we have loops
with more than two-level nesting. This patch adds
checks for lcssa phis in the inner latch.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102300
2021-07-16 11:59:20 -04:00
Kerry McLaughlin 49d73130ca [LV] Avoid scalable vectorization for loops containing alloca
This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable, as otherwise loops which contain
these instructions will crash when attempting to scalarize the alloca.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105824
2021-07-16 11:47:13 +01:00
Sander de Smalen 239d01fa88 Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs."
The original patch was:
  https://reviews.llvm.org/D105806

There were some issues with non-deterministic behaviour of the sorting
function, which led to scalable-call.ll passing and/or failing. This
patch fixes the issue by numbering all instructions in the array first,
and using that number as the order, which should provide a consistent
ordering.

This reverts commit a607f64118.
2021-07-16 10:52:01 +01:00
Max Kazantsev f98ed74f69 [LSR] Handle case 1*reg => reg. PR50918
This patch addresses an assertion failure in the case when the only formula found by LSR
is `1*reg => reg`, which was supposed to be an impossible situation; however, there
is a test that shows it is possible.

In this case, we can use scale register with scale of 1 as the missing base register.

Reviewed By: huihuiz, reames
Differential Revision: https://reviews.llvm.org/D105009
2021-07-16 11:33:59 +07:00
Shilei Tian c23da666b5 [Attributor] Add support for compound assignment for ChangeStatus
A common use of `ChangeStatus` is as follows:
```
ChangeStatus Changed = ChangeStatus::UNCHANGED;
Changed |= foo();
```
where `foo` returns `ChangeStatus` as well. Currently `ChangeStatus` doesn't
support compound assignment, so we have to write it as
```
Changed = Changed | foo();
```
which is not that convenient.

This patch adds support for compound assignment for `ChangeStatus`. Compound
assignment is usually implemented as a member function, and the binary arithmetic
operator is therefore implemented using compound assignment. However, unlike a
regular C++ class, an enum class doesn't support member functions. As a result, they
can only be implemented in the way shown in the patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106109
2021-07-15 23:51:46 -04:00
Vitaly Buka bba8a76b87 [NFC][hwasan] Remove default arguments in internal class 2021-07-15 15:28:02 -07:00
Shilei Tian ca662297d5 [AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target regions executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compilation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-15 18:23:23 -04:00
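An illustrative sketch of the folding; the i8 return type and the constant mentioned below are assumptions for illustration, not taken from the patch:
```
declare i8 @__kmpc_is_spmd_exec_mode()

define void @device_helper() {
  %mode = call i8 @__kmpc_is_spmd_exec_mode()
  ; if every kernel entry that can reach this function executes in SPMD mode,
  ; AAFoldRuntimeCall can fold the call to a constant, e.g. i8 1
  ret void
}
```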
Sanjay Patel 81ce3aa30c [SLP] avoid leaking poison in reduction of safe boolean logic ops
This bug was introduced with D105730 / 25ee55c0ba .

If we are not converting all of the operations of a reduction
into a vector op, we need to preserve the existing select form
of the remaining ops. Otherwise, we are potentially leaking
poison where it did not leak in the original code.

Alive2 agrees that the version that freezes some inputs
and then falls back to scalar is correct:
https://alive2.llvm.org/ce/z/erF4K2
2021-07-15 17:33:06 -04:00
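For reference, a small IR sketch (not from the patch) of why the select form must be preserved for the ops that stay scalar:
```
; poison-safe "logical and": if %a is false, poison in %b is not propagated
%safe   = select i1 %a, i1 %b, i1 false
; plain bitwise form: poison in %b always propagates
%unsafe = and i1 %a, %b
```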
Simon Pilgrim 0a614ca225 Fix "unknown pragma 'GCC'" MSVC warning. NFCI. 2021-07-15 18:50:19 +01:00
Arthur Eubanks 99cb2507f3 Revert "[SLP]Workaround for InsertSubVector cost."
This reverts commit 2eb50baf05.

Causes hangs, see comments on D105827.
2021-07-15 10:19:41 -07:00
Nikita Popov c191035f42 [IR] Add elementtype attribute
This implements the elementtype attribute specified in D105407. It
just adds the attribute and the specified verifier rules, but
doesn't yet make use of it anywhere.

Differential Revision: https://reviews.llvm.org/D106008
2021-07-15 18:04:26 +02:00
Arthur Eubanks 04b75c05b0 [InstCombine] Look through invariant group intrinsics when removing malloc
Fixes some regressions with -fstrict-vtable-pointers in llvm-test-suite.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D106017
2021-07-15 09:02:40 -07:00
Philip Reames 95346ba877 [LV] Enable vectorization of multiple exit loops w/computable exit counts
This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzable and to dominate the latch.

The majority of work to support this was done in a set of previous patches. In particular, 72314466 avoids having multiple edges from the middle block to the exits, and 4b33b2387 added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit.

Differential Revision: https://reviews.llvm.org/D105817
2021-07-15 08:53:51 -07:00
Shilei Tian a70ef3f568 Revert "[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible"
This reverts commit 1100e4aafe.
2021-07-15 11:19:28 -04:00
Sander de Smalen a607f64118 Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs."
This reverts commit efaf3099c8.
This reverts commit dc7bdc1e71.

Reverting patches due to buildbot failures.
2021-07-15 15:21:57 +01:00
Roman Lebedev 3e6c383dc6
[SimplifyCFG] Rerun PHI deduplication after common code sinking (PR51092)
`SinkCommonCodeFromPredecessors()` doesn't itself ensure that duplicate PHI nodes aren't created.
I suppose we could teach it to do that on-the-fly (& account for the already-existing PHI nodes,
& adjust the cost model), but the diff would be bigger than this.

The alternative is to schedule a new EarlyCSE pass invocation somewhere later in the pipeline.
Clearly, we don't have any EarlyCSE runs in the module optimization pipeline, so this pattern isn't cleaned up...
That would perhaps be better, but it will again have some compile time impact.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106010
2021-07-15 16:34:34 +03:00
Sander de Smalen dc7bdc1e71 [LV] Fix determinism for failing scalable-call.ll test.
The sort function for emitting an OptRemark was not deterministic,
which caused scalable-call.ll to fail on some buildbots. This patch
fixes that.

This patch also fixes an issue where `Instruction::comesBefore()`
is called when two Instructions are in different basic blocks,
which would otherwise cause an assertion failure.
2021-07-15 13:16:59 +01:00
Stephen Tozer 47633af9d4 Reapply "[DebugInfo] Enable variadic debug value salvaging"
Reapplied after previous build failures were fixed in 14b62f7e2.

This reverts commit 540b4a5fb3.
2021-07-15 12:54:51 +01:00
Simon Pilgrim 944f39f38d [InstCombine] Strip inbounds from (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) fold
As discussed on rGd561b6fbdbe6, we can't guarantee that the new gep is inbounds
2021-07-15 12:19:10 +01:00
Ilya Leoshkevich 3845f2cd94 [TSan] Use zeroext for function parameters
SystemZ ABI requires zero-extending function parameters to 64-bit. The
compiler is free to optimize the code around this assumption, e.g.
failing to zero-extend __tsan_atomic32_load()'s morder may cause
crashes in to_mo() switch table lookup.

Fix by adding zeroext attributes to TSan's FunctionCallees, similar to
how it was done in commit 3bc439bdff ("[MSan] Add instrumentation for
SystemZ"). This is a no-op on arches that don't need it.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D105629
2021-07-15 12:18:47 +02:00
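A sketch of the kind of declaration change this implies; the exact runtime signature is an assumption here:
```
; before (sketch): declare i32 @__tsan_atomic32_load(i32*, i32)
; after: zeroext makes the caller zero-extend the memory-order argument,
; matching the SystemZ ABI expectation
declare i32 @__tsan_atomic32_load(i32*, i32 zeroext)
```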
Florian Mayer 0ed1747a92 [NFC] [hwasan] Split argument logic into functions.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105971
2021-07-15 10:45:43 +01:00
Chuanqi Xu 8a1727ba51 [Coroutines] Run coroutine passes by default
This patch makes coroutine passes run by default in the LLVM pipeline. Now
clang and opt can handle IR inputs containing coroutine intrinsics
without special options.
It should be fine. On the one hand, the coroutine passes seem to be stable
since there are already many projects using the coroutine feature.
On the other hand, the coroutine passes should do nothing for IR that doesn't
contain coroutine intrinsics.

Test Plan: check-llvm

Reviewed by: lxfind, aeubanks

Differential Revision: https://reviews.llvm.org/D105877
2021-07-15 14:33:40 +08:00
Kuter Dinel ade190c5ea [Attributor] AACallEdges, Add a way to ask nonasm unknown callees
This patch adds a feature to the AACallEdges AbstractAttribute that allows
users to ask if there is an unknown callee that isn't inline assembly.
This feature is needed by some of its users.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105992
2021-07-15 06:10:42 +03:00
Jon Roelofs d143103068 [GlobalOpt] Fix a miscompile when evaluating struct initializers.
The bug was that evaluateBitcastFromPtr attempted to narrow a store that
covers other struct elements down to the struct's 0th element. While this
narrowing is okay on the load side, applying it to stores causes us to miss
the writes to the additionally covered elements.

rdar://79503568

Differential revision: https://reviews.llvm.org/D105838
2021-07-14 15:37:01 -07:00
Arthur Eubanks 5366de7375 [SimpleLoopUnswitch] Don't non-trivially unswitch loops with catchswitch exits
SplitBlock() can't handle catchswitch.

Fixes PR50973.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D105672
2021-07-14 14:07:28 -07:00
Alexey Bataev ba2690b17b [SLP][NFC]Fix variables names, NFC. 2021-07-14 12:43:45 -07:00
Simon Pilgrim 4fd0addb68 [SLP] Fix case of variable name. NFCI. 2021-07-14 20:20:04 +01:00
Sanjay Patel ca6e117d86 [InstCombine] reorder icmp with offset folds for better results
This set of folds was added recently with:
c7b658aeb5
0c400e8953
40b752d28d

...and I noted that this wasn't likely to fire in code derived
from C/C++ source because of nsw in particular. But I didn't
notice that I had placed the code above the no-wrap block
of transforms.

This is likely the cause of regressions noted from the previous
commit because -- as shown in the test diffs -- we may have
transformed into a compare with an arbitrary constant rather
than a simpler signbit test.
2021-07-14 12:12:05 -04:00
Sander de Smalen efaf3099c8 [LV] Print remark when loop cannot be vectorized due to invalid costs.
This patch emits remarks for instructions that have invalid costs for
a given set of vectorization factors. Some example output:

  t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
      dst[i] = sinf(src[i]);
                    ^
  t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
      dst[i] = sinf(src[i]);
               ^
  t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
      dst[i] = sinf(src[i]);
             ^

Reviewed By: fhahn, kmclaughlin

Differential Revision: https://reviews.llvm.org/D105806
2021-07-14 17:11:33 +01:00
Alexey Bataev 2eb50baf05 [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-14 07:54:24 -07:00
Sanjay Patel 25ee55c0ba [SLP] match logical and/or as reduction candidates
This has been a work-in-progress for a long time...we finally have all of
the pieces in place to handle vectorization of compare code as shown in:
https://llvm.org/PR41312

To do this (see PhaseOrdering tests), we converted SimplifyCFG and
InstCombine to the poison-safe (select) forms of the logic ops, so now we
need to have SLP recognize those patterns and insert a freeze op to make
a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah

We get the minimal patterns with this patch, but the PhaseOrdering tests
show that we still need adjustments to get the ideal IR in some or all of
the motivating cases.

Differential Revision: https://reviews.llvm.org/D105730
2021-07-14 09:02:31 -04:00
Simon Pilgrim d561b6fbdb [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 12:21:01 +01:00
Chuanqi Xu 12d04ce956 [NFC] [Coroutines] Remove unused CoroFree 2021-07-14 19:13:12 +08:00
Simon Pilgrim 0722f3d0fa Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)"
Missed some BPF test changes that need addressing
2021-07-14 11:48:37 +01:00
Stephen Tozer 810e4c3c66 [DebugInfo] Correctly update dbg.values with duplicated location ops
This patch fixes code that incorrectly handled dbg.values with duplicate
location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in
question were caused by either applying an update to dbg.value multiple
times when the update is only valid once, or by updating the
DIExpression for only the first instance of a value that appears
multiple times.

Differential Revision: https://reviews.llvm.org/D105831
2021-07-14 11:17:24 +01:00
Simon Pilgrim b803294cf7 [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 10:49:12 +01:00
Shilei Tian 1100e4aafe [AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target regions executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compilation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-13 22:28:35 -04:00
Arthur Eubanks 0024ec59a0 [NewPM][SimpleLoopUnswitch] Add option to not trivially unswitch
To help with debugging non-trivial unswitching issues.

Don't care about the legacy pass, nobody is using it.

If a pass's string params are empty (e.g. "simple-loop-unswitch"), don't
default to the empty constructor for the pass params. We should still
let the parser take care of it in case the parser has its own defaults.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D105933
2021-07-13 16:09:42 -07:00
Eli Friedman 5d1ba53404 [LoopReroll] Add an extra defensive check to avoid SCEV assertion.
Make sure getMinusSCEV() didn't return a pointer.  The following check
would never succeed if it was a pointer, anyway, but calling
getMulExpr() on a pointer SCEV now asserts.
2021-07-13 12:17:09 -07:00
Arthur Eubanks 489742991f [NFC] Inline variable to prevent unused variable warning 2021-07-13 09:57:59 -07:00
Arthur Eubanks ab5693aa4a [OpaquePtr] Use byval type more 2021-07-13 09:34:34 -07:00
Arthur Eubanks 113a807977 [OpaquePtr] Get load/store type without PointerType::getElementType() 2021-07-13 09:34:34 -07:00
Arthur Eubanks 693bc04bf6 [OpaquePtr] Use GlobalValue::getValueType() more 2021-07-13 09:34:34 -07:00
Simon Pilgrim b2f6cf1479 [InstCombine] Fold lshr/ashr(or(neg(x),x),bw-1) --> zext/sext(icmp_ne(x,0)) (PR50816)
Handle the missing fold reported in PR50816, which is a variant of the existing ashr(sub_nsw(X,Y),bw-1) --> sext(icmp_sgt(X,Y)) fold.

We also handle the lshr(or(neg(x),x),bw-1) --> zext(icmp_ne(x,0)) equivalent - https://alive2.llvm.org/ce/z/SnZmSj

We still allow multiple uses of the neg(x) - as this is likely to let us further simplify other uses of the neg - but not multiple uses of the or(), which would increase the instruction count.

Differential Revision: https://reviews.llvm.org/D105764
2021-07-13 14:44:54 +01:00
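A sketch of the lshr variant in IR for i8 (the commit's Alive2 link is the authoritative proof; the names below are illustrative):
```
define i8 @src(i8 %x) {
  %neg = sub i8 0, %x
  %or  = or i8 %neg, %x
  %r   = lshr i8 %or, 7
  ret i8 %r
}

define i8 @tgt(i8 %x) {
  %cmp = icmp ne i8 %x, 0
  %r   = zext i1 %cmp to i8
  ret i8 %r
}
```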
Jeroen Dobbelaere 1d8030053d [NFC] Do not track calls to inlined intrinsics in IFI.
Just like intrinsics are not tracked for IFI.InlinedCalls, they should not be tracked for IFI.InlinedCallSites.

In the current top-of-tree this change is an NFC, but the full restrict patches (D68484) potentially trigger a read-after-free
if intrinsics are also added to InlinedCallSites, due to a late optimization potentially removing some of the inlined intrinsics.

Also see https://lists.llvm.org/pipermail/llvm-dev/2021-July/151722.html for a discussion about the problem.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105805
2021-07-13 10:18:23 +02:00
hyeongyu kim e338d08ae6 [SimplifyCFG] Fix SimplifyBranchOnICmpChain to be undef/poison safe.
This patch fixes the problem of SimplifyBranchOnICmpChain that occurs
when extra values are Undef or poison.

Suppose the %mode is 51 and the %Cond is poison, and let's look at the
case below.
```
	%A = icmp ne i32 %mode, 0
	%B = icmp ne i32 %mode, 51
	%C = select i1 %A, i1 %B, i1 false
	%D = select i1 %C, i1 %Cond, i1 false
	br i1 %D, label %T, label %F
=>
	br i1 %Cond, label %switch.early.test, label %F
switch.early.test:
	switch i32 %mode, label %T [
		i32 51, label %F
		i32 0, label %F
	]
```
incorrectness: https://alive2.llvm.org/ce/z/BWScX

Code before the transformation will not raise UB because %C and %D are false,
and it will not use %Cond. But after the transformation, %Cond is used
immediately, and it will raise UB.

This problem can be solved by adding a freeze instruction.

correctness: https://alive2.llvm.org/ce/z/x9x4oY

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104569
2021-07-13 15:35:18 +09:00
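Sketched in IR terms, the fix only changes the branch condition in the transformed code (illustrative):
```
	%Cond.fr = freeze i1 %Cond
	br i1 %Cond.fr, label %switch.early.test, label %F
```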
Arthur Eubanks 0e6424acbd [OpaquePointers][ThreadSanitizer] Cleanup calls to PointerType::getElementType()
Reviewed By: #opaque-pointers, dblaikie

Differential Revision: https://reviews.llvm.org/D105710
2021-07-12 20:46:08 -07:00
Nikita Popov 7ed3e87825 [Attributes] Determine attribute properties from TableGen data
Continuing from D105763, this allows placing certain properties
about attributes in the TableGen definition. In particular, we
store whether an attribute applies to fn/param/ret (or a combination
thereof). This information is used by the Verifier, as well as the
ForceFunctionAttrs pass. I also plan to use this in LLParser,
which also duplicates info on which attributes are valid where.

This keeps metadata about attributes in one place, and makes it
more likely that it stays in sync, rather than in various
functions spread across the codebase.

Differential Revision: https://reviews.llvm.org/D105780
2021-07-12 22:13:38 +02:00
Nikita Popov 6ac32872ee [Attributes] Replace doesAttrKindHaveArgument() (NFC)
This is now the same as isIntAttrKind(), so use that instead, as
it does not require manual maintenance. The naming is also more
accurate in that both int and type attributes have an argument,
but this method was only targeting int attributes.

I initially wanted to tighten the AttrBuilder assertion, but we
have some in-tree uses that would violate it.
2021-07-12 21:57:26 +02:00
Sanjay Patel a488c7879e [InstCombine] reduce signbit test of logic ops to cmp with zero
This is the pattern from the description of:
https://llvm.org/PR50816

There might be a way to generalize this to a smaller or more
generic pattern, but I have not found it yet.

https://alive2.llvm.org/ce/z/ShzJoF

define i1 @src(i8 %x) {
  %add = add i8 %x, -1
  %xor = xor i8 %x, -1
  %and = and i8 %add, %xor
  %r = icmp slt i8 %and, 0
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %r = icmp eq i8 %x, 0
  ret i1 %r
}
2021-07-12 09:01:26 -04:00
Yevgeny Rouban 88024a724c [RS4GC] Use one DVCache for both inlineGetBaseAndOffset() and insertParsePoints()
This new test demonstrates a case where a base ptr is generated
twice for the same value: the first one is generated while
the gc.get.pointer.base() is inlined, the second is generated
for the statepoint. This happens because the methods
inlineGetBaseAndOffset() and insertParsePoints() do not share
their defining value cache used by the findBasePointer() method.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D103240
2021-07-12 18:13:00 +07:00
Sander de Smalen d2e4ccc790 [LV] Ignore candidate VFs with invalid costs.
This follows on from discussion on the mailing-list:
  https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html

to interpret an Invalid cost as 'infinitely expensive', as this
simplifies some of the legalization issues with scalable vectors.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105473
2021-07-12 09:58:22 +01:00
Liqiang Tao 6ebeb7f8ba [llvm][Inliner] Templatize PriorityInlineOrder
This patch templatizes PriorityInlineOrder so that it can accept any type of priority metric.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D104972
2021-07-12 10:38:22 +08:00
Johannes Doerfert 792aac9897 [Attributor][NFCI] Add UsedAssumedInformation to more interfaces
As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
2021-07-11 19:18:03 -05:00
Eli Friedman 6144085c29 [IndVars] Don't widen pointers in WidenIV::getWideRecurrence
It's not a reasonable transform, and calling getSignExtendExpr() on a
pointer hits an assertion.
2021-07-11 17:04:50 -07:00
Florian Hahn c6e4c1fbd8
[VPlan] Remove default arg from getVPValue (NFC).
The const version of VPValue::getVPValue still had a default value for
the value index. Remove the default value and use getVPSingleValue
instead, which is the proper function.
2021-07-11 22:03:09 +02:00
Craig Topper e38b7e8948 [DivRemPairs] Add an initial case for hoisting to a common predecessor.
This patch adds support for hoisting the division and maybe the
remainder for control flow graphs like this.

```
PredBB
  |  \
  |  Rem
  |  /
 Div
```

If we have DivRem we'll hoist both to PredBB. If not we'll just
hoist Div and expand Rem using the Div.

This improves our codegen for something like this

```
__uint128_t udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t *remainder) {
    if (remainder != 0)
        *remainder = dividend % divisor;
    return dividend / divisor;
}
```

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D87555
2021-07-11 10:03:07 -07:00
hyeongyu kim 1a5f4cbe1b [InstCombine] Add optimization to prevent poison from being propagated.
In D104569, Freeze was inserted just before br to solve the `branching on undef` miscompilation problem.
But value analysis was being disturbed by the added freeze.

```
v = load ptr
cond = freeze(icmp (and v, const), const')
br cond, ...
```
The case in which value analysis is disturbed is shown above.
By moving the freeze to immediately after the load, value analysis will succeed again.

```
v = load ptr
freeze(icmp (and v, const), const')
=>
v = load ptr
v' = freeze v
icmp (and v', const), const'
```
In this patch, I propose the above optimization.
With this patch, the poison will not spread as the freeze is performed early.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D105392
2021-07-11 12:40:43 +09:00
Johannes Doerfert 2e7e2994a9 [Attributor][FIX] Destroy bump allocator objects to avoid leaks
AllocationInfo and DeallocationInfo objects themselves are allocated
with the Attributor bump allocator and do not need to be deallocated.
That said, the sets in AllocationInfo and DeallocationInfo need to be
destroyed to avoid memory leaks.
2021-07-10 18:53:37 -05:00
Johannes Doerfert 514c033db1 [OpenMP] Detect SPMD compatible kernels and execute them as such
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.

The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D102307
2021-07-10 18:44:25 -05:00
Johannes Doerfert 8cb7d71355 [OpenMP][FIX] Add missing `)` to remark 2021-07-10 18:40:32 -05:00
Johannes Doerfert d9659bf6a0 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 17:57:08 -05:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Johannes Doerfert 4761d29633 [Attributor][FIX] Sanitize queries to LVI and ScalarEvolution
When we talk to outside analyses, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
2021-07-10 16:45:19 -05:00
Johannes Doerfert c1d53a316d [Attributor] Look through selects in genericValueTraversal
If we can simplify the select condition we can avoid one value in the
traversal.

Differential Revision: https://reviews.llvm.org/D103861
2021-07-10 16:44:05 -05:00
Johannes Doerfert c1c1fe9385 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helpers to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 16:32:24 -05:00
Johannes Doerfert dbb3a65f5b [Attributor][FIX] Do not replace a value with a non-dominating instruction
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
2021-07-10 16:09:30 -05:00
Johannes Doerfert 5ef18e2421 [Attributor] Use AAValueSimplify to simplify returned values
We should use AAValueSimplify for all value simplification; however,
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This removes the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improves
AAValueSimplifyCallSiteReturned to handle returned arguments.

AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.

Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.

Differential Revision: https://reviews.llvm.org/D103860
2021-07-10 15:52:36 -05:00
Johannes Doerfert 0aab13aaf9 [Attributor] Introduce an optimistic getUnderlyingObjects helper
Like the `llvm::getUnderlyingObjects` helper, the optimistic version
collects objects that might be the base of a given pointer. In contrast
to the llvm variant, the optimistic one will use assumed information,
e.g., about select conditions or dead blocks, to provide a more precise
result.

Differential Revision: https://reviews.llvm.org/D103859
2021-07-10 15:47:30 -05:00
Johannes Doerfert 5b12cf3e65 [Attributor][FIX] Traverse uses even if a value is assumed constant
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".

We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.

Differential Revision: https://reviews.llvm.org/D103858
2021-07-10 15:47:20 -05:00
Nico Weber d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert 269416d419 [Attributor][NFCI] Add UsedAssumedInformation to more interfaces
As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
2021-07-10 12:32:51 -05:00
Johannes Doerfert d39179d7fa [OpenMP] Detect SPMD compatible kernels and execute them as such
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.

The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D102307
2021-07-10 12:32:51 -05:00
Johannes Doerfert 966342790e [Attributor][FIX] Sanitize queries to LVI and ScalarEvolution
When we talk to outside analyses, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
2021-07-10 12:32:51 -05:00
Johannes Doerfert ae08df87df [Attributor][FIX] Do not replace a value with a non-dominating instruction
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
2021-07-10 12:32:50 -05:00
Johannes Doerfert f0628c6ff7 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 12:32:50 -05:00
Johannes Doerfert 1d5711c3ee [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 12:32:50 -05:00
Johannes Doerfert 5003ba2542 [Attributor] Look through selects in genericValueTraversal
If we can simplify the select condition we can avoid one value in the
traversal.

Differential Revision: https://reviews.llvm.org/D103861
2021-07-10 12:32:50 -05:00
Johannes Doerfert 1eb31d6de3 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helpers to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 12:32:50 -05:00
Johannes Doerfert 374e573cfc [Attributor] Use AAValueSimplify to simplify returned values
We should use AAValueSimplify for all value simplification; however,
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This removes the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improves
AAValueSimplifyCallSiteReturned to handle returned arguments.

AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.

Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.

Differential Revision: https://reviews.llvm.org/D103860
2021-07-10 12:32:50 -05:00
Johannes Doerfert 93a279a67d [Attributor] Introduce an optimistic getUnderlyingObjects helper
Like the `llvm::getUnderlyingObjects` helper, the optimistic version
collects objects that might be the base of a given pointer. In contrast
to the llvm variant, the optimistic one will use assumed information,
e.g., about select conditions or dead blocks, to provide a more precise
result.

Differential Revision: https://reviews.llvm.org/D103859
2021-07-10 12:32:49 -05:00
Johannes Doerfert be5d46e9bb [Attributor][FIX] Traverse uses even if a value is assumed constant
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".

We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.

Differential Revision: https://reviews.llvm.org/D103858
2021-07-10 12:32:49 -05:00
Sander de Smalen 239fcda268 [LV] NFCI: Do cost comparison on InstructionCost directly.
Instead of performing the isMoreProfitable() operation on
InstructionCost::CostTy the operation is performed on InstructionCost
directly, so that it can handle the case where one of the costs is
Invalid.

This patch also changes the CostTy to be int64_t, so that the type is
wide enough to deal with multiplications with e.g. `unsigned MaxTripCount`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105113
2021-07-10 11:57:16 +01:00
Eli Friedman 9c4baf5101 [ScalarEvolution] Strictly enforce pointer/int type rules.
Rules:

1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
   pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
   an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
   pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
   pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
   pointer.
7. Otherwise, the SCEV expression is invalid.

I'm not sure how useful rule 6 is in practice.  If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.

This is basically mop-up at this point; all the changes with significant
functional effects have landed.  Some of the remaining changes could be
split off, but I don't see much point.

Differential Revision: https://reviews.llvm.org/D105510
2021-07-09 17:29:26 -07:00
Valery N Dmitriev 8e9216fe87 [SLP] Do not make an attempt to match reduction on already erased instruction.
Differential Revision: https://reviews.llvm.org/D105752
2021-07-09 17:13:15 -07:00
Kazu Hirata 49d66d9f9f [AFDO] Merge function attributes after inlining
This patch teaches the sample profile loader to merge function
attributes after inlining functions.

Without this patch, the compiler could inline a function requiring the
512-bit vector width into its caller without merging function
attributes, triggering a failure during instruction selection.

Differential Revision: https://reviews.llvm.org/D105729
2021-07-09 16:47:12 -07:00
Sanjay Patel c2b7f09d8c [SLP] make invalid operand explicit for extra arg in reduction matching; NFC
This makes it clearer when we have encountered the extra arg.
Also, we may need to adjust the way the operand iteration
works when handling logical and/or.
2021-07-09 15:32:12 -04:00
Varun Gandhi 92dcb1d2db [Clang] Introduce Swift async calling convention.
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D95561
2021-07-09 11:50:10 -07:00
Arthur Eubanks 9a7afae492 [OpaquePtr][InferAddrSpace] Use PointerType::getWithSamePointeeType() 2021-07-09 10:29:08 -07:00
Arthur Eubanks 4e6013250d [NFC][OpaquePtr] Use GlobalValue::getValueType() more
Instead of getType()->getElementType().
2021-07-09 09:55:41 -07:00
Sanjay Patel 486992f958 [SLP] improve code comments; NFC
This likely started out only supporting binops,
but now we handle min/max using cmp+sel, and
we may extend to handle bool logic in the form
of select.
2021-07-09 12:49:54 -04:00
Sanjay Patel 544f2711bb [SLP] make checks for cmp+select min/max more explicit
This is NFC-intended currently (so no test diffs). The motivation
is to eventually allow matching for poison-safe logical-and and
logical-or (these are in the form of a select-of-bools).
( https://llvm.org/PR41312 )

Those patterns will not have all of the same constraints as min/max
in the form of cmp+sel. We may also end up removing the cmp+sel
min/max matching entirely (if we canonicalize to intrinsics), so
this will make that step easier.
2021-07-09 12:43:43 -04:00
Arthur Eubanks e4f66a1055 [OpaquePointers][CallPromotion] Don't look at pointee type for byval
byval's type parameter is now always required.
2021-07-09 09:34:05 -07:00
Nico Weber 97c675d3d4 Revert "Revert "Temporarily do not drop volatile stores before unreachable""
This reverts commit 52aeacfbf5.
There isn't full agreement on a path forward yet, but there is agreement that
this shouldn't land as-is.  See discussion on https://reviews.llvm.org/D105338

Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality"
This reverts commit f4877c78c0.

And all the related changes to tests:
This reverts commit 9a0152799f.
This reverts commit 3f7c9cc274.
This reverts commit 329f8197ef.
This reverts commit aa9f58cc2c.
This reverts commit 2df37d5ddd.
This reverts commit a72a441812.
2021-07-09 11:44:34 -04:00
Kevin P. Neal 52900486a1 [FPEnv][InstSimplify] Constrained FP support for NaN
Currently InstructionSimplify.cpp knows how to simplify floating point
instructions that have a NaN operand. It does not know how to handle the
matching constrained FP intrinsic.

This patch teaches it how to simplify so long as the exception handling
is not "fpexcept.strict".

Differential Revision: https://reviews.llvm.org/D103169
2021-07-09 11:26:28 -04:00
Roman Lebedev 4e332cd41a
Revert "Transform memset + malloc --> calloc (PR25892)"
It broke `check-msan`, see e.g. https://lab.llvm.org/buildbot/#/builders/18/builds/1934

This reverts commit 375694a07b.
2021-07-09 16:26:48 +03:00
Roman Lebedev 52aeacfbf5
Revert "Temporarily do not drop volatile stores before unreachable"
This reverts commit 4e413e1621,
which landed almost 10 months ago under premise that the original behavior
didn't match reality and was breaking users, even though it was correct as per
the LangRef. But the LangRef change still hasn't appeared, which might suggest
that the affected parties aren't really worried about this problem.

Please refer to discussion in:
* https://reviews.llvm.org/D87399 (`Revert "[InstCombine] erase instructions leading up to unreachable"`)
* https://reviews.llvm.org/D53184 (`[LangRef] Clarify semantics of volatile operations.`)
* https://reviews.llvm.org/D87149 (`[InstCombine] erase instructions leading up to unreachable`)

clang has `-Wnull-dereference` which will diagnose the obvious cases
of null dereference, it was adjusted in f4877c78c0,
but it will only catch the cases where the pointer is a null literal,
it will not catch the cases where an arbitrary store is expected to trap.

Differential Revision: https://reviews.llvm.org/D105338
2021-07-09 14:16:54 +03:00
Max Kazantsev 9c5e65691e [LoopDeletion] Handle switch in proving that loop exits on first iteration
Added a check for switch-terminated blocks in loops.
Now if a block is terminated with a switch, we try to find out which of the
cases is taken on the 1st iteration and mark the corresponding edge from the
block to the case successor as live.
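
A rough sketch of the kind of loop this now handles (hypothetical IR, not from the patch): on the first iteration %iv is known to be 0, so the edge to %exit is the one marked live.

  define void @sketch() {
  entry:
    br label %loop
  loop:
    %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
    switch i32 %iv, label %latch [
      i32 0, label %exit
    ]
  latch:
    %iv.next = add i32 %iv, 1
    br label %loop
  exit:
    ret void
  }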

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D105688
Reviewed By: nikic, mkazantsev
2021-07-09 18:03:34 +07:00
David Green 38c9a4068d [TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between pairwise and
non-pairwise reductions.

This also removes some code that was detecting reductions trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484
2021-07-09 11:51:16 +01:00
Dawid Jurczak 375694a07b Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.
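
A minimal IR sketch of the transform (the @src/@tgt functions are hypothetical, assuming the memset zeroes the entire allocation):

  declare i8* @malloc(i64)
  declare i8* @calloc(i64, i64)
  declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)

  define i8* @src(i64 %size) {
    %p = call i8* @malloc(i64 %size)
    call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 %size, i1 false)
    ret i8* %p
  }

  define i8* @tgt(i64 %size) {
    %p = call i8* @calloc(i64 1, i64 %size)
    ret i8* %p
  }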

Differential Revision: https://reviews.llvm.org/D103009
2021-07-09 11:01:12 +02:00
Bjorn Pettersson 472462c472 [NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg'
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.

This patch also replaces all occurrences of 'simplify-cfg'
by 'simplifycfg'. The reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name,
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.

If for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be done more widely
and not only impact the PassRegistry.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105627
2021-07-09 09:47:03 +02:00
Reshabh Sharma 2e194dec60 [ASan][AMDGPU] Make shadow offset match X86 on Linux
This patch explicitly sets the shadow offset for
AMDGPU to match that of X86 on Linux.

Reviewed By: vitalybuka

https://reviews.llvm.org/D105282
2021-07-09 07:48:03 +05:30
Alexey Bataev 8af69975af [InstCombine][NFC]Use only `replaceInstUsesWith`, NFC. 2021-07-08 13:58:30 -07:00
David Blaikie 1def2579e1 PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
2021-07-08 13:37:57 -07:00
Vitaly Buka 915e07605c [msan] Handle funnel shifts
Fixes https://bugs.llvm.org/show_bug.cgi?id=50840

Differential Revision: https://reviews.llvm.org/D105387
2021-07-08 12:49:49 -07:00
Alexey Bataev c574d2fbac [SLP]Improve vectorization of stores.
Patch tries to improve the vectorization of stores. Originally, we just
check the type and the base pointer of the store.
Patch adds some extra checks to avoid non-profitable vectorization
cases. It includes analysis of the scalar values to be stored and
triggers the vectorization attempt only if the scalar values have
same/alt opcode and are from same basic block, i.e. we don't end up
immediately with the gather node, which is not profitable.
This also improves compile time by filtering out non-profitable cases.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D104122
2021-07-08 12:35:39 -07:00
Alexey Bataev 0d74fd3fdf [SLP][COST][X86]Improve cost model for masked gather.
Revived D101297 in its original form + added some changes in X86
legalization checking for masked gathers.

This solution is the most stable and the most correct one. We have to
check the legality before trying to build the masked gather in SLP.
Without this check we have an incorrect cost (for SLP) in case the masked gather
is not legal or is slower than the gather. And we're missing some
vectorization opportunities.

This can be fixed in the cost model, but in this case we need to add
special checks for the cost of GEPs for ScatterVectorize node, add
special check for small trees, etc., i.e. there are a lot of corner
cases here and there, which increase the code base and make it harder to
maintain the code.

> Can't we rely on cost model to deal with this? This can be profitable for further vectorization, when we can start from such gather loads as a seed.

This was the question from D101297. Actually, no, it can't. A simple
gather may give us a better result, especially after we started
vectorizing insertelements. Plus, as I said before, the cost for
non-legal masked gathers leads to missed vectorization opportunities.

Differential Revision: https://reviews.llvm.org/D105042
2021-07-08 11:53:30 -07:00
Alexey Bataev b5113bff46 [Instcombine]Transform reduction+(sext/zext(<n x i1>) to <n x im>) to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Some of the SPEC tests end up with reduction+(sext/zext(<n x i1>) to <n x im>) pattern, which can be transformed to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Also, reduction+(<n x i1>) can be transformed to ctpop(bitcast <n x i1> to in) & 1 != 0.
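
For the zext case, a minimal IR sketch of the intended transform (hypothetical @src/@tgt functions, using an 8-lane mask):

  declare i32 @llvm.vector.reduce.add.v8i32(<8 x i32>)
  declare i8 @llvm.ctpop.i8(i8)

  define i32 @src(<8 x i1> %m) {
    %x = zext <8 x i1> %m to <8 x i32>
    %r = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %x)
    ret i32 %r
  }

  define i32 @tgt(<8 x i1> %m) {
    ; popcount of the mask bits equals the sum of the zext'ed lanes
    %b = bitcast <8 x i1> %m to i8
    %c = call i8 @llvm.ctpop.i8(i8 %b)
    %r = zext i8 %c to i32
    ret i32 %r
  }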

Differential Revision: https://reviews.llvm.org/D105587
2021-07-08 07:56:41 -07:00
Michael Liao 4e5d9c8803 [Internalize] Preserve variables externally initialized.
- ``externally_initialized`` variables would be initialized or modified
  elsewhere. Particularly, CUDA or HIP may have host code to initialize
  or modify ``externally_initialized`` device variables, which may not
  be explicitly referenced on the device side but may still be used
  through the host-side interfaces. Not preserving them triggers their
  elimination in GlobalDCE and breaks the user code.
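
A hypothetical device variable of this kind might look like the following in IR (the name and address space are illustrative):

  @dev_var = addrspace(1) externally_initialized global i32 0, align 4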

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D105135
2021-07-08 10:48:19 -04:00
Nikita Popov 84c15bc018 [SCEVExpander] Support opaque pointers
This adds support for opaque pointers to expandAddToGEP() by always
generating an i8 GEP for opaque pointers. After looking at some other
cases (constexpr GEP folding, SROA GEP generation), I've come around
to the idea that we should use i8 GEPs for opaque pointers, because
the alternative would be to guess a GEP type from surrounding code,
which will not be reliable. Ultimately, i8 GEPs is where we want to
end up anyway, and opaque pointers just make that the natural choice.
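
As a sketch (assuming the textual `ptr` syntax for opaque pointers and a hypothetical @sketch function), the expander now emits byte-wise GEPs like:

  define ptr @sketch(ptr %base, i64 %byte.offset) {
    %addr = getelementptr i8, ptr %base, i64 %byte.offset
    ret ptr %addr
  }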

There are a couple of other places in SCEVExpander that check pointer
element types, I plan to update those when I run across usable test
coverage that doesn't assert elsewhere.

Differential Revision: https://reviews.llvm.org/D105398
2021-07-07 20:47:59 +02:00
Sanjay Patel 97c473ad39 [SLP] rename variable to not be misleading; NFC
The reduction matching was probably only dealing with binops
when it was written, but we have now generalized it to handle
select and intrinsics too, so assert on that too.
2021-07-07 14:40:21 -04:00
Arnold Schwaighofer 2937f8d148 [coro async] Cap the alignment of spilled values (vs. allocas) at the max frame alignment
Before this patch we would normally use the ABI alignment, which can be
too high for the context alignment.

For spilled values we don't need ABI alignment, since the frame entry's
address is not escaped.

rdar://79664965

Differential Revision: https://reviews.llvm.org/D105288
2021-07-07 08:06:25 -07:00
Philip Reames 723144665b [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4)
Resubmit after the following changes:

* Fix a latent bug related to unrolling with required epilogue (see e49d65f). I believe this is the cause of the prior PPC buildbot failure.
* Disable non-latch exits for epilogue vectorization to be safe (9ffa90d)
* Split out assert movement (600624a) to reduce churn if this gets reverted again.

Previous commit message (try 3)

Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know the scalar-epilogue requirement will hold - these questions are ill-formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-07-07 07:44:35 -07:00
Dylan Fleming 7215dcfe36 [SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths
Depends on D104239

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105341
2021-07-07 15:30:10 +01:00
Arnold Schwaighofer 033de11150 [coro async] Move code to proper switch
While upstreaming patches, this code was somehow applied to the wrong switch statement.

Differential Revision: https://reviews.llvm.org/D105504
2021-07-07 06:19:08 -07:00
Max Kazantsev 19885c7adf [NFC] Remove duplicate function calls
Removed repeated call of L->getHeader(). Now using previously stored return value.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D105535
Reviewed By: mkazantsev
2021-07-07 17:02:36 +07:00
Dylan Fleming 7586b47fb6 [SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104239
2021-07-07 09:58:05 +01:00
Johannes Doerfert 168a9234d7 [Attributor][FIX] Replace uses first, then values
Before we replaced value by registering all their uses. However, as we
replace a value old uses become stale. We now replace values explicitly
and keep track of "new values" when doing so to avoid replacing only
uses in stale/old values but not their replacements.
2021-07-06 22:43:51 -05:00
Johannes Doerfert aa3768278d [Attributor] Introduce a helper function to deal with undef + none
We often need to deal with the value lattice that contains none and
undef as special values. A simple helper makes this much nicer.

Differential Revision: https://reviews.llvm.org/D103857
2021-07-06 22:41:21 -05:00
Johannes Doerfert fc82409b5c [Attributor] Simplify operands inside of simplification AAs first
When we do simplification via AAPotentialValues or AAValueConstantRange
we need to simplify the operands of an instruction we deconstruct first.
This does not only improve the result, see for example range.ll, but is
required as we allow outside AAs to provide simplification rules via
callbacks. If we do ignore the simplification rules and base other
simplifications on the IR instead we can create an inconsistent state.
2021-07-06 22:41:18 -05:00
Eli Friedman 7ac1c7bead Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.

Recommitting with fix to MemoryDepChecker::isDependent.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 12:16:05 -07:00
Eli Friedman a6d081b2cb Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers."
This reverts commit 74d6ce5d5f.

Seeing crashes on buildbots in MemoryDepChecker::isDependent.
2021-07-06 11:17:13 -07:00
Philip Reames 9ffa90d6c2 [LV] Disable epilogue vectorization for non-latch exits
When skimming through old review discussion, I noticed a post-commit comment on an earlier patch which had gone unaddressed.  Better late (4 months) than never, right?

I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not motivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom-tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error.  Thus, let's play it safe for now.
2021-07-06 10:57:10 -07:00
Philip Reames 600624a103 [LoopVersion] Move an assert [nfc-ish] 2021-07-06 10:57:10 -07:00
Eli Friedman 74d6ce5d5f [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 10:54:41 -07:00
Arnold Schwaighofer 846a530e7d Fix coro lowering of single predecessor phis
Code assumes that uses of single-predecessor phis are not live across
suspend points. Clean up any single-predecessor phis preceding the code
making this assumption.

rdar://76020301

Differential Revision: https://reviews.llvm.org/D105488
2021-07-06 10:22:25 -07:00
Alexey Bataev 4e1a0684f1 [SLP]Fix non-determinism in PHI sorting.
Compare type IDs and DFS numbering for basic block instead of addresses
to fix non-determinism.

Differential Revision: https://reviews.llvm.org/D105031
2021-07-06 08:45:45 -07:00
Arnold Schwaighofer 130ea3ceb4 Use swift mangling for resume functions
The resume partial functions generated for swift suspend points will now
use a Swift mangling suffix.

Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_")
and suspend resume partial functions will use the suffix 'TY'[0-9]+'_'
(e.g "...TY1_").

Reviewed By: nate_chandler

Differential Revision: https://reviews.llvm.org/D104144
2021-07-06 08:27:46 -07:00
Florian Hahn ef0d147cdc
Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups.
This reverts commit 706bbfb35b.

The committed version moves the definition of VPReductionPHIRecipe out
of an ifdef only intended for ::print helpers. This should resolve the
build failures that caused the revert
2021-07-06 14:15:42 +01:00
Kerry McLaughlin a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  int foo(__int128_t* ptr, int N)
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Florian Hahn 706bbfb35b
Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups
This reverts commit 3fed6d443f,
bbcbf21ae6 and
6c3451cd76.

The changes causing build failures with certain configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio

    lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction*, llvm::ArrayRef<llvm::VPValue*>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]':
    LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe'
    collect2: error: ld returned 1 exit status
2021-07-06 12:10:03 +01:00
Florian Hahn 3fed6d443f
[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual.
VPReductionRecipe overrides those implementations. Mark them as virtual
in the VPWidenPHIRecipe to unbreak build in certain configurations.
2021-07-06 12:00:41 +01:00
Florian Hahn bbcbf21ae6
[VPlan] Add destructor to VPReductionRecipe to unbreak build.
Attempt to unbreak
https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio
2021-07-06 11:41:20 +01:00
Florian Hahn 6c3451cd76
[VPlan] Add VPReductionPHIRecipe (NFC).
This patch is a first step towards splitting up VPWidenPHIRecipe into
separate recipes for the 3 distinct cases they model:

    1. reduction phis,
    2. first-order recurrence phis,
    3. pointer induction phis.

This allows untangling the code generation and allows us to reduce the
reliance on LoopVectorizationCostModel during VPlan code generation.

Discussed/suggested in D100102, D100113, D104197.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104989
2021-07-06 11:25:28 +01:00
Kerry McLaughlin 17b701c43c [LV] Collect a list of all element types found in the loop (NFC)
Splits `getSmallestAndWidestTypes` into two functions, one of which now collects
a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not
have to iterate over all instructions in the loop again in other places, such as in D102253
which disables scalable vectorization of a loop if any of the instructions use invalid types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105437
2021-07-06 10:37:41 +01:00
Akira Hatanaka 28fe9afdba [ObjC][ARC] Prevent moving objc_retain calls past objc_release calls
that release the retained object

This patch fixes what looks like a longstanding bug in ARC optimizer
where it reverses the order of objc_retain calls and objc_release calls
that retain and release the same object.

The code in ARC optimizer that is responsible for code motion takes the
following steps:

1. Traverse the CFG bottom-up and determine how far up objc_release
   calls can be moved. Determine the insertion points for the
   objc_release calls, but don't actually move them.
2. Traverse the CFG top-down and determine how far down objc_retain
   calls can be moved. Determine the insertion points for the
   objc_retain calls, but don't actually move them.
3. Try to move the objc_retain and objc_release calls if they can't be
   removed.

The problem is that the insertion points for the objc_retain calls are
determined in step 2 without taking into consideration the insertion
points for objc_release calls determined in step 1, so the order of an
objc_retain call and an objc_release call can be reversed, which is
incorrect, even though each step is correct in isolation.

To fix this bug, this patch teaches the top-down traversal step to take
into consideration the insertion points for objc_release calls
determined in the bottom-up traversal step. Code motion for an
objc_retain call is disabled if there is a possibility that it can be
moved past an objc_release call that releases the retained object.

rdar://79292791

Differential Revision: https://reviews.llvm.org/D104953
2021-07-05 12:16:15 -07:00
Sanjay Patel 40b752d28d [InstCombine] fold icmp slt/sgt of offset value with constant
This follows up patches for the unsigned siblings:
0c400e8953
c7b658aeb5

We are translating an offset signed compare to its
unsigned equivalent when one end of the range is
at the limit (zero or unsigned max).

(X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1)
(X + C2) <s C --> X >u (C ^ SMAX) (if C == C2)

This probably does not show up much in IR derived
from C/C++ source because that would likely have
'nsw', and we have folds for that already.

As with the previous unsigned transforms, the folds
could be generalized to handle non-constant patterns:

https://alive2.llvm.org/ce/z/Y8Xrrm

  ; sgt
  define i1 @src(i8 %a, i8 %c) {
    %c2 = add i8 %c, 1
    %t = add i8 %a, %c2
    %ov = icmp sgt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_off = sub i8 127, %c ; SMAX
    %ov = icmp ult i8 %a, %c_off
    ret i1 %ov
  }

https://alive2.llvm.org/ce/z/c8uhnk

  ; slt
  define i1 @src(i8 %a, i8 %c) {
    %t = add i8 %a, %c
    %ov = icmp slt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_offnot = xor i8 %c, 127 ; SMAX
    %ov = icmp ugt i8 %a, %c_offnot
    ret i1 %ov
  }
2021-07-05 10:08:31 -04:00
Caroline Concatto b868a2d2c6 [SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector.
The function vectorizeChainsInBlock does not support scalable vectors,
because functions like canReuseExtract and isCommutative in the code
path assert with scalable vectors.

This patch avoids vectorizing blocks that have extract instructions with a scalable
vector.

Differential Revision: https://reviews.llvm.org/D104809
2021-07-05 12:43:41 +01:00
Stephen Tozer 14b62f7e2f [DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.

This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.

Differential Revision: https://reviews.llvm.org/D105129
2021-07-05 10:35:19 +01:00
Nikita Popov a213f735d8 [IR] Deprecate GetElementPtrInst::CreateInBounds without element type
This API is not compatible with opaque pointers, the method
accepting an explicit pointer element type should be used instead.

Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
2021-07-04 16:49:30 +02:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Nikita Popov fabc17192e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Roman Lebedev fc150cecd7
[SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.

Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.
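
A minimal sketch of the idea (hypothetical IR, not from the patch): side-effect-free instructions that are guaranteed to transfer execution to the unreachable terminator can simply be dropped.

  define void @src(i32 %x) {
    %a = add i32 %x, 1
    %m = mul i32 %a, 3
    unreachable
  }

  ; simplifies to
  define void @tgt(i32 %x) {
    unreachable
  }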

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D105374
2021-07-03 10:45:44 +03:00
Fangrui Song 252a1eecc0 [ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions
D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for
isDeclarationForLinker `GlobalValue`s. It missed a case for imported
declarations (`doImportAsDefinition` is false while `isPerformingImport` is
true). This can lead to a linker error for a default visibility symbol in
`ld.lld -shared`.

When `ClearDSOLocalOnDeclarations` is true, we check
`isPerformingImport() && !doImportAsDefinition(&GV)` along with
`GV.isDeclarationForLinker()`. The new condition checks an imported declaration.

This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104986
2021-07-02 17:08:25 -07:00
Roman Lebedev 53fef0b293
[NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs
Mimics similar change for InstCombine:
  ce192ced2b / D104602

All these uses are in blocks that aren't reachable from function's entry,
and said blocks are removed by SimplifyCFG itself,
so we can't really test this change.
2021-07-02 22:11:52 +03:00
Heejin Ahn 51fecd17bb [InstCombine] Don't combine PHI before catchswitch
This tries to bail out if the PHI is in a `catchswitch` BB in
InstCombine. A PHI cannot be combined into a non-PHI instruction if it
is in a `catchswitch` BB, because `catchswitch` BB cannot have any
non-PHI instruction other than `catchswitch` itself.

The given test case started crashing after D98058.

Reviewed By: lebedev.ri, rnk

Differential Revision: https://reviews.llvm.org/D105309
2021-07-02 12:10:24 -07:00
Roman Lebedev da81ec6158
[SimplifyCFG] Volatile memory operations do not trap
Somewhat related to D105338.
While it is up for discussion whether or not volatile store traps,
so far there have been no complaints that volatile load/cmpxchg/atomicrmw may also trap.
And even if simplifycfg currently conservatively believes that to be the case,
instcombine does not: https://godbolt.org/z/5vhv4K5b8

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D105343
2021-07-02 21:47:44 +03:00
Alexey Bataev 7f7e4aed21 [SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by
V.Dmitriev.

Reduces number of arguments
2021-07-02 10:30:13 -07:00
Jon Roelofs 37b6e03c18 [Intrinsics] Make MemCpyInlineInst a MemCpyInst
This opens up more optimization opportunities in passes that already handle MemCpyInst's.

Differential revision: https://reviews.llvm.org/D105247
2021-07-02 10:25:24 -07:00
Roman Lebedev 13e35ac124
[NFC][InstCombine] visitUnreachableInst(): enhance comments somewhat 2021-07-02 17:30:01 +03:00
Roman Lebedev dadedc99e9
[InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable
In the original review D87149 it was mentioned that this approach was tried,
and it led to infinite combine loops, but I'm not seeing anything like that now,
neither in `check-llvm` nor on some codebases I tried.

This is a recommit of d9d65527c2,
which i immediately reverted because i have messed up something
during branch switch, and 597ccc92ce
accidentally ended up being pushed, which was very much not the intention.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D105339
2021-07-02 17:20:21 +03:00
Roman Lebedev 24d271bb18
Revert "https://godbolt.org/z/5vhv4K5b8"
This reverts commit 597ccc92ce.
2021-07-02 17:17:55 +03:00
Roman Lebedev 93a1642763
Revert "[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable"
This reverts commit d9d65527c2.
2021-07-02 17:17:47 +03:00
Roman Lebedev d9d65527c2
[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable
In the original review D87149 it was mentioned that this approach was tried,
and it led to infinite combine loops, but I'm not seeing anything like that now,
neither in `check-llvm` nor on some codebases I tried.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D105339
2021-07-02 17:17:03 +03:00
Roman Lebedev 597ccc92ce
https://godbolt.org/z/5vhv4K5b8 2021-07-02 17:16:19 +03:00
Nico Weber a92964779c Revert "[InstrProfiling] Use external weak reference for bias variable"
This reverts commit 33a7b4d9d8.
Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176
2021-07-02 09:05:12 -04:00
Florian Hahn a3ca578eb9
[Matrix] Fix crash during fusion if the same load is re-used.
This patch fixes a crash when the same load is used for both operands of
a fuseable multiply.
2021-07-02 14:00:17 +01:00
Alexey Bataev 28ac873bcb [SLP]Fix gathering of the scalars by not ignoring UndefValues.
The compiler should not ignore UndefValue when gathering the scalars,
otherwise the resulting code may be less defined than the original one.
Also, the scalars are now grouped so they can be inserted first, to reduce the
analysis in further passes.

Differential Revision: https://reviews.llvm.org/D105275
2021-07-02 04:46:48 -07:00
Florian Hahn 7655061cc6
[Matrix] Hoist address computation before multiply to enable fusion.
If the store address does not dominate the matrix multiply, try to hoist
address computation instructions without side-effects and/or memory
reads before the multiply, to allow fusion.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D105193
2021-07-02 09:52:11 +01:00
Evgeniy Brevnov 9568811cb8 [NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure
With a 'for' loop there is a single place where 'Current' is adjusted. It helps to avoid copy-paste and makes the overall loop control flow a bit easier to understand.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D101044
2021-07-02 10:00:47 +07:00
Craig Topper 066524ea54 [ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0.
Previously we used the vector type, but we're loading/storing
individual elements, so I think only element alignment should matter.

Noticed while looking at the code for something else so I don't
have a test case.

Differential Revision: https://reviews.llvm.org/D105220
2021-07-01 19:08:47 -07:00
Petr Hosek 33a7b4d9d8 [InstrProfiling] Use external weak reference for bias variable
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted which leads to an issue where the
linker might choose either of the weak symbols potentially disabling the
runtime counter relocation.

This change replaces the use of weak definition inside the runtime with
an external weak reference to address the issue. We also place the
compiler generated symbol inside a COMDAT group so dead definition can
be garbage collected by the linker.

Differential Revision: https://reviews.llvm.org/D105176
2021-07-01 15:25:31 -07:00
Philip Reames 955f125899 [instcombine] Fold overflow check using overflow intrinsic to comparison
This follows up to D104665 (which added umulo handling alongside the existing uaddo case), and generalizes for the remaining overflow intrinsics.

I went to add analogous handling to LVI, and discovered that LVI already had a more general implementation. Instead, we can port what LVI does to instcombine. (For context, LVI uses makeExactNoWrapRegion to constrain the value 'x' in blocks reached after a branch on the condition `op.with.overflow(x, C).overflow`.)
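
As a rough sketch of the class of fold (hypothetical @src/@tgt functions, using usub.with.overflow with a constant operand):

  declare { i8, i1 } @llvm.usub.with.overflow.i8(i8, i8)

  define i1 @src(i8 %x) {
    %agg = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 %x, i8 42)
    %ov = extractvalue { i8, i1 } %agg, 1
    ret i1 %ov
  }

  define i1 @tgt(i8 %x) {
    ; x - 42 wraps exactly when x < 42
    %ov = icmp ult i8 %x, 42
    ret i1 %ov
  }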

Differential Revision: https://reviews.llvm.org/D104932
2021-07-01 09:41:55 -07:00
Arnold Schwaighofer 4a361f5209 [coro async] Add support for specifying which parameter is swiftself in
async resume functions

Differential Revision: https://reviews.llvm.org/D104147
2021-07-01 07:33:15 -07:00
David Sherwood 51b4ab26ca [NFC] Add new setDebugLocFromInst that uses the class Builder by default
In lots of places we were calling setDebugLocFromInst and passing
in the same Builder member variable found in InnerLoopVectorizer.
I personally found this confusing so I've changed the interface
to take an Optional<IRBuilder<> *> and we can now pass in None
when we want to use the class member variable.

Differential Revision: https://reviews.llvm.org/D105100
2021-07-01 14:23:34 +01:00
Roman Lebedev 333d3a3cdf
[NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO'
I.e. it will be `false` for thin lto.
2021-07-01 10:09:24 +03:00
Chuanqi Xu 51fbd18706 [Coroutine] Recommit Add statistics for the number of elided coroutine
We currently lack a benchmark to measure the performance change for each
commit.
Since coro elide is the main optimization in the coroutine module, counting
the number of elided coroutines in private code bases may serve as an
estimate.
E.g., if the number of elided coroutines goes down for a certain commit, we
could catch the regression before the commit lands.

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D105095
2021-07-01 11:01:28 +08:00
Sanjay Patel 0c400e8953 [InstCombine] fold icmp ult of offset value with constant
This is one sibling of the fold added with c7b658aeb5 .

(X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN)
I'm still not sure how to describe it best, but we're
translating 2 constants from an unsigned range comparison
to signed because that eliminates the offset (add) op.

This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/K-fMBf

  define i1 @src(i8 %a, i8 %c2) {
    %t = add i8 %a, %c2
    %c = add i8 %c2, 128 ; SMIN
    %ov = icmp ult i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c2) {
    %not_c2 = xor i8 %c2, -1
    %ov = icmp sgt i8 %a, %not_c2
    ret i1 %ov
  }
2021-06-30 19:00:12 -04:00
Xun Li 822b92aae4 [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened
Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html
In the existing design, an SCC that contains a coroutine will go through the following passes:
Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline

The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat.
As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended.
What we really wanted is this:
Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline
(note that we don't really need to run Inliner again on the ramp function after split).

Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, run it once for real, insert the newly generated SCCs (the clones) back into the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function.

This approach also conforms to how the new pass manager works instead of relying on an ad-hoc post-split cleanup, making it ready for a full switch to the new pass manager eventually.

By looking at some of the changes to the tests, we can already observe that this change allows more optimizations to be applied to coroutines.

Reviewed By: aeubanks, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D95807
2021-06-30 11:38:14 -07:00
Sanjay Patel c7b658aeb5 [InstCombine] fold icmp of offset value with constant
There must be a better way to describe this pattern in words?
(X + C2) >u C --> X <s -C2 (if C == C2 + SMAX)

This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/rdfNFP

  define i1 @src(i8 %a, i8 %c1) {
    %t = add i8 %a, %c1
    %c2 = add i8 %c1, 127 ; SMAX
    %ov = icmp ugt i8 %t, %c2
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c1) {
    %neg_c1 = sub i8 0, %c1
    %ov = icmp slt i8 %a, %neg_c1
    ret i1 %ov
  }

The pattern was noticed as a by-product of D104932.
2021-06-30 13:37:31 -04:00
Philip Reames c4fc2cb5b2 [instcombine] umin(x, 1) == zext(x != 0)
We already implemented this for the select form, but the intrinsic form was missing.  Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.
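
A minimal IR sketch of the intrinsic form (hypothetical @src/@tgt functions):

  declare i8 @llvm.umin.i8(i8, i8)

  define i8 @src(i8 %x) {
    %m = call i8 @llvm.umin.i8(i8 %x, i8 1)
    ret i8 %m
  }

  define i8 @tgt(i8 %x) {
    %c = icmp ne i8 %x, 0
    %z = zext i1 %c to i8
    ret i8 %z
  }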
2021-06-30 10:20:01 -07:00
Joseph Huber ecabc6684f [OpenMP] Change analysis remarks to not emit on cold functions
The remarks will trigger on some functions that are marked cold, such as the
`__muldc3` intrinsic functions. Change the remarks to avoid these functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105196
2021-06-30 11:54:24 -04:00
Nico Weber db86e5c914 Revert "[Coroutine] Add statistics for the number of elided coroutine"
This reverts commit 1d9539cf49.
Test fails in LLVM_ENABLE_ASSERTIONS=OFF builds (such as regular
release builds).
2021-06-30 10:22:45 -04:00
Joseph Huber 0edb87773b [OpenMP] Add additional remarks for OpenMPOpt
This patch adds additional remarks, suggesting the use of `noescape` for failed
globalization and indicating when internalization failed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105150
2021-06-30 09:49:25 -04:00
David Sherwood 7b7b5b5a26 [NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating an IRBuilder stack variable with the same name as the
class member.
2021-06-30 11:11:49 +01:00
Chuanqi Xu 801c2b9bba [FuncSpec] Add an option to specializing literal constant
For now the option is off by default, since we are not sure whether it
would increase compile time aggressively. Although we tested it
on SPEC2017, we may need more testing before turning it on by default.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D104365
2021-06-30 11:26:44 +08:00
Chuanqi Xu 1d9539cf49 [Coroutine] Add statistics for the number of elided coroutine
We currently lack a benchmark to measure the performance change for each
commit.
Since coro elide is the main optimization in the coroutine module, counting
the number of elided coroutines in private code bases may serve as an
estimate.
E.g., if the number of elided coroutines goes down for a certain commit, we
could catch the regression before the commit lands.

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D105095
2021-06-30 11:20:53 +08:00
Jianzhou Zhao ae6648cee0 [dfsan] Expose dfsan_get_track_origins to get origin tracking status
This allows application code to check whether origin tracking is on before
printing out traces.

-dfsan-track-origins can be 0, 1, or 2.
The current code only distinguishes 1 and 2 at compile time, but not at runtime.
Made the runtime distinguish 1 and 2 too.

Reviewed By: browneee

Differential Revision: https://reviews.llvm.org/D105128
2021-06-29 20:32:39 +00:00
Nikita Popov c4de78e91c [SanitizerCoverage] Fix global type check with opaque pointers
The code was previously relying on the fact that an incorrectly
typed global would result in the insertion of a BitCast constant
expression. With opaque pointers, this is no longer the case, so
we should check the type explicitly.
2021-06-29 20:32:14 +02:00
Alexey Bataev 129ae515fb [INSTCOMBINE] Transform reduction(shuffle V, poison, unique_mask) to reduction(V).
After SLP + LTO we may have reduction(shuffle V, poison,
mask). This can be simplified to just reduction(V) if the mask only
permutes all elements of a single vector, without reusing elements,
replacing them with undefs and/or other values, etc.
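
A minimal IR sketch (hypothetical @src/@tgt functions, 4 lanes): the shuffle is a pure permutation of a single vector, so an add reduction of it equals the reduction of the original vector.

  declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

  define i32 @src(<4 x i32> %v) {
    %s = shufflevector <4 x i32> %v, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
    %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %s)
    ret i32 %r
  }

  define i32 @tgt(<4 x i32> %v) {
    %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)
    ret i32 %r
  }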

Differential Revision: https://reviews.llvm.org/D105053
2021-06-29 10:02:38 -07:00
Philip Reames e49d65f36d [LV] Fix bug when unrolling (only) a loop with non-latch exit
If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires an epilogue be generated for correctness, the code generation must actually do so.

The included test case on an unmodified opt will access memory one past the expected bound.  As a result, this patch is fixing a latent miscompile.

Differential Revision: https://reviews.llvm.org/D103700
2021-06-29 08:04:26 -07:00
Johannes Doerfert 7af91a2b8f [Attributor][NFCI] Make the state of AAValueSimplify explicit
As we have done with other states we want the AAValueSimplify state to
be explicit to use it more easily in our helpers.
2021-06-29 09:38:22 -05:00
Johannes Doerfert dcbe58d94c [Attributor][NFCI] Remove unneeded namespace 2021-06-29 09:38:20 -05:00
Johannes Doerfert 457bd5c8d5 [Attributor] Teach AAPotentialValues about constant select conditions
There was a TODO but now we actually check if the select condition is
assumed constant and only look at the relevant operand.
2021-06-29 09:38:18 -05:00
Johannes Doerfert 8dc9bb6d85 [Attributor][NFC] Clang format 2021-06-29 09:38:15 -05:00
Johannes Doerfert a33e128012 [InstCombine] Gracefully handle an alloca outside the alloca-AS
While we might eventually want to disallow allocas that do not have the
alloca-AS set, it seems undesirable to crash on them. Add a cast when
required so that we can support such allocas (at least here).

Differential Revision: https://reviews.llvm.org/D104866
2021-06-29 09:38:13 -05:00
David Sherwood 9de63367d8 Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable"
This reverts commit 9dde514162.
2021-06-29 15:20:22 +01:00
David Sherwood 9dde514162 [NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating an IRBuilder stack variable with the same name as the
class member.
2021-06-29 14:34:30 +01:00
David Sherwood 8a3365fba2 Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable"
This reverts commit dcfc2c3fac.
2021-06-29 14:04:42 +01:00
Florian Hahn 47215e1c62
[LV] Fix crash when target instruction for sinking is dead.
This patch fixes a crash when the target instruction for sinking is
dead. In that case, no recipe is created and trying to get the recipe
for it results in a crash. To ensure all sink targets are alive, find &
use the first previous alive instruction.

Note that the case where the sink source is dead is already handled.

Found by
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104603
2021-06-29 13:31:22 +01:00
David Sherwood 303b6d5e98 [LoopVectorize] Add support for scalable vectorization of invariant stores
Previously in setCostBasedWideningDecision if we encountered an
invariant store we just assumed that we could scalarize the store
and called getUniformMemOpCost to get the associated cost.
However, for scalable vectors this is not an option because it is
not currently possible to scalarize the store. At the moment we
crash in VPReplicateRecipe::execute when trying to scalarize the
store.

Therefore, I have changed setCostBasedWideningDecision so that if
we are storing a scalable vector out to a uniform address and the
target supports scatter instructions, then we should use those
instead.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-inv-store.ll

Differential Revision: https://reviews.llvm.org/D104624
2021-06-29 11:56:09 +01:00
Roman Lebedev 6cf6f6f65f
[NFC][InstCombine] foldAggregateConstructionIntoAggregateReuse(): cast to Instruction eagerly
In all of these, the value must be an instruction for us to succeed anyway,
so change it to maybe hopefully make further changes more straight-forward.
2021-06-29 13:29:18 +03:00
David Sherwood dcfc2c3fac [NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating an IRBuilder stack variable with the same name as the
class member.
2021-06-29 09:14:35 +01:00
Sanjay Patel 9d0bf7699c [InstCombine] don't try to fold a constant expression that can trap (PR50906)
We could use a bigger hammer and bail out on any constant
expression, but there's a regression test that appears to
validly do the transform (although it may not have been
intending to check that optimization).
2021-06-28 17:00:21 -04:00
Joseph Huber 57ad2e1067 [OpenMP] Prevent OpenMPOpt from internalizing uncalled functions
Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module, so this is unnecessary.

Depends on D102423

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104890
2021-06-28 16:47:53 -04:00
Nikita Popov 7ac0442fe5 [SanitizerCoverage] Support opaque pointers
Pass element type rather than pointer type to some functions, so
we know which type to use for the global variables.
2021-06-28 22:18:42 +02:00
Arnold Schwaighofer 3dee1e8a84 [coro] Fix rematerializable instruction sinking to coro.suspend blocks
There is a constraint that coro.suspend instructions need to be in their
own blocks. The coro split pass initially creates IR that obeys this constraint
(which is later checked). Sinking rematerializable instructions into these
blocks breaks that constraint.

Instead, rematerialize in the suspend block's single predecessor
block.

Differential Revision: https://reviews.llvm.org/D104051
2021-06-28 09:37:45 -07:00
Nico Weber 540b4a5fb3 Revert "[DebugInfo] Enable variadic debug value salvaging"
This reverts commit adace79652.
Still breaks things, see comment on https://reviews.llvm.org/D91722
2021-06-28 11:25:09 -04:00
Reshabh Sharma ae983de6cc [InferAddressSpaces] NFC: For noop IntToPtr/PtrToInt pair cast to operator instead of PtrToInt
The compiler crashes on an assertion while casting operands to PtrToIntInst in some cases when
ptrtoint is present as an explicit operand to inttoptr. An explicit operator used as an
operand cannot be cast to an Instruction.

This patch replaces the cast to PtrToIntInst with a cast to Operator, which is later checked for constant
expressions.

Differential Revision: https://reviews.llvm.org/D105002
2021-06-28 19:24:26 +05:30
Joseph Huber 13b2fba239 [OpenMP][NFC] Fix typo in OpenMPOpt 2021-06-28 09:49:14 -04:00
Joseph Huber 4024087731 [OpenMP][NFC] Fix missing argument 2021-06-28 09:15:01 -04:00
Joseph Huber 4a6bd8e3e7 [OpenMP] Increase attributor iterations on the GPU
Increase the number of attributor iterations on a GPU target. I forgot to
change this in D104416.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104920
2021-06-28 08:50:49 -04:00
Kerry McLaughlin f99672568f [LoopVectorize] Fix strict reductions where VF = 1
Currently we will allow loops with a fixed width VF of 1 to vectorize
if the -enable-strict-reductions flag is set. However, the loop vectorizer
will not use ordered reductions if `VF.isScalar()` and the resulting
vectorized loop will be out of order.

This patch removes `VF.isVector()` when checking if ordered reductions
should be used. Also, instead of converting the FAdds to reductions if the
VF = 1, operands of the FAdds are changed such that the order is preserved.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D104533
2021-06-28 11:27:10 +01:00
Florian Hahn 80aa7e147e
[VPlan] Merge predicated-triangle regions, after sinking.
Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100260
2021-06-28 11:10:38 +01:00
Max Kazantsev d58514d41c [LSR][NFC] Make sure that after the canonicalization the formula is canonical 2021-06-28 12:50:04 +07:00
Max Kazantsev 7c73c2ede8 [LoopDeletion] Benefit from branches by undef conditions when symbolically executing 1st iteration
We can exploit branches on an `undef` condition. Frankly, the LangRef says that
such branches are UB, so we can assume that all outgoing edges of such blocks
are dead.

However, from a practical perspective, we know that this is not supported correctly
in some other places. So we are being conservative about it.

Branch by undef is treated in the following way:
- If it is a loop-exiting branch, we always assume it exits the loop;
- If not, we arbitrarily assume it takes the `true` value.

Differential Revision: https://reviews.llvm.org/D104689
Reviewed By: nikic
2021-06-28 11:39:46 +07:00
Nikita Popov e81702912e [DSE] Preserve address space
Preserve address space when inserting i8* cast.
2021-06-27 20:26:00 +02:00
Nikita Popov 9aa951e80e [MemCpyOpt] Preserve address space
Preserve address space when generating the cast to i8*.
2021-06-27 20:21:19 +02:00
Nikita Popov f00941e061 [DSE] Support opaque pointers
For the start shortening optimization, always use a i8 type for
the GEP, as it is a raw offset calculation.

Handling of non-i8* memset/memcpy arguments requires insertion
of casts. These cases were previously miscompiled, as the offset
calculation was performed on the wrong type.
2021-06-27 17:41:40 +02:00
Nikita Popov f025053977 [MemCpyOpt] Handle unusual memcpy element type
Apparently, it is legal to use memcpy/memset with pointer types
other than i8*. Prior to 81fcdae68c
this case was silently miscompiled, as the i8 offset calculation
was performed on some other type. Now it would crash due to a
type mismatch. Fix this by inserting an explicit bitcast to i8*.
2021-06-27 16:21:44 +02:00
Sanjay Patel 153da08a6c [InstCombine] hoist min/max intrinsics above select with constant op
This is an extension of the handling for unary intrinsics and
follows the logic that we use for binary ops.

We don't canonicalize to min/max intrinsics yet, but this might
help unlock other folds seen in D98152.
2021-06-27 10:02:23 -04:00
Nikita Popov 81fcdae68c [MemCpyOpt] Support opaque pointers 2021-06-27 15:52:38 +02:00
Nikita Popov a9129f8964 [LoadStoreVectorizer] Support opaque pointers
There are remaining redundant bitcasts.
2021-06-27 15:42:16 +02:00
Florian Hahn f1a6430272
[VPlan] Track both incoming values for first-order recurrence phis.
This patch updates VPWidenPHI recipes for first-order recurrences to
also track the incoming value from the back-edge. Similar to D99294,
which did the same for reductions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104197
2021-06-27 14:29:35 +01:00
Florian Hahn 7f36981977
[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC).
Suggested in D104197, avoids the early exit.
2021-06-26 13:13:25 +01:00
Andrew Browne 45f6d5522f [DFSan] Change shadow and origin memory layouts to match MSan.
Previously on x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  | application memory |
  +--------------------+ 0x700000008000 (kAppAddr)
  |                    |
  |       unused       |
  |                    |
  +--------------------+ 0x300000000000 (kUnusedAddr)
  |       origin       |
  +--------------------+ 0x200000008000 (kOriginAddr)
  |       unused       |
  +--------------------+ 0x200000000000
  |   shadow memory    |
  +--------------------+ 0x100000008000 (kShadowAddr)
  |       unused       |
  +--------------------+ 0x000000010000
  | reserved by kernel |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem & ~0x600000000000
  SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow

Now for x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  |    application 3   |
  +--------------------+ 0x700000000000
  |      invalid       |
  +--------------------+ 0x610000000000
  |      origin 1      |
  +--------------------+ 0x600000000000
  |    application 2   |
  +--------------------+ 0x510000000000
  |      shadow 1      |
  +--------------------+ 0x500000000000
  |      invalid       |
  +--------------------+ 0x400000000000
  |      origin 3      |
  +--------------------+ 0x300000000000
  |      shadow 3      |
  +--------------------+ 0x200000000000
  |      origin 2      |
  +--------------------+ 0x110000000000
  |      invalid       |
  +--------------------+ 0x100000000000
  |      shadow 2      |
  +--------------------+ 0x010000000000
  |    application 1   |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem ^ 0x500000000000
  SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D104896
2021-06-25 17:00:38 -07:00
Nikita Popov fdd4c199a1 Revert "[InstCombine] Make indexed compare fold opaque ptr compatible"
This reverts commit 5cb20ef8a2.

Assertion failures with this patch were reported on
https://reviews.llvm.org/rG5cb20ef8a235, revert for now.
2021-06-26 00:32:59 +02:00
Eli Friedman 8d5bf0709d [NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion
The implementation is identical, but it makes the semantics a bit more
obvious.
2021-06-25 14:43:13 -07:00
Juneyoung Lee 1605593440 [SimplifyLibCalls] Fix memchr opt to use CreateLogicalAnd
This fixes a bug in LibCallSimplifier::optimizeMemChr, which does the following transformation:

```
// memchr("\r\n", C, 2) != nullptr -> (1 << C & ((1 << '\r') | (1 << '\n')))
// != 0
//   after bounds check.
```

As written above, a bounds check on C (whether it is less than the integer bitwidth) is done before doing `1 << C`; otherwise 1 << C will overflow.
If the bounds check is false, the result of (1 << C & ...) must not be used at all; otherwise the result of the shift (which is poison) will contaminate the whole result.
A correct way to encode this is `select i1 (bounds check), (1 << C & ...), false` because select does not allow the unused operand to contaminate the result.
However, this optimization was introducing `and (bounds check), (1 << C & ...)`, which cannot guarantee that.
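
As a rough, hypothetical sketch (not from the patch itself) of the difference between the two encodings, assuming C is an i32:

```llvm
; buggy encoding: if %inbounds is false, %shl is poison and the `and` is poison too
define i1 @memchr_and(i32 %c) {
  %inbounds = icmp ult i32 %c, 32
  %shl = shl i32 1, %c
  %masked = and i32 %shl, 9216        ; (1 << '\r') | (1 << '\n')
  %hit = icmp ne i32 %masked, 0
  %res = and i1 %inbounds, %hit
  ret i1 %res
}

; select-based encoding (what CreateLogicalAnd emits): a false bounds check
; yields false regardless of the poison in the other arm
define i1 @memchr_select(i32 %c) {
  %inbounds = icmp ult i32 %c, 32
  %shl = shl i32 1, %c
  %masked = and i32 %shl, 9216
  %hit = icmp ne i32 %masked, 0
  %res = select i1 %inbounds, i1 %hit, i1 false
  ret i1 %res
}
```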

The bug was found from compilation of this C++ code: https://reviews.llvm.org/rG2fd3037ac615#1007197

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104901
2021-06-26 05:59:35 +09:00
Joseph Huber 5ccb7424fa [OpenMP] Change OpenMPOpt to check openmp metadata
The metadata added in D102361 introduces a module flag that we can check
to determine if the module was compiled with `-fopenmp` enabled. We can
now check for the presence of this flag instead of scanning the call graph
for OpenMP runtime functions.

Depends on D102361

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102423
2021-06-25 16:34:22 -04:00
Hongtao Yu 3638085ff0 [Coroutines] Define __coro_frame_ty in function scope
Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside their parent type's scope.

We were seeing a type defined in a local scope cause trouble for the dwarf emitter, where a context is required to be a function scope, a namespace or a global scope.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D104937
2021-06-25 12:33:20 -07:00
Florian Hahn cc5ee857f9
[LV] Doxygenize VectorizationFactor member comments (NFC).
Minor cleanup for follow-up patch.
2021-06-25 18:35:00 +01:00
Philip Reames 2cd23eb243 [instcombine] Fold overflow check using umulo to comparison
If we have a umul.with.overflow where the multiply result is not used and one of the operands is a constant, we can perform the overflow check more cheaply with a comparison than by performing the multiply and extracting the overflow flag.
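
A minimal sketch of the shape of this fold, with an assumed constant of 10 on i8 (my own example, not a test from the patch): x * 10 overflows i8 exactly when x > 255 / 10 = 25.

```llvm
declare { i8, i1 } @llvm.umul.with.overflow.i8(i8, i8)

; before: only the overflow bit of the multiply is used
define i1 @src(i8 %x) {
  %mul = call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %x, i8 10)
  %ov = extractvalue { i8, i1 } %mul, 1
  ret i1 %ov
}

; after: the check becomes a plain unsigned comparison
define i1 @tgt(i8 %x) {
  %ov = icmp ugt i8 %x, 25
  ret i1 %ov
}
```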

(Noticed when looking at the conditions SCEV emits for overflow checks.)

Differential Revision: https://reviews.llvm.org/D104665
2021-06-25 10:25:45 -07:00
Florian Hahn 91053e327c
[LV] Reflow comment for VectorizationCostTy (NFC). 2021-06-25 14:20:06 +01:00
Arthur Eubanks 1aa02b37e7 Revert "[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions."
This reverts commit 1eda5453f2.

Causes https://crbug.com/1223647:
Incompatible argument and return types for 'returned' attribute
  tail call void @llvm.memset.p0i8.i64(i8* noalias noundef returned writeonly align 1 dereferenceable(255) %arraydecay, i8 0, i64 255, i1 false), !dbg !985
2021-06-24 19:24:34 -07:00
Nikita Popov 5cb20ef8a2 [InstCombine] Make indexed compare fold opaque ptr compatible
Rather than relying on pointer type equality (which, for a change,
is silently incorrect with opaque pointers) check that the GEP
source element types match.
2021-06-24 22:33:01 +02:00
Arthur Eubanks 7110510eca [WPD] Don't optimize calls more than once
WPD currently assumes that there is a one to one correspondence between
type test assume sequences and virtual calls. However, with
-fstrict-vtable-pointers this may not be true. This ends up causing
crashes when we try to optimize a virtual call more than once
(applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()).

applySingleImplDevirt() actually didn't previously crash because it would
replace the devirtualized call with the same direct call. Adding an
assert that the call is indirect causes the corresponding test to crash
with the rest of the patch.

This makes Chrome successfully build with -fstrict-vtable-pointers + WPD.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104798
2021-06-24 13:28:09 -07:00
Nikita Popov 8e0ff44bf8 [InstCombine] Make varargs cast transform compatible with opaque ptrs
The whole transform can be dropped once we have fully transitioned
to opaque pointers (as its purpose is to remove no-op pointer
casts). For now, make sure that it handles opaque pointers correctly.
2021-06-24 21:57:05 +02:00
Jonas Paulsson 1eda5453f2 [BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions.
- When emitting libcalls, do not only pass the calling convention from the
  function prototype but also the attributes.

- Do not pass attributes from e.g. libc memcpy to llvm.memcpy.

Review: Reid Kleckner, Eli Friedman, Arthur Eubanks

Differential Revision: https://reviews.llvm.org/D103992
2021-06-24 14:47:24 -05:00
Roman Lebedev d064182612
[SimplifyCFG] Tail-merging all blocks with `resume` terminator
Similar to what we already do for `ret` terminators.
As noted by @rnk, clang seems to already generate a single `ret`/`resume`,
so this isn't likely to cause widespread changes.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104849
2021-06-24 21:25:06 +03:00
Florian Hahn 833bdbe93c
[LV] Support sinking recipe in replicate region after another region.
This patch handles sinking a replicate region after another replicate
region. In that case, we can connect the sink region after the target
region. This properly handles the case for which an assertion has been
added in 337d765282.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D103514
2021-06-24 13:58:42 +01:00
Stephen Tozer adace79652 [DebugInfo] Enable variadic debug value salvaging
This patch enables the salvaging of debug values that may be calculated
from more than one SSA value, such as with binary operators that do not
use a constant argument. The actual functionality for this behaviour is
added in a previous commit (c7270567), but with the ability to actually
emit the resulting debug values switched off.

The reason for this is that the prior patch has been reverted several
times due to issues discovered downstream, some time after the actual
landing of the patch. The patch in question is rather large and touches
several widely used header files, and all issues discovered are more
related to the handling of variadic debug values as a whole rather than
the details of the patch itself. Therefore, to minimize the build time
impact and risk of conflicts involved in any potential future
revert/reapply of that patch, this significantly smaller patch (that
touches no header files) will instead be used as the capstone to enable
variadic debug value salvaging.

The review linked to this patch is mostly implemented by the previous
commit, c7270567, but also contains the changes in this patch.

Differential Revision: https://reviews.llvm.org/D91722
2021-06-24 13:16:29 +01:00
Roman Lebedev 9c4c2f2472
[SimplifyCFG] Tail-merging all blocks with `ret` terminator
Based on top of D104598, which is an NFCI-ish refactoring.
Here, the restriction that only empty blocks can be merged is lifted.
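
A rough before/after sketch of what the tail-merging produces (my own illustration, not a test from the patch):

```llvm
; before: two blocks end in `ret`
define i32 @two_rets(i1 %c, i32 %a, i32 %b) {
entry:
  br i1 %c, label %left, label %right
left:
  ret i32 %a
right:
  ret i32 %b
}

; after: a single common return block with a phi for the returned value
define i32 @merged(i1 %c, i32 %a, i32 %b) {
entry:
  br i1 %c, label %left, label %right
left:
  br label %common.ret
right:
  br label %common.ret
common.ret:
  %retval = phi i32 [ %a, %left ], [ %b, %right ]
  ret i32 %retval
}
```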

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104597
2021-06-24 13:15:39 +03:00
Stephen Tozer c72705678c Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and re-enable this feature do not risk
causing conflicts.

Differential Revision: https://reviews.llvm.org/D91722

This reverts commit 386b66b2fc.
2021-06-24 09:46:38 +01:00
Zequan Wu 9393894331 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This causes a compiler crash: Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"'

This reverts commit e3d24b45b8.
2021-06-23 19:24:56 -07:00
Evgenii Stepanov 78f7e6d8d7 [hwasan] Respect llvm.asan.globals.
This enables the no_sanitize C++ attribute to exclude globals from hwasan
testing, and automatically excludes other sanitizers' globals (such as
ubsan location descriptors).

Differential Revision: https://reviews.llvm.org/D104825
2021-06-23 18:37:00 -07:00
Nikita Popov 8321335fd8 [InstCombine] Use getFunctionType()
Avoid fetching pointer element type...
2021-06-23 20:28:34 +02:00
Sami Tolvanen e3d24b45b8 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

This relands commit 4474958d3a
with a fix to a use-of-uninitialized-value error that tripped
MemorySanitizer.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-23 10:56:13 -07:00
Kuter Dinel 5d44d56f7d [Attributor] Derive AAFunctionReachability attribute.
This attribute uses Attributor's internal 'optimistic' call graph
information to answer queries about function call reachability.

Functions can become reachable over time as new call edges are
discovered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104599
2021-06-23 20:43:10 +03:00
Nikita Popov 00d3f7cc3c [LAA] Make getPointersDiff() API compatible with opaque pointers
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.

The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.

Differential Revision: https://reviews.llvm.org/D104784
2021-06-23 18:44:34 +02:00
Datta Nagraj ad0085d338 [InstCombine] Eliminate casts to optimize ctlz operation
If a ctlz operation is performed on a wider datatype and then
downcast, it can be optimized by doing the ctlz operation
on a narrower datatype and adding the bitwidth difference to the result
of ctlz to provide the same output:

https://alive2.llvm.org/ce/z/8uup9M
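
A small sketch of the kind of rewrite this enables, assuming an i16 value zero-extended to i32 (my own illustration, not a test from the patch):

```llvm
declare i32 @llvm.ctlz.i32(i32, i1)
declare i16 @llvm.ctlz.i16(i16, i1)

; before: count leading zeros on the widened value, then truncate
define i16 @src(i16 %x) {
  %ext = zext i16 %x to i32
  %ctlz = call i32 @llvm.ctlz.i32(i32 %ext, i1 false)
  %res = trunc i32 %ctlz to i16
  ret i16 %res
}

; after: count on the narrow type and add the 16-bit width difference
define i16 @tgt(i16 %x) {
  %ctlz = call i16 @llvm.ctlz.i16(i16 %x, i1 false)
  %res = add i16 %ctlz, 16
  ret i16 %res
}
```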

The original problem is shown in
https://llvm.org/PR50173

Differential Revision: https://reviews.llvm.org/D103788
2021-06-23 11:19:12 -04:00
Sanjay Patel 1e9b6b89a7 [InstCombine] convert FP min/max with negated op to fabs
This is part of improving floating-point patterns seen in:
https://llvm.org/PR39480

We don't require any FMF because the 2 potential corner cases
(-0.0 and NaN) are correctly handled without FMF:
1. -0.0 is treated as strictly less than +0.0 with
   maximum/minimum, so fabs/fneg work as expected.
2. +/- 0.0 with maxnum/minnum is indeterminate, so
   transforming to fabs/fneg is more defined.
3. The sign of a NaN may be altered by this transform,
   but that is allowed in the default FP environment.

If there are FMF, they are propagated from the min/max call to
one or both new operands which seems to agree with Alive2:
https://alive2.llvm.org/ce/z/bem_xC
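
As a hedged illustration of one instance of this fold (maxnum of a value and its negation becoming fabs; my own sketch, not a test from the patch):

```llvm
declare float @llvm.maxnum.f32(float, float)
declare float @llvm.fabs.f32(float)

define float @src(float %x) {
  %neg = fneg float %x
  %max = call float @llvm.maxnum.f32(float %x, float %neg)
  ret float %max
}

define float @tgt(float %x) {
  %abs = call float @llvm.fabs.f32(float %x)
  ret float %abs
}
```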
2021-06-23 10:41:39 -04:00
Roman Lebedev ff4b1d379f
[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing with more than just `ret` in the future.

This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here; I'm not sure if it's reasonable/feasible to temporarily retain it.

Other restrictions of the transform remain.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104598
2021-06-23 14:33:18 +03:00
Joe Ellis 3c4dbf6ea9 [Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics
With regards to overrunning, the langref (llvm/docs/LangRef.rst)
specifies:

   (llvm.experimental.vector.insert)
   Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1)
   must be valid ``vec`` indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

   (llvm.experimental.vector.extract)
   Elements ``idx`` through (``idx`` + num_elements(result_type) - 1)
   must be valid vector indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

For the non-mixed cases (e.g. inserting/extracting a scalable into/from
another scalable, or inserting/extracting a fixed into/from another
fixed), it is possible to statically check whether or not the above
conditions are met. This was previously missing from the verifier, and
if the conditions were found to be false, the result of the
insertion/extraction would be replaced with an undef.

With regards to invalid indices, the langref (llvm/docs/LangRef.rst)
specifies:

    (llvm.experimental.vector.insert)
    ``idx`` represents the starting element number at which ``subvec``
    will be inserted. ``idx`` must be a constant multiple of
    ``subvec``'s known minimum vector length.

    (llvm.experimental.vector.extract)
    The ``idx`` specifies the starting element number within ``vec``
    from which a subvector is extracted. ``idx`` must be a constant
    multiple of the known-minimum vector length of the result type.

Similarly, these conditions were not previously enforced in the
verifier. In some circumstances, invalid indices were permitted
silently, and in other circumstances, an undef was spawned where a
verifier error would have been preferred.

This commit adds verifier checks to enforce the constraints above.
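
A hypothetical example of the kind of IR the verifier now rejects (fixed-into-fixed insertion; the example and the mangled intrinsic name are my assumptions, not quoted from the patch): inserting a <4 x i32> subvector into an <8 x i32> vector at index 6 both overruns the destination (elements 6..9) and uses an index that is not a multiple of the subvector length.

```llvm
declare <8 x i32> @llvm.experimental.vector.insert.v8i32.v4i32(<8 x i32>, <4 x i32>, i64)

define <8 x i32> @bad_insert(<8 x i32> %vec, <4 x i32> %sub) {
  ; previously this silently produced undef; now the verifier flags it
  %r = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v4i32(<8 x i32> %vec, <4 x i32> %sub, i64 6)
  ret <8 x i32> %r
}
```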

Differential Revision: https://reviews.llvm.org/D104468
2021-06-23 10:33:22 +00:00
Max Kazantsev 842b4c83cb [LoopDeletion] Exploit undef Phi inputs when symbolically executing 1st iteration
Follow-up on Roman's idea expressed in D103959.
- If a Phi has undefined inputs from live blocks:
   - and no other inputs, assume it is undef itself;
   - and exactly one non-undef input, we can assume that all undefs are equal to this input.

Differential Revision: https://reviews.llvm.org/D104618
Reviewed By: lebedev.ri, nikic
2021-06-23 11:53:48 +07:00
Max Kazantsev b7d2c173eb [LSR] Filter out zero factors. PR50765
A zero factor leads to division by zero and failure of the corresponding
assert, as shown in PR50765. We should filter out such factors.

Differential Revision: https://reviews.llvm.org/D104702
Reviewed By: huihuiz, reames
2021-06-23 10:43:06 +07:00
Jon Roelofs 493d6928fe [Remarks] Make memsize remarks report as an analysis, not a missed opportunity.
Differential revision: https://reviews.llvm.org/D104078
2021-06-22 18:22:47 -07:00
Liqiang Tao a0d96fdd3a [llvm][Inliner] Make PriorityInlineOrder lazily updated
This patch makes PriorityInlineOrder lazily updated.
The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D104654
2021-06-23 08:59:53 +08:00
Joseph Huber 1cfdcae653 [Attributor] Fix AAExecutionDomain returning true on invalid states
This patch fixes a problem with the AAExecutionDomain attributor not
checking if it is in a valid state. This can cause it to incorrectly
return that a block is executed in a single threaded context after the
attributor failed for any reason.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103186
2021-06-22 18:12:43 -04:00
Joseph Huber 44feacc736 [OpenMP] Change remaining globalization from an analysis remark to missed
After landing the globalization optimizations, the presence of globalization on
the device that was not put in shared or stack memory is a failed optimization
with performance consequences, so it should be reported as a missed-optimization remark.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104735
2021-06-22 16:52:06 -04:00
Nikita Popov 7bb7fa12e7 [OpaquePtr] Support changing load type in InstCombine
When the load type is changed to ptr, we need the load pointer type
to also be ptr, because it's not allowed to create a pointer to an
opaque pointer. This is achieved by adjusting the getPointerTo() API
to return an opaque pointer for an opaque pointer base type.

Differential Revision: https://reviews.llvm.org/D104718
2021-06-22 21:16:15 +02:00
Sami Tolvanen 33c9438f11 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 4474958d3a.

Breaks check-llvm on Mac.
2021-06-22 12:10:58 -07:00
Joseph Huber ca1560da72 [OpenMP][NFC] Add new optimizations to OpenMPOpt comment header
Summary:
Adds mentions of the newly added globalization optimizations to the OpenMPOpt comment header.
2021-06-22 14:40:31 -04:00
Joseph Huber b54ccab509 [Attributor] Add an option to increase the max number of iterations
Right now the Attributor defaults to 32 fixed point iterations unless it is set
explicitly by a command line flag. This patch allows this to be configured when
the attributor instance is created. The maximum is then increased in OpenMPOpt
if the target is a kernel. This is because the globalization analysis can result
in larger iteration counts due to many dependent instances running at once.

Depends on D102444

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104416
2021-06-22 14:38:25 -04:00
Sanjay Patel b1f6ef92ec [InstCombine] reduce code duplication for FP min/max with casts fold; NFC 2021-06-22 14:15:04 -04:00
Joseph Huber 30e36c9b3c [Attributor] Add interface to emit remarks in Attributor
Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is available then a remark will be emitted for the corresponding pass
name.

Depends on D102197

Reviewed By: sstefan1, thegameg

Differential Revision: https://reviews.llvm.org/D102444
2021-06-22 14:12:46 -04:00
Joseph Huber 7d69da71dd [OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls
Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.

Depends on D97818

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102197
2021-06-22 13:23:05 -04:00
Sami Tolvanen 4474958d3a ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-22 10:01:55 -07:00
Joseph Huber 03d7e61c87 [OpenMP] Internalize functions in OpenMPOpt to improve IPO passes
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102824
2021-06-22 12:38:10 -04:00
Joseph Huber 6fc51c9f7d [OpenMP] Replace GPU globalization calls with shared memory in the middle-end
Summary:
The changes introduced in D97680 create a simpler interface to code that needs
to be globalized. This interface is used to simplify the globalization calls in
the middle end. We can check any globalization call that is only called by a
single thread in the team and replace it with a static shared memory buffer.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97818
2021-06-22 11:55:44 -04:00
Nikita Popov e790d3667e [OpaquePtr] Handle addrspacecasts in InstCombine
This adds support for addrspace casts involving opaque pointers to
InstCombine, as well as the isEliminableCastPair() helper
(otherwise the assertion failure would just move there).

Add PointerType::hasSameElementTypeAs() to hide the element type
details.

Differential Revision: https://reviews.llvm.org/D104668
2021-06-22 17:45:30 +02:00
Jingu Kang 873ff5a728 [SimpleLoopUnswitch] Fix a bug in ComputeUnswitchedCost with partial unswitch
There was a bug in the cost calculation for partially invariant unswitching.

The costs of non-duplicated blocks are subtracted from the total LoopCost, so
anything that is duplicated should not be counted.

Differential Revision: https://reviews.llvm.org/D103816
2021-06-22 16:18:00 +01:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functionality with a single allocation command, effectively mimicking an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Max Kazantsev 4c4f1ae93e Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks"
The patch was reverted due to a bug that existed before it and was exposed
by it. Re-landing after the underlying bug has been fixed.

Differential Revision: https://reviews.llvm.org/D103959
2021-06-22 12:28:46 +07:00
Max Kazantsev 575253887b [LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically
Two predecessors break the logic that follows, and the loop may reach the
pass in a non-canonicalized state.
2021-06-22 12:20:55 +07:00
Nikita Popov 39796e1ad0 Reapply [InstCombine] Don't try converting opaque pointer bitcast to GEP
Reapplied without changes -- this was reverted together with an
underlying patch.

-----

Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
2021-06-21 22:15:56 +02:00
Nikita Popov e2c2124a4b Reapply [InstCombine] Extract bitcast -> gep transform
Relative to the original patch, an InstCombine test has been
added to show a previously missed pattern, and the Coroutine
test that resulted in the revert has been regenerated.

-----

Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. This previously
happened for the isSized() check, which skipped folds like
distributing a bitcast over a select.
2021-06-21 22:03:15 +02:00
Nikita Popov 6922ab73a5 Revert "[InstCombine] Extract bitcast -> gep transform"
This reverts commit d9f5d7b959.
This reverts commit 5780611d7e.

This causes a failure in Coroutine tests.
2021-06-21 21:34:17 +02:00
Nikita Popov 862313cf59 [LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI)
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.

Differential Revision: https://reviews.llvm.org/D104590
2021-06-21 21:34:17 +02:00
Alexey Bataev 908b753661 [SLP]Improve vectorization of PHI instructions.
Perform better analysis when trying to vectorize PHIs.
1. Do not try to vectorize vector PHIs.
2. Do deeper analysis to find more profitable nodes for vectorization.

Previously we just tried to vectorize PHIs of the same type. The patch
improves this and tries to vectorize PHIs whose incoming values
come from the same basic block and have the same and/or alternate
opcodes.

This saves compile time and provides better vectorization
results in general.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103638
2021-06-21 12:26:24 -07:00
Nikita Popov 5780611d7e [InstCombine] Don't try converting opaque pointer bitcast to GEP
Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
2021-06-21 21:24:50 +02:00
Nikita Popov d9f5d7b959 [InstCombine] Extract bitcast -> gep transform
Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. There is
already one isSized() check that could run into this issue,
thus this change is not strictly NFC.
2021-06-21 21:24:50 +02:00
Nikita Popov a969bdc56f [InstCombine] Remove unnecessary addres space check (NFC)
It's not possible to bitcast between different address spaces,
and this is ensured by the IR verifier. As such, this bitcast to
addrspacecast canonicalization can never be hit.
2021-06-21 20:11:39 +02:00
Nathan Chancellor f52666985d
Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks"
This reverts commit bb1dc876eb.

This patch causes an assertion failure when building an arm64 defconfig
Linux kernel.

See https://reviews.llvm.org/D103959 for a link to the original bug
report and a reduced reproducer.
2021-06-21 10:18:55 -07:00
Sanjay Patel 198b79caae [InstCombine] move bitmanipulation-of-select folds
No outwardly visible difference is intended,
but it is obviously better to have all transforms
for an intrinsic housed together since we already
have helper functions in place.

It is also potentially more efficient to zap a
simple pattern match before trying to do expensive
computeKnownBits() calls.
2021-06-21 11:32:16 -04:00
Sanjay Patel 64b2676ca8 [InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms
Building on:
4c44b02d87
...and adding handling for the extra operand in these intrinsics.

This pattern is discussed in:
https://llvm.org/PR50140
2021-06-21 11:04:12 -04:00
Nikita Popov 80e0424b2c [Mem2Reg] Use poison for unreachable cases
Use poison instead of undef for cases dealing with unreachable
code. This still leaves the more interesting case of "load from
uninitialized memory" as undef.
2021-06-21 10:54:13 +02:00
Juneyoung Lee c038845f58 [InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified
This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified.
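
A small sketch of one instance of this fold (my own example, assuming the non-constant arm is known nonnull so `icmp eq %arg, null` simplifies to false):

```llvm
define i1 @src(i1 %c, i8* nonnull %arg) {
  %sel = select i1 %c, i8* null, i8* %arg
  %cmp = icmp eq i8* %sel, null
  ret i1 %cmp
}

; the compare distributes over the select: icmp eq null, null is true and
; icmp eq %arg, null simplifies to false, so the whole compare folds to %c
define i1 @tgt(i1 %c, i8* nonnull %arg) {
  ret i1 %c
}
```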

Resolves llvm.org/pr48975.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D96663
2021-06-21 17:39:05 +09:00
Sjoerd Meijer 342bbb7832 [FuncSpec] Don't specialise functions with NoDuplicate instructions.
getSpecializationCost was returning INT_MAX for a case when specialisation
shouldn't happen, but this wasn't properly checked if specialisation was
forced.

Differential Revision: https://reviews.llvm.org/D104461
2021-06-21 09:02:11 +01:00
Max Kazantsev bb1dc876eb [LoopDeletion] Handle Phis with similar inputs from different blocks
This patch lifts the requirement that Phis have only one incoming live block.
There can be multiple live blocks if the same value comes to the
phi from all of them.

Differential Revision: https://reviews.llvm.org/D103959
Reviewed By: nikic, lebedev.ri
2021-06-21 11:37:06 +07:00
Juneyoung Lee ce192ced2b [InstCombine] Use poison constant to represent the result of unreachable instrs
This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.

This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104602
2021-06-21 09:58:44 +09:00
Nikita Popov 1ae266f452 [LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as privileged when it comes to
full unrolling.

The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.

Differential Revision: https://reviews.llvm.org/D102982
2021-06-20 20:58:26 +02:00
David Green a24b02193a [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differential Revision: https://reviews.llvm.org/D100464
2021-06-20 17:03:30 +01:00
Sanjay Patel 4c44b02d87 [InstCombine] fold ctpop-of-select with 1 or more constant arms
The general pattern is mentioned in:
https://llvm.org/PR50140
...but we need to do a bit more to handle intrinsics with extra operands
like ctlz/cttz.
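
A minimal sketch of the basic pattern for ctpop (my own example, not a test from the patch): when one select arm is a constant, the intrinsic is folded into that arm.

```llvm
declare i32 @llvm.ctpop.i32(i32)

define i32 @src(i1 %c, i32 %x) {
  %sel = select i1 %c, i32 64, i32 %x
  %pop = call i32 @llvm.ctpop.i32(i32 %sel)
  ret i32 %pop
}

; ctpop(64) folds to 1, so the intrinsic only remains on the variable arm
define i32 @tgt(i1 %c, i32 %x) {
  %pop = call i32 @llvm.ctpop.i32(i32 %x)
  %sel = select i1 %c, i32 1, i32 %pop
  ret i32 %sel
}
```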
2021-06-20 11:28:45 -04:00
Sanjay Patel 240acb0cff [InstCombine] avoid infinite loops with select folds of constant expressions
This pair of transforms was added recently with:
8591640379

And could lead to conflicting folds:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399
2021-06-20 09:46:25 -04:00
Roman Lebedev c5b7335dc8
[SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has its address taken
Same as with HoistThenElseCodeToIf() (ad87761925).
2021-06-20 12:37:14 +03:00
Roman Lebedev ad87761925
[SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has its address taken
This problem is exposed by D104598: after it tail-merges `ret` in
`@test_inline_constraint_S_label`, the verifier starts complaining
`invalid operand for inline asm constraint 'S'`.

Essentially, taking the address of a block is mismodelled in IR.
It should probably be an explicit instruction, the first one in the block,
that isn't identical to any other instruction of the same type,
so that it can't be hoisted.
2021-06-20 12:18:15 +03:00
Nikita Popov 1bd4085e0b [LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.

Differential Revision: https://reviews.llvm.org/D104487
2021-06-19 09:25:57 +02:00
Liqiang Tao 671a87104b [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The call site whose size is smaller has a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-19 10:17:32 +08:00
Guozhi Wei 575ba6f425 [InstCombine] Don't transform code if DoTransform is false
Patch https://reviews.llvm.org/D72396 doesn't check DoTransform before transforming the code, and generates a wrong result for the attached test case.

Differential Revision: https://reviews.llvm.org/D104567
2021-06-18 18:01:34 -07:00
Fangrui Song 3307240f05 [InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:

* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code

The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.

Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}'` is 2.5% smaller.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-18 17:01:17 -07:00
Hongtao Yu bd52495518 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling-probe-related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference seen in build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Nikita Popov 3308205ae9 [LoopUnroll] Simplify optimization remarks
Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.

Differential Revision: https://reviews.llvm.org/D104482
2021-06-18 23:47:03 +02:00
Nick Desaulniers bef2992861 [GCOVProfiling] don't profile Fn's w/ noprofile attribute
Similar to D104475, the Linux kernel would like to avoid compiler
generated code in certain functions. The no_profile function
attribute can be used in C to generate the noprofile fn attr in IR.
Respect that from GCOVProfiling.

Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104257
2021-06-18 13:58:34 -07:00
Andrew Browne 14407332de [DFSan] Cleanup code for platforms other than Linux x86_64.
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.

Reviewed By: gbalats, stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104481
2021-06-18 11:21:46 -07:00
Liqiang Tao 93183a41b9 Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder" 2021-06-18 18:52:00 +08:00
Max Kazantsev de92287cf8 [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3)
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topological
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach the header via the latch, then the backedge can be broken.

This implementation uses InstSimplify. SCEV version was rejected due to high
compile time impact.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: nikic
2021-06-18 17:31:57 +07:00
Haojian Wu 3f5d53a525 [Attributor] Fix UB behavior on uninitialized bool variables.
Found by ASAN.
2021-06-18 11:49:42 +02:00
Daniil Seredkin 6643e51d79 [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.
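
A minimal sketch of the fold (my own example): squaring the sign extension of a bool gives the same result as a plain zero extension, since (-1)*(-1) = 1 and 0*0 = 0.

```llvm
define i32 @src(i1 %x) {
  %sx = sext i1 %x to i32
  %mul = mul i32 %sx, %sx
  ret i32 %mul
}

define i32 @tgt(i1 %x) {
  %zx = zext i1 %x to i32
  ret i32 %zx
}
```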

Differential Revision: https://reviews.llvm.org/D104193
2021-06-18 16:28:06 +07:00
Liqiang Tao a740b707d1 [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The call site whose size is smaller has a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-18 16:55:38 +08:00
Max Kazantsev fa5eb22ad4 [NFC] Assert non-zero factor before division
This is to ensure that zero denominator leads to controlled
assertion failure rather than UB.
2021-06-18 15:50:50 +07:00
Haojian Wu 7670938bba [Attributor] Don't print the call-graph in a hard-coded file.
This doesn't look like a practical pattern for our codebase (it could fail
in some sandbox environments).

Instead we print it via standard output, controlled by the
-attributor-print-call-graph flag; this follows a pattern similar to attributor-print-dep.
2021-06-18 09:38:07 +02:00
Daniil Seredkin 6de741de08 Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)"
This reverts commit 31053338c9.
2021-06-18 14:21:02 +07:00
Daniil Seredkin 31053338c9 [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.

Differential Revision: https://reviews.llvm.org/D104193
2021-06-18 14:12:00 +07:00
Fangrui Song 5798be8458 Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF"
This reverts commit 76d0747e08.

If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.

Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.

```
group section [   66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
   [Index]    Name
   [   67]   __llvm_prf_cnts
   [   68]   __llvm_prf_vals
   [   69]   __llvm_prf_data
   [   70]   .rela__llvm_prf_data
```
2021-06-17 23:38:17 -07:00
Johannes Doerfert 30c9d68ad9 [Attributor][FIX] Arguments of unknown functions can be undef
This should fix PR50683. The wrong assumption was that we
could always know what the callee is when we replace a call site
argument with undef. We wanted to know that to remove the `noundef`
that might be attached to the argument. Since no callee means we
did the propagation on the caller site, there is no need to remove
an attribute. It is only needed if we replace all uses and therefore
pass `undef` instead of the value that was passed in otherwise.
2021-06-18 01:07:53 -05:00
Johannes Doerfert 666dc6f126 [Attributor] Use a centralized value simplification interface
To allow outside AAs that simplify values we need to ensure all value
simplification goes through the Attributor, not AAValueSimplify (or any
of the other AAs we have already like AAPotentialValues). This patch
also introduces an interface for the outside AAs to register
simplification callbacks for an IRPosition. To make this work as
expected we have to pass IRPositions instead of Values in
AAValueSimplify, which makes sense by itself.
2021-06-18 01:07:53 -05:00
Johannes Doerfert d9194b6efb [Attributor] Introduce a helper do deal with constant type mismatches
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often still cast it to allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.

Differential Revision: https://reviews.llvm.org/D103856
2021-06-18 01:07:52 -05:00
Johannes Doerfert 9959eee001 [Attributor] Make sure Heap2Stack works properly on a GPU target
If the target stack is not accessible between different running
"threads" we have to make sure not to create allocas for mallocs
that might be used by multiple "threads". The "use check" is
sufficient to prevent this but if we apply the "free check" we have
to make sure the pointer is not communicated to others before
the free is reached.

Differential Revision: https://reviews.llvm.org/D98608
2021-06-18 01:07:52 -05:00
Johannes Doerfert 9a23e673ca [OpenMP][NFC] Expose AAExecutionDomain and rename its getter
The initial use for AAExecutionDomain was to determine if a single
thread executes a block. While this is sometimes informative most
of the time, and for other reasons, we actually want to know if it
is the "initial thread". Thus, the thread that started execution on
the current device. The deduction needs to be adjusted in a follow
up as the methods we use right not are looking for the OpenMP thread
id which is resets whenever a thread enters a parallel region. What
we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and
equivalent functions.
2021-06-18 01:07:52 -05:00
Johannes Doerfert 8d7bace3b5 [Attributor][NFC] AAReachability is currently stateless, don't invalidate it
We invalidated AAReachabilityImpl directly which is not helpful and
confusing as we still used it regardless. We now avoid invalidating it
(not needed anyway) and add checks for the state. This has by itself no
actual effect but prepares for later extensions.
2021-06-18 01:07:51 -05:00
George Balatsouras c6b5a25eeb [dfsan] Replace dfs$ prefix with .dfsan suffix
The current naming scheme adds the `dfs$` prefix to all
DFSan-instrumented functions.  This breaks mangling and prevents stack
trace printers and other tools from automatically demangling function
names.

This new naming scheme is mangling-compatible, with the `.dfsan`
suffix being a vendor-specific suffix:
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure

With this fix, demangling utils would work out-of-the-box.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104494
2021-06-17 22:42:47 -07:00
Xun Li 3522167efd [Coroutine] Properly deal with byval and noalias parameters
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
Previous attempts can be found in D104007 and D101980.
A lot of discussions can be found in those two patches.
To summarize the bug:
When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly.
However, in some cases we find that arguments are still directly used:
When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with the byval attribute. A byval attribute is considered to be local to the function (just like an alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because the byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and later uses will lead to memory use-after-free.
This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly.
For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutine as noalias. This can be taken care of in the CoroEarly pass.
For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer.

Differential Revision: https://reviews.llvm.org/D104184
2021-06-17 19:06:10 -07:00
Roman Lebedev 84eeb82888
[NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining unswitching failure
It's not prohibitively verbose, and allows easier understanding of
why certain unswitching ultimately wasn't performed.
2021-06-18 00:46:04 +03:00
Kuter Dinel eaf1b6810c [Attributor] Derive AACallEdges attribute
This attribute computes the optimistic live call edges using the attributor
liveness information. This attribute will be used for deriving an
inter-procedural function reachability attribute.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104059
2021-06-18 03:29:22 +03:00
Andrew Browne 39295e92f7 Revert "[DFSan] Cleanup code for platforms other than Linux x86_64."
This reverts commit 8441b993bd.

Buildbot failures.
2021-06-17 14:19:18 -07:00
Fangrui Song 76d0747e08 [InstrProfiling] Make __profd_ unconditionally private for ELF
For ELF, since all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisfied, we can make __profd_
unconditionally private.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-17 14:16:54 -07:00
Craig Topper 99e95856fb [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp.
This pass emits a floating point compare and a conditional branch,
but if strictfp is enabled we don't emit a constrained compare
intrinsic.

The backend also won't expand the readonly sqrt call this pass inserts
to a sqrt instruction under strictfp. So we end up with 2 libcalls as
seen here. https://godbolt.org/z/oax5zMEWd

Fix these things by disabling the pass.

Differential Revision: https://reviews.llvm.org/D104479
2021-06-17 14:15:12 -07:00
Andrew Browne 8441b993bd [DFSan] Cleanup code for platforms other than Linux x86_64.
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.

Reviewed By: gbalats, stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104481
2021-06-17 14:08:40 -07:00
Nikita Popov f7c54c4603 [LoopUnroll] Fold all exits based on known trip count/multiple
Fold all exits based on known trip count/multiple information from
SCEV. Previously only the latch exit or the single exit were folded.

This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple
entirely: They're still used to a) decide whether runtime unrolling
should be performed and b) for ORE remarks. However, the core
unrolling logic is independent of them now.

Differential Revision: https://reviews.llvm.org/D104203
2021-06-17 20:58:34 +02:00
Roman Lebedev 37dfc467ac
[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg
This really isn't talking about vectors in general,
but only about either fixed or scalable vectors,
and it's pretty confusing to see it state
that there aren't any vectors :)
2021-06-17 21:07:34 +03:00
hyeongyukim 69b0ed9a0a [InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows.

```
   // assume that GV is an array of 4-byte elements
   GEP = gep GV, 0, Idx // this is accessing Idx * 4
   L = load GEP
   ICI = icmp eq L, value
 =>
   ICI = icmp eq Idx, NewIdx
```

The foldCmpLoadFromIndexedGlobal function simplifies the GEP+load operation to an icmp.
And there is a problem because Idx * ElementSize can overflow.

Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.

This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.

```
   ...
 =>
   Idx' = and Idx, 0x3F..FF
   ICI = icmp eq Idx', NewIdx
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D99481
2021-06-17 19:46:17 +09:00
Sjoerd Meijer dcd23d875a [FuncSpec] Don't specialise functions with attribute NoDuplicate.
Differential Revision: https://reviews.llvm.org/D104378
2021-06-17 10:32:29 +01:00
Florian Hahn 80a403348b
[VPlan] Support PHIs as LastInst when inserting scalars in ::get().
At the moment, we create insertelement instructions directly after
LastInst when inserting scalar values in a vector in
VPTransformState::get.

This results in invalid IR when LastInst is a phi, followed by another
phi. In that case, the new instructions should be inserted just after
the last PHI node in the block.

At the moment, I don't think the problematic case can be triggered, but
it can happen once predicate regions are merged and multiple
VPredInstPHI recipes are in the same block (D100260).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104188
2021-06-17 09:36:44 +01:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was to make
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int, clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi, the backend would always prepare the exponent
argument as an i32, which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
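
As a hedged illustration of what the overload looks like in IR (the exact mangled names are my assumption based on the usual intrinsic-mangling scheme, not quoted from the patch):

```llvm
; the exponent type is now part of the intrinsic's mangled name
declare double @llvm.powi.f64.i32(double, i32)
declare float  @llvm.powi.f32.i16(float, i16)

define double @use_powi(double %x, i32 %n) {
  %r = call double @llvm.powi.f64.i32(double %x, i32 %n)
  ret double %r
}
```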

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Sjoerd Meijer 49ab3b1735 [FuncSpec] Statistics
Adds some bookkeeping for collecting the number of specialised functions and a
test for that.

Differential Revision: https://reviews.llvm.org/D104102
2021-06-16 09:11:51 +01:00
Evgeniy Brevnov 96cded5b79 [SLP] Incorrect handling of external scalar values
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103954
2021-06-16 13:27:36 +07:00
Andrew Browne e652d99169 [DFSan][NFC] Fix shadowing variable name. 2021-06-15 22:58:22 -07:00
Rong Xu 82a0bb1afc [SampleFDO] Place the discriminator flag variable into the used list.
We create the flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.

This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the var to the use list in the
MIRFSDiscriminator pass but it did not work. It turned out the used-globals
list is collected in lowering (before the MIR passes) and then emitted at
the end of the pass pipeline.

Here I add the variable to the use list in the IR-level AddDiscriminators
pass. The machine-level code is still kept in case IR's
AddDiscriminators is not invoked. If that is the case, just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emission.

Differential Revision: https://reviews.llvm.org/D103988
2021-06-15 21:51:04 -07:00
Chuanqi Xu 86906304d8 [FuncSpec] Use std::pow instead of operator^
The original implementation calculating UserBonus uses operator ^, which means XOR in the C++
language.
At first glance during review, I thought it was meant to be a power; my bad.
It doesn't make sense to use XOR here, so I believe it was simply a
careless mistake.

Test Plan: check-all

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D104282
2021-06-16 10:13:21 +08:00
Andrew Browne af93157625 [DFSan] Handle landingpad inst explicitly as zero shadow.
Before this change, DFSan was relying on fallback cases when getting the
origin address.

Differential Revision: https://reviews.llvm.org/D104266
2021-06-15 18:28:20 -07:00
Vitaly Buka 6478ef61b1 [asan] Remove Asan, Ubsan support of RTEMS and Myriad
Differential Revision: https://reviews.llvm.org/D104279
2021-06-15 12:59:05 -07:00
Roman Lebedev e52364532a
[NewPM] Remove SpeculateAroundPHIs pass
The addition of this pass was botched.
There is no particular reason why it had to be sold as an inseparable part
of the new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so its effects weren't measured.

Which means some of the turmoil of the new-pm transition
is likely actually regressions due to this pass.

Likewise, there has been a fair amount of post-commit feedback
(after the new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the half year since, the pass authors (google) still haven't found time to respond to any of that.

Hereby it is proposed to backout the pass from the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.

Furthermore, neither google nor facebook reports any perf changes
from this change, so I'm dropping the pass completely.
This removal can always be reverted should anyone want to pick it up again.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D104099
2021-06-15 20:35:55 +03:00
Andrew Litteken 2c21278e74 [IROutliner] Adding DebugInfo handling for IR Outlined Functions
This adds support for functions outlined by the IR Outliner to be
recognized by the debugger.  The expected behavior is that it will
skip over the instructions included in that section. This is because we
cannot say which of the original locations the instructions originated
from.

These functions will show up in the call stack, but you cannot step
through them.

Reviewers: paquette, vsk, djtodoro

Differential Revision: https://reviews.llvm.org/D87302
2021-06-15 10:57:08 -05:00
Florian Hahn f7fc8927c0
[LoopDeletion] Check for irreducible cycles when deleting loops.
Loops with irreducible cycles may loop infinitely. Those cannot be
removed, unless the loop/function is marked as mustprogress.
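
An illustrative C++ reduction of such a CFG (two blocks branching into each
other with two distinct entries, so the cycle has no single header):

  // Depending on 'c', control enters the cycle either at 'a' or at 'b';
  // neither label dominates the other, so the cycle is irreducible and is
  // not a natural loop that LoopDeletion could reason about.
  int f(bool c, int n) {
    int i = 0;
    if (c) goto a;
    goto b;
  a:
    ++i;
    if (i < n) goto b;
    return i;
  b:
    i += 2;
    if (i < n) goto a;
    return i;
  }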

Also discussed in D103382.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104238
2021-06-15 12:56:12 +01:00
Neil Henning 1540da3b78 ABI breaking changes fixes.
This commit mostly just replaces bad uses of `NDEBUG` with uses of
`LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI
breaking changes (normally extra struct elements in headers).
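
A hypothetical illustration (the struct is made up; the macro is spelled
LLVM_ENABLE_ABI_BREAKING_CHECKS in llvm/Config/abi-breaking.h):

  #include "llvm/Config/abi-breaking.h"

  // Guarding an extra member with NDEBUG ties the struct layout to whether a
  // particular TU was built with assertions, which silently breaks ABI across
  // mixed builds; the ABI-breaking-checks macro is a single configure-time switch.
  struct State {
    unsigned Value = 0;
  #if LLVM_ENABLE_ABI_BREAKING_CHECKS
    bool Verified = false; // extra field only when ABI-breaking checks are enabled
  #endif
  };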

Differential Revision: https://reviews.llvm.org/D104216
2021-06-15 11:08:13 +01:00
Vitaly Buka b8919fb0ea [NFC][sanitizer] clang-format some code 2021-06-14 18:05:22 -07:00
Huihui Zhang 1c096bf09f [SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE.
Currently, loop strength reduction does not handle loops with a scalable stride very well.

Take a loop vectorized with the scalable vector type <vscale x 8 x i16>, for instance
(refer to the added test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll).

Memory accesses are incremented by "16*vscale", while the induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build the candidate formula,
i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)})", so that the addrec register reg({0,+,(8*vscale)})
can be reused among Address and ICmpZero LSRUses to enable optimal solution selection.

This patch allows LSR's getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y"
and pull out "C1 /s C2" as the scaling factor whenever possible. Without this change, LSR
is missing the candidate formula with the proper scale factor needed to leverage the target's
scaled-index addressing mode.

Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable
vectors, but it allows simple valid scales to pass through.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103939
2021-06-14 16:42:34 -07:00
Matt Morehouse b87894a1d2 [HWASan] Enable globals support for LAM.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104265
2021-06-14 14:20:44 -07:00
Sanjay Patel 8591640379 [InstCombine] add DeMorgan folds for logical ops in select form
We canonicalized to these select patterns (poison-safe logic)
with D101191, so we need to reduce 'not' ops when possible
as we would with 'and'/'or' instructions.
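
A sketch of the select-form (poison-safe) logical ops via IRBuilder
(illustrative helpers, not the InstCombine code itself):

  #include "llvm/IR/IRBuilder.h"
  using namespace llvm;

  // Poison-safe "select form": the RHS does not leak poison into the result
  // when the LHS already decides it.
  Value *logicalAnd(IRBuilder<> &B, Value *LHS, Value *RHS) {
    return B.CreateSelect(LHS, RHS, B.getFalse()); // a ? b : false
  }
  Value *logicalOr(IRBuilder<> &B, Value *LHS, Value *RHS) {
    return B.CreateSelect(LHS, B.getTrue(), RHS);  // a ? true : b
  }
  // DeMorgan on this form: not(select(a, b, false)) == select(not(a), true, not(b)),
  // mirroring !(a && b) == (!a || !b) for the plain and/or instructions.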

This is shown in a secondary example in:
https://llvm.org/PR50389

https://alive2.llvm.org/ce/z/BvsESh
2021-06-14 12:54:35 -04:00
Florian Hahn 96ca03493a
[VectorCombine] Limit scalarization to non-poison indices for now.
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
2021-06-14 16:40:14 +01:00
Jeroen Dobbelaere bb8ce25e88 Intrinsic::getName: require a Module argument
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.

This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.

Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
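
A usage sketch under the new interface (the intrinsic and types are chosen
purely for illustration, and the exact signature is my assumption from the
summary above):

  #include "llvm/IR/Intrinsics.h"
  #include "llvm/IR/Module.h"
  #include <string>
  using namespace llvm;

  // The Module lets getName decide whether the overloaded name needs
  // remangling in this particular module's context.
  std::string mangledMemcpyName(Module &M, Type *Dst, Type *Src, Type *Len) {
    return Intrinsic::getName(Intrinsic::memcpy, {Dst, Src, Len}, &M);
  }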

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99173
2021-06-14 14:52:29 +02:00
Aditya Kumar dcbbc69cc5 Calculate getTerminator only when necessary
Differential Revision: https://reviews.llvm.org/D104202
2021-06-13 20:16:07 -07:00
Simon Pilgrim 9efe89d82f BoundsChecking.cpp - tidy implicit header dependencies. NFCI.
We don't use <vector> but we do use std::pair (<utility>)
2021-06-13 17:08:15 +01:00
Simon Pilgrim 56541d1377 GVN.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
Simon Pilgrim c14fd171fe LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
Sanjay Patel afd44bb6f2 [InstCombine] fold ctlz/cttz of bool types
https://alive2.llvm.org/ce/z/tX4pUT
2021-06-13 08:26:40 -04:00
Simon Pilgrim 2477b498f2 ArgumentPromotion.cpp - remove unused <string> include. NFCI. 2021-06-13 13:03:47 +01:00
Simon Pilgrim b013c58e82 VPlanSLP.cpp - tidy implicit header dependencies. NFCI.
We don't use std::string and std::vector, but we do use std::pair and std::max.
2021-06-13 12:37:17 +01:00
Xun Li fae7debadc [CHR] Don't run ControlHeightReduction if any BB has address taken
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In the computed goto pattern, there is usually a list of basic blocks that are all targets of an indirectbr instruction, and each basic block also has its address taken and stored in a variable.
The CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and cloned versions of all basic blocks in the list.
However, these cloned basic blocks will not have their addresses taken and stored anywhere, so a later SimplifyCFG pass will simply remove all of them, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contain BBs with their addresses taken.
Added a few test cases.
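
An illustrative reduction of the pattern using the GNU computed-goto
extension (not the original reproducer):

  // Each block's address is taken via &&label and stored in the table; if CHR
  // clones these blocks, the clones' addresses are never stored anywhere, so
  // SimplifyCFG may later delete them, breaking the indirect branches.
  int run(const int *ops, int n) {
    static void *table[] = { &&op_inc, &&op_halt };
    int acc = 0, i = 0;
    goto *table[ops[i]];
  op_inc:
    ++acc;
    if (++i < n) goto *table[ops[i]];
  op_halt:
    return acc;
  }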

Reviewed By: aeubanks, wenlei, hoy

Differential Revision: https://reviews.llvm.org/D103867
2021-06-12 10:29:53 -07:00
Kevin Athey 1d22596b2f [sanitizer] Remove numeric values from -asan-use-after-return flag. (NFC)
for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104152
2021-06-11 15:14:51 -07:00
Kevin Athey e0b469ffa1 [clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang.
Also:
  - add driver test (fsanitize-use-after-return.c)
  - add basic IR test (asan-use-after-return.cpp)
  - (NFC) cleaned up logic for generating table of __asan_stack_malloc
    depending on flag.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104076
2021-06-11 12:07:35 -07:00
Valery N Dmitriev 94a07c79cf [SLP][NFC] Fix condition that was supposed to save a bit of compile time.
It was found by chance, revealing a discrepancy between the comment (a few lines above),
the condition, and how re-ordering of instructions is done inside the if statement
it guards. The condition always evaluated to true.

Differential Revision: https://reviews.llvm.org/D104064
2021-06-11 10:08:55 -07:00
Adam Nemet e0efebb8eb [Matrix] In transpose opts, handle a^t * a^t
Without the fix the testcase crashes because we remove the same instruction
twice.

Differential Revision: https://reviews.llvm.org/D104127
2021-06-11 09:29:43 -07:00
Alexey Bataev a010d4230e [SLP]Allow reordering of insertelements.
After we added support for non-ordered insertelements, we can allow
their reordering.

Differential Revision: https://reviews.llvm.org/D104057
2021-06-11 08:47:41 -07:00
Matt Morehouse 0867edfc64 [HWASan] Add basic stack tagging support for LAM.
Adds the basic instrumentation needed for stack tagging.

Currently does not support stack short granules or TLS stack histories,
since a different code path is followed for the callback instrumentation
we use.

We may simply wait to support these two features until we switch to
a custom calling convention.

Patch By: xiangzhangllvm, morehouse

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102901
2021-06-11 08:21:17 -07:00
Alexey Bataev 74af4bb1f4 [SLP]Remove unnecessary UndefValue in CreateShuffle.
No need to use UndefValue in CreateShuffle call.

Differential Revision: https://reviews.llvm.org/D104113
2021-06-11 08:08:30 -07:00
Sjoerd Meijer 9907746f5d Move Function Specialization to its correct location. NFC.
As a follow up of rGc4a0969b9c14, and as part of D104102, move it to
the IPO transformations directory.
2021-06-11 15:00:10 +01:00
Sanjay Patel 602ab24833 [SimplifyCFG] avoid crash on degenerate loop
The problematic code pattern in the test is based on:
https://llvm.org/PR50638

If the IfCond is itself the phi that we are trying to remove,
then the loop around line 2835 can end up with something like:
%cmp = select i1 %cmp, i1 false, i1 true

That can then lead to a use-after-free and assert (although
I'm still not seeing that locally in my release + asserts build).

I think this can only happen with unreachable code.

Differential Revision: https://reviews.llvm.org/D104063
2021-06-11 09:37:06 -04:00
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> include from APInt.h. I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I also avoided the raw_ostream << operators, as I didn't want to lose having the integer radix explicit in the code.
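
A before/after sketch for callers (helper placement follows the summary above;
parameter names are illustrative):

  #include "llvm/ADT/APInt.h"
  #include "llvm/ADT/SmallString.h"
  #include "llvm/ADT/StringExtras.h"
  #include <string>
  using namespace llvm;

  std::string printSigned(const APInt &V) {
    // Previously: V.toString(10, /*Signed=*/true) returned a std::string.
    // The std::string-returning helper now lives in StringExtras.h:
    return toString(V, /*Radix=*/10, /*Signed=*/true);
  }

  void printSigned(const APInt &V, SmallVectorImpl<char> &Out) {
    // The allocation-friendly SmallString overload stays on APInt itself.
    V.toString(Out, /*Radix=*/10, /*Signed=*/true);
  }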

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Roman Lebedev 20542b47d6
[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper
This results in slightly more optimistic alignments in some cases
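
A standalone sketch of the arithmetic I understand the helper to encapsulate
(my assumption from its name; power-of-two alignments, constant index):

  #include <algorithm>
  #include <cstdint>

  // The scalar element at constant index Idx of a vector load sits at byte
  // offset Idx * ElemSize, so its alignment is the largest power of two that
  // divides both the vector load's alignment and that offset.
  uint64_t alignAfterScalarization(uint64_t VecAlign, uint64_t ElemSize,
                                   uint64_t Idx) {
    uint64_t Offset = ElemSize * Idx;
    if (Offset == 0)
      return VecAlign;
    uint64_t LowBit = Offset & (~Offset + 1); // lowest set bit of Offset
    return std::min(VecAlign, LowBit);
  }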
2021-06-11 12:47:10 +03:00
Roman Lebedev abc0e0125c
[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function 2021-06-11 12:47:09 +03:00
Simon Pilgrim 5e6bfb661e [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).
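
A generic illustration of the signature change (names invented, not the actual
LLVM declarations):

  #include <vector>

  struct Descriptor {
    std::vector<int> TrackedHandles; // stands in for the costly TrackingVH members
  };

  // Before: pass by value, copying the descriptor on every query.
  //   int getKindByValue(Descriptor D);
  // After: pass by const reference, no copy for read-only accessors.
  int getKind(const Descriptor &D) {
    return static_cast<int>(D.TrackedHandles.size());
  }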

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00