Commit Graph

26805 Commits

Author SHA1 Message Date
Simon Pilgrim 232f32f0da [DSE] eliminateDeadStoresMemorySSA - fix "initialization is never read" clang-tidy warning. NFCI. 2021-03-02 15:01:33 +00:00
Alexey Bataev a054e94e9e [SLP]Merge reorder and reuse shuffles.
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the vectorization tree/number of final instructions.

Differential Revision: https://reviews.llvm.org/D94992
2021-03-02 06:39:47 -08:00
Juneyoung Lee 365f5e2475 [JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally
This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore select
form of and/ors because the function does not look into binops as well
2021-03-02 18:35:18 +09:00
Ta-Wei Tu ea1a1ebbc6 [NFC] Use std::swap in LoopInterchange 2021-03-02 11:42:48 +08:00
Fangrui Song 04c3040f41 [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker no longer lets `__start_/__stop_` references retain them.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on other COFF/Mach-O are not affected.

Differential Revision: https://reviews.llvm.org/D97649
2021-03-01 13:43:23 -08:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Florian Hahn a6c81d3366
[VPlan] Remove recipes from back to front.
Update the deletion order when destroying VPBasicBlocks. This ensures
recipes that depend on earlier ones in the block are removed first.
Otherwise this may cause issues when recipes have remaining users later
in the block.
2021-03-01 16:06:30 +00:00
Florian Hahn 53dacb7b67
[LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
Juneyoung Lee 5419b67137 [SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally
This is a minor change that fixes FoldTwoEntryPHINode to handle
phis with and/ors of select form and binop form equally.
2021-03-01 13:34:51 +09:00
Kazu Hirata d639120983 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
Sanjay Patel 9502061bcc [InstCombine] avoid infinite loop in demanded bits for select
https://llvm.org/PR49205
2021-02-28 10:17:53 -05:00
William S. Moses b077d82b00 [Attributor] Conditinoally delete fns
Allow the attributor to delete functions only if requested

Differential Revision: https://reviews.llvm.org/D97238
2021-02-27 20:37:42 -05:00
Sanjay Patel 356cdabd3a [SimplifyCFG] avoid illegal phi with both poison and undef
In the example based on:
https://llvm.org/PR49218
...we are crashing because poison is a subclass of undef, so we merge blocks and create:

PHI node has multiple entries for the same basic block with different incoming values!
  %k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ]

If both poison and undef values are incoming, we soften the poison values to undef.

Differential Revision: https://reviews.llvm.org/D97495
2021-02-27 09:10:32 -05:00
Kazu Hirata 1d4a2f3778 [Transforms/Utils] Use range-based for loops (NFC) 2021-02-26 22:36:40 -08:00
Fangrui Song bf176c49e8 [InstrProfiling] Use llvm.compiler.used instead of llvm.used for ELF
Many optimizers (e.g.  GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.

The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF)
are similar to D97432: `__profd_` is not directly referenced, so
`__profd_` may be discarded while `__profc_` is retained, breaking the
interconnection.  We currently conservatively add all such sections to
`llvm.used` and let the linker do GC for ELF.

In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC).
Use `llvm.compiler.used` to retain the current GC behavior.

Differential Revision: https://reviews.llvm.org/D97585
2021-02-26 16:14:03 -08:00
George Balatsouras c9075a1c8e [dfsan] Record dfsan metadata in globals
This will allow identifying exactly how many shadow bytes were used
during compilation, for when fast8 mode is introduced.

Also, it will provide a consistent matching point for instrumentation
tests so that the exact llvm type used (i8 or i16) for the shadow can
be replaced by a pattern substitution. This is handy for tests with
multiple prefixes.

Reviewed by: stephan.yichao.zhao, morehouse

Differential Revision: https://reviews.llvm.org/D97409
2021-02-26 14:42:46 -08:00
Jianzhou Zhao a47d435bc4 [dfsan] Propagate origins for callsites
This is a part of https://reviews.llvm.org/D95835.

Each customized function has two wrappers. The
first one dfsw is for the normal shadow propagation. The second one dfso is used
when origin tracking is on. It calls the first one, and does additional
origin propagation. Which one to use can be decided at instrumentation
time. This is to ensure minimal additional overhead when origin tracking
is off.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97483
2021-02-26 19:12:03 +00:00
Fangrui Song b55f29c194 [SanitizerCoverage] Clarify llvm.used/llvm.compiler.used and partially fix unmatched metadata sections on Windows
`__sancov_pcs` parallels the other metadata section(s). While some optimizers
(e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the
sections as a unit, some (e.g.  GlobalOpt/ConstantMerge) do not. So we have to
conservatively retain all unconditionally in the compiler.

When a comdat is used, the COFF/ELF linkers' GC semantics ensure the
associated parallel array elements are retained or discarded together,
so `llvm.compiler.used` is sufficient.

Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not
used), we have to use `llvm.used` to conservatively make all sections retain by
the linker. This will fix the Windows problem once internal linkage
GlobalObject's in `llvm.used` are retained via `/INCLUDE:`.

Reviewed By: morehouse, vitalybuka

Differential Revision: https://reviews.llvm.org/D97432
2021-02-26 11:10:03 -08:00
Simon Pilgrim 455d43b951 [Utils] collectBitParts - bail for integers > 128-bits
collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit.

We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers.

Thanks to @manojgupta for the repro.
2021-02-26 14:58:01 +00:00
Stephen Tozer ec7b9b0c18 [InstCombine] Avoid redundant or out-of-order debug value sinking
This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent
redundant debug intrinsics from being produced, and also prevent the intrinsics
from being emitted in an incorrect order. It does this by ensuring that when
this pass sinks an instruction and creates clones of the debug intrinsics that
use that instruction, it inserts those debug intrinsics in their original order,
and only inserts the last debug intrinsic for each variable in the Instruction's
block.

Differential revision: https://reviews.llvm.org/D95463
2021-02-26 13:04:33 +00:00
Evgeniy Brevnov 13a5cac2ba Revert "[NARY-REASSOCIATE] Support reassociation of min/max"
This reverts commit 83d134c3c4.
2021-02-26 19:47:54 +07:00
Kazu Hirata 5fc9e30985 [Scalar] Use range-based for loops (NFC) 2021-02-25 19:54:38 -08:00
Jianzhou Zhao c88fedef2a [dfsan] Conservative solution to atomic load/store
DFSan at store does store shadow data; store app data; and at load does
load shadow data; load app data.

When an application data is atomic, one overtainting case is

thread A: load shadow
thread B: store shadow
thread B: store app
thread A: load app

If the application address had been used by other flows, thread A reads
previous shadow, causing overtainting.

The change is similar to MSan's solution.
1) enforce ordering of app load/store
2) load shadow after load app; store shadow before shadow app
3) do not track atomic store by reseting its shadow to be 0.
The last one is to address a case like this.

Thread A: load app
Thread B: store shadow
Thread A: load shadow
Thread B: store app

This approach eliminates overtainting as a trade-off between undertainting
flows via shadow data race.

Note that this change addresses only native atomic instructions, but
does not support builtin libcalls yet.
   https://llvm.org/docs/Atomics.html#libcalls-atomic

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97310
2021-02-25 23:34:58 +00:00
James Y Knight 24539f1ef2 Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg.
And then push those change throughout LLVM.

Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).

The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.

Differential Revision: https://reviews.llvm.org/D97223
2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih fee9abe69c [Remarks] Provide more information about auto-init calls
This now analyzes calls to both intrinsics and functions.

For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.

For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

Differential Revision: https://reviews.llvm.org/D97489
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih 4753a69a31 [Remarks] Provide more information about auto-init stores
This adds support for analyzing the instruction with the !annotation
"auto-init" in order to generate a more user-friendly remark.

For now, support the store size, and whether it's atomic/volatile.

Example:

```
auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks]
  int var;
      ^
```

Differential Revision: https://reviews.llvm.org/D97412
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih c49b600b2f [Remarks] Emit remarks for "auto-init" !annotations
Using the !annotation metadata, emit remarks pointing to code added by
`-ftrivial-auto-var-init` that survived the optimizer.

Example:

```
auto-init.c:4:7: remark: Initialization inserted by -ftrivial-auto-var-init. [-Rpass-missed=annotation-remarks]
  int buf[1024];
      ^
```

The tests are testing various situations like calls/stores/other
instructions, with debug locations, and extra debug information on
purpose: more patches will come to improve the reporting to make it more
user-friendly, and these tests will show how the reporting evolves.

Differential Revision: https://reviews.llvm.org/D97405
2021-02-25 15:14:09 -08:00
Adrian Prantl 1693180884 Add a nullptr check.
This doesn't actually reproduce with a dbg.declare(i8* null, ...)
which produces a non-null null Value, but I have seen this show up in
crash logs. I'm suspecting that there may be another pass forcibly
setting the operand to a nullptr.
2021-02-25 12:01:11 -08:00
Fangrui Song 4d63892acb [SanitizerCoverage] Drop !associated on metadata sections
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions.  After
inlining, such a `__sancov_*` section can be referenced by more than one
functions, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)

If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.

This above reasoning means that `!associated` is not appropriate to be called by
an inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick.  (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)

For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overload should be low.

This patch does not change the object file output for COFF (where `!associated` is ignored).

Reviewed By: morehouse, rnk, vitalybuka

Differential Revision: https://reviews.llvm.org/D97430
2021-02-25 11:59:23 -08:00
Jon Roelofs 7f6e331645 Support `#pragma clang section` directives on MachO targets
rdar://59560986

Differential Revision: https://reviews.llvm.org/D97233
2021-02-25 09:30:10 -08:00
Rong Xu 6103b6ad69 [SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class
This patch makes SampleProfileLoaderBaseImpl a template class so it
can be used in CodeGen transformation.

Noticeable changes:
 * use one template parameter and use IRTraits to get other used
   types an type specific functions.
 * remove the temporary "inline" keywords in previous refactor
   patch.
 * change the template function findEquivalencesFor to a regular
   function. This function has a single caller with type of
   PostDominatorTree. It's simpler to use the type directly
   because MachinePostDominatorTree is not a derived type of
   template DominatorTreeBase.

Differential Revision: https://reviews.llvm.org/D96981
2021-02-25 08:26:17 -08:00
Evgeniy Brevnov d0a6f8bb65 [NFC] Fix build failure after 83d134c3c4 2021-02-25 18:43:00 +07:00
Evgeniy Brevnov 83d134c3c4 [NARY-REASSOCIATE] Support reassociation of min/max
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88287
2021-02-25 18:22:39 +07:00
Xun Li c38000a9fb [Coroutine] Check indirect uses of alloca when checking lifetime info
In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and an use of alloca crosses suspension point.
This approach is unfortunately incorrect. An use of alloca does not need to be a direct use, but can be an indirect use through alias.
Only checking direct uses can miss cases where indirect uses are crossing suspension point.
This can be demonstrated in the newly added test case 007.
In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend.
If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas to the stack, and then capture their addresses in the frame.

Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better.
We still checks the lifetime info in the same way as before, but with two differences:
1. The collection of liftime.start is moved into AllocaUseVisitor to make the logic more concentrated.
2. When looking at lifetime.start and use pairs, we not only checks the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through alias. This will make the analysis more accurate without throwing away the lifetime optimization.

Differential Revision: https://reviews.llvm.org/D96922
2021-02-24 18:29:23 -08:00
Sanjay Patel a7cee55762 [InstCombine] fold fdiv with powi divisor (PR49147)
This extends b40fde062c for the especially non-standard
powi pattern. We want to avoid being completely wrong
on the negation-of-int-min corner case, so I'm adding
an extra FMF check for 'ninf' assuming that gives us
the flexibility to handle that possibility.
https://llvm.org/PR49147
2021-02-24 16:44:36 -05:00
Sanjay Patel 868d43fbd6 [InstCombine] add helper for x/pow(); NFC
We at least want to add powi to this list, so
split it off into a switch to reduce code duplication.
2021-02-24 16:44:36 -05:00
Duncan P. N. Exon Smith 01701646d5 Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs
This is a follow up to 22a52dfddc and a
revert of df763188c9.

With this change, we only skip cloning distinct nodes in
MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the
no-longer-needed local helper `cloneOrBuildODR()`.  Skipping cloning in
other cases is unsound and breaks CloneModule, which is why the textual
IR for PR48841 didn't pass previously. This commit adds the test as:
Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll

Cloning less often exposed a hole in subprogram cloning in
CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's
test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function
has a subprogram attachment whose scope is a DICompositeType that
shouldn't be cloned, but it has no internal debug info pointing at that
type, that composite type was being cloned. This commit plugs that hole,
calling DebugInfoFinder::processSubprogram from CloneFunctionInto.

As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit
message, I think we need to formalize ownership of metadata a bit more
so that ValueMapper/CloneFunctionInto (and similar functions) can deal
with cloning (or not) metadata in a more generic, less fragile way.

This fixes PR48841.

Differential Revision: https://reviews.llvm.org/D96734
2021-02-24 12:57:52 -08:00
Sander de Smalen 5e19208d96 [InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep.
This fixes the types of a few more cost variables to be of type InstructionCost.
2021-02-24 14:30:03 +00:00
Pierre Gousseau 27830bc2b1 [asan] Avoid putting globals in a comdat section when targetting elf.
Putting globals in a comdat for dead-stripping changes the semantic and
can potentially cause false negative odr violations at link time.
If odr indicators are used, we keep the comdat sections, as link time
odr violations will be dectected for the odr indicator symbols.

This fixes PR 47925
2021-02-24 12:01:56 +00:00
Simon Pilgrim b94c215592 [Utils] collectBitParts - add truncate() handling 2021-02-24 11:48:34 +00:00
Florian Hahn 6240f436dd
Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts the revert commit 437f0bbcd5.

It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be
constructed with a VPRecipeBase *. This should address ambiguous
constructor issues for recipe sub-types that also inherit from VPValue.
2021-02-24 10:36:02 +00:00
Dan Liew 7d3ef103b5 [ASan] Introduce a way set different ways of emitting module destructors.
Previously there was no way to control how module destructors were emitted
by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang)
to be able to decide how to emit these destructors (if at all).

This patch introduces the `AsanDtorKind` enum that represents the different ways
destructors can be emitted. There are currently only two valid ways to emit destructors.

* `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default.
* `None`   - Do not emit module destructors.

The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated
to take the `AsanDtorKind` as an argument.

The `-asan-destructor-kind=` command line argument has been introduced to make this
easy to test from `opt`. If this argument is specified it overrides the value passed
to the `ModuleAddressSanitizerPass` constructor.

Note that `AsanDtorKind` is not `bool` because we will introduce a new way to
emit destructors in a subsequent patch.

Note that `AsanDtorKind` is given its own header file because if it is declared
in `Transforms/Instrumentation/AddressSanitizer.h` it leads to compile error
(Module is ambiguous) when trying to use it in
`clang/Basic/CodeGenOptions.def`.

rdar://71609176

Differential Revision: https://reviews.llvm.org/D96571
2021-02-23 20:01:21 -08:00
Juneyoung Lee 56d228a14e [SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes
This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes.

A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97244
2021-02-24 10:40:50 +09:00
Fangrui Song ef312951fd collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128
And delete the SmallPtrSetImpl overload.

While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97257
2021-02-23 16:09:06 -08:00
Fangrui Song ed02f52d28 Fix unstable SmallPtrSet iteration issues due to collectUsedGlobalVariables
While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Depends on D97128 (which added a new SmallVecImpl overload for collectUsedGlobalVariables).

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97139
2021-02-23 16:09:05 -08:00
Fangrui Song 3adb89bb9f [ThinLTO] Make cloneUsedGlobalVariables deterministic
Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements
is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements.

While here, decrease inline element counts from 8 to 4. The number of
`llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full
LTO/hybrid LTO, the number may be large, so we need to be careful.

According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is
good for a large project with WholeProgramDevirt, when available_externally
vtables are placed in the llvm.compiler.used set.

Differential Revision: https://reviews.llvm.org/D97128
2021-02-23 16:09:05 -08:00
Teresa Johnson 0a5949dcfa [WPD] Fix handling of pure virtual base class
The fix in 3c4c205060 caused an assert in
the case of a pure virtual base class. In that case, the vTableFuncs
list on the summary will be empty, so we were hitting the new assert
that the linkage type was not available_externally.

In the case of pure virtual, we do not want to assert, and additionally
need to set VS so that we don't treat it conservatively and quit the
analysis of the type id early.

This exposed a pre-existing issue where we were not updating the vcall
visibility on pure virtual functions when whole program visibility was
specified. We were skipping updating the visibility on any global vars
that didn't have any vTableFuncs, which meant all pure virtual were not
updated, and the later analysis would block any devirtualization of
calls that had a type id used on those pure virtual vtables (see the
handling in the other code modified in this patch). Simply remove that
check. It will mean that we may update the vcall visibility on global
vars that aren't vtables, but that setting is ignored for any global
vars that didn't have type metadata anyway.

Added a new test case that asserted without removing the assert, and
that requires the other fixes in this patch (updateVCallVisibilityInIndex
and not skipping all vtables without virtual funcs) to get a successful
devirtualization with index-only WPD. I added cases to test hybrid and
regular LTO for completeness, although those already worked without the
fixes here.

With this final fix, a clang multistage bootstrap with WPD builds and
runs all tests successfully.

Differential Revision: https://reviews.llvm.org/D97126
2021-02-23 16:07:09 -08:00
Jianzhou Zhao a05aa0dd5e [dfsan] Update memset and dfsan_(set|add)_label with origin tracking
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97302
2021-02-23 23:16:33 +00:00
Matthew Voss 6da7d31416 [llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.

Differential Revision: https://reviews.llvm.org/D92074
2021-02-23 12:51:54 -08:00
Andrei Elovikov 3605b873f6 [NFC][VPlan] Use VPUser to store block's predicate
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96529
2021-02-23 11:08:27 -08:00
Florian Hahn de40423c85
[LV] Ensure fixNonInductionPHIs uses a valid insertion point.
In some cases, Builder's insertion point may be invalidated before using
it in VPTransformState::get. Make sure the insertion point is
up-to-date.

This should fix various sanitizer errors, like
https://lab.llvm.org/buildbot/#/builders/5/builds/4933/steps/9/logs/stdio
2021-02-23 18:51:05 +00:00
Florian Hahn 437f0bbcd5
Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts commit 4efa097eb4, because
some the compilers used for some bots do not support automatic
conversions to PointerUnion.
2021-02-23 16:57:21 +00:00
Florian Hahn 4efa097eb4
[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends.
Generalize the return value of tryToCreateWidenRecipe to return either a
newly create recipe or an existing VPValue. Use this to avoid creating
unnecessary VPBlendRecipes.

Fixes PR44800.
2021-02-23 16:52:03 +00:00
Juneyoung Lee 19c2e12947 [JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns
This allows JumpThreading's computeValueKnownInPredecessors to
recognize select form of and/or patterns as well.
2021-02-24 00:06:10 +09:00
Nate Chandler 01b4890e47 Add @llvm.coro.async.size.replace intrinsic.
The new intrinsic replaces the size in one specified AsyncFunctionPointer with
the size in another.  This ability is necessary for functions which merely
forward to async functions such as those defined for partial applications.

Reviewed By: aschwaighofer

Differential Revision: https://reviews.llvm.org/D97229
2021-02-23 06:43:52 -08:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Matteo Favaro 633e090528
[DSE] Allow ptrs defined in the entry block in IsGuaranteedLoopInvariant.
The **IsGuaranteedLoopInvariant** function is making sure to check if the
incoming pointer is guaranteed to be loop invariant, therefore I think
the case where the pointer is defined in the entry block of a function
automatically guarantees the pointer to be loop invariant, as the entry
block of a function cannot have predecessors or be part of a loop.

I implemented this small patch and tested it using
**ninja check-llvm-unit** and **ninja check-llvm**. I added a contained test
file that shows the problem and used **opt -O3 -debug** on it to make sure
the case is not currently handled (in fact the debug log is showing that
the DSE pass is bailing out when testing if the killer store is able to
clobber the dead store).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96979
2021-02-23 12:00:44 +00:00
Juneyoung Lee 481c62277d [BuildLibCalls] Add noundef to allocator fns' size
This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef.

For C/C++: undef can be created from reading an uninitialized variable or padding.
Calling a function with uninitialized variable is already UB.
Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size.
Therefore, malloc's size can be marked as noundef.

For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97045
2021-02-23 13:58:03 +09:00
Kazu Hirata 4ed47858ab [llvm] Use llvm::drop_begin (NFC) 2021-02-22 20:17:16 -08:00
ksyx 4125cabce1 [GVN] Fix a typo in comment
NFC.

Differential Revision: https://reviews.llvm.org/D97200

Reviewed By: fhahn
2021-02-23 10:39:34 +08:00
Jianzhou Zhao 7424efd5ad [dfsan] Propagate origins at non-memory/phi/call instructions
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97200
2021-02-23 02:12:45 +00:00
Petr Hosek c24b7a16b1 [InstrProfiling] Use ELF section groups for counters, data and values
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757
2021-02-22 14:00:02 -08:00
Alexey Bataev 9a4dd4de9d [SLP]No need to mark scatter load pointer as scalar as it gets vectorized.
Pointer operand of scatter loads does not remain scalar in the tree (it
gest vectorized) and thus must not be marked as the scalar that remains
scalar in vectorized form.

Differential Revision: https://reviews.llvm.org/D96818
2021-02-22 11:58:28 -08:00
Petr Hosek 4827492d9f Revert "[InstrProfiling] Use ELF section groups for counters, data and values"
This reverts commits:
5ca21175e0
97184ab99c

The instrprof-gc-sections.c is failing on AArch64 LLD bot.
2021-02-22 11:13:55 -08:00
Florian Hahn 95daec6a84
[ConstraintElimination] Use unsigned > 0 instead of != 0.
ICMP_NE predicates cannot be directly represented as constraint. But we
can use ICMP_UGT instead ICMP_NE for %x != 0.

See https://alive2.llvm.org/ce/z/XlLCsW
2021-02-22 17:54:36 +00:00
Nikita Popov 4125afc357 [MemCpyOpt] Fix handling of readnone byval arguments
If the call is readnone, then there may not be any MemoryAccess
associated with the call. Bail out in that case.

This fixes the issue reported at
https://reviews.llvm.org/D94376#2578312.
2021-02-22 18:48:31 +01:00
Nikita Popov 5e7e499b91 [JumpThreading] Clone noalias.scope.decl when threading blocks
When cloning instructions during jump threading, also clone and
adapt any declared scopes. This is primarily important when
threading loop exits, because we'll end up with two dominating
scope declarations in that case (at least after additional loop
rotation). This addresses a loose thread from
https://reviews.llvm.org/rG2556b413a7b8#975012.

Differential Revision: https://reviews.llvm.org/D97154
2021-02-22 18:35:30 +01:00
Florian Hahn c7ee57f1dc
[LV] Directly use incoming value for single VPBlendRecipes.
VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use
the incoming value directly.
2021-02-22 16:10:08 +00:00
Florian Hahn c11fd0df64
[VPlan] Skip VPWidenPHIRecipe in VPInterleavedACcessInfo.
Update unit tests that did not expect VPWidenPHIRecipes after
15a74b64df.
2021-02-22 10:35:09 +00:00
Florian Hahn 15a74b64df
[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe.
This patch extends VPWidenPHIRecipe to manage pairs of incoming
(VPValue, VPBasicBlock) in the VPlan native path. This is made possible
because we now directly manage defined VPValues for recipes.

By keeping both the incoming value and block in the recipe directly,
code-generation in the VPlan native path becomes independent of the
predecessor ordering when fixing up non-induction phis, which currently
can cause crashes in the VPlan native path.

This fixes PR45958.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D96773
2021-02-22 09:44:25 +00:00
Petr Hosek 5ca21175e0 [InstrProfiling] Use ELF section groups for counters, data and values
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757
2021-02-21 16:13:06 -08:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
2021-02-21 18:42:56 +01:00
Kristina Bessonova e97aab8d15 [ThinLTO] Fix import of multiply defined global variables
Currently, if there is a module that contains a strong definition of
a global variable and a module that has both a weak definition for
the same global and a reference to it, it may result in an undefined symbol error
while linking with ThinLTO.

It happens because:
* the strong definition become internal because it is read-only and can be imported;
* the weak definition gets replaced by a declaration because it's non-prevailing;
* the strong definition failed to be imported because the destination module
  already contains another definition of the global yet this def is non-prevailing.

The patch adds a check to computeImportForReferencedGlobals() that allows
considering a global variable for being imported even if the module contains
a definition of it in the case this def has an interposable linkage type.

Note that currently the check is based only on the linkage type
(and this seems to be enough at the moment), but it might be worth to account
the information whether the def is prevailing or not.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D95943
2021-02-21 18:34:12 +02:00
Jianzhou Zhao 9524632fa2 [dfsan] Comment out unused methods by D97087 temporarily 2021-02-21 03:31:19 +00:00
Sanjay Patel e772618f1e [InstCombine] fold fdiv with exp/exp2 divisor (PR49147)
Follow-up to:
D96648 / b40fde062
...for the special-case base calls.

From the earlier commit:
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.
2021-02-20 16:02:58 -05:00
Teresa Johnson fde55a9c9b [LTO] Fix cloning of llvm*.used when splitting module
Refines the fix in 3c4c205060 to only
put globals whose defs were cloned into the split regular LTO module
on the cloned llvm*.used globals. This avoids an issue where one of the
attached values was a local that was promoted in the original module
after the module was cloned. We only need to have the values defined in
the new module on those globals.

Fixes PR49251.

Differential Revision: https://reviews.llvm.org/D97013
2021-02-20 09:46:43 -08:00
Simon Pilgrim 609d0c9772 [InstCombine] matchBSwapOrBitReverse - remove pattern matching early-out. NFCI.
recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time.

This is part of a cleanup towards letting matchBSwapOrBitReverse /recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (FSHL/FSHRs in particular which can be prematurely created).

Differential Revision: https://reviews.llvm.org/D97056
2021-02-20 13:15:34 +00:00
Dávid Bolvanský cd54c57919 Reland "[Libcalls, Attrs] Annotate libcalls with noundef"
Fixed Clang tests.
2021-02-20 06:18:48 +01:00
Dávid Bolvanský 94d034fb86 Revert "[Libcalls, Attrs] Annotate libcalls with noundef"
This reverts commit 33b0c63775. Bots are failing. Some Clang tests need to be updated too.
2021-02-20 04:18:42 +01:00
Dávid Bolvanský 33b0c63775 [Libcalls, Attrs] Annotate libcalls with noundef
I think we can use here same logic as for nonnull.

strlen(X) - X must be noundef => valid pointer.

for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)

Reviewed By: jdoerfert, aqjune

Differential Revision: https://reviews.llvm.org/D95122
2021-02-20 04:10:07 +01:00
Dávid Bolvanský 68e6025cf7 Revert "[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly"
This reverts commit 05d891a19e.
2021-02-20 03:58:53 +01:00
Dávid Bolvanský 05d891a19e [BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94850
2021-02-20 03:56:01 +01:00
Jianzhou Zhao dab953c8e4 [dfsan] Add utils that get/set origins
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97087
2021-02-20 00:52:33 +00:00
Jianzhou Zhao cb1f1aab90 [dfsan] Add origin address calculation
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97065
2021-02-19 21:30:07 +00:00
Jianzhou Zhao efc8f3311b [msan] Set cmpxchg shadow precisely
In terms of https://llvm.org/docs/LangRef.html#cmpxchg-instruction,
the return type of chmpxchg is a pair {ty, i1}, while I think we
only wanted to set the shadow for the address 0th op, and it has type
ty.

Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D97029
2021-02-19 20:23:23 +00:00
Wei Mi 4ffad1fb48 [SampleFDO] Add PromotedInsns to prevent repeated ICP.
In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
We use 0 count value profile to memorize which target has been promoted
and prevent repeated ICP for the same target, so we delete PromotedInsns.
However, I found the implementation in the patch has some shortcomings
to be fixed otherwise there will still be repeated ICP. So I add
PromotedInsns back temorarily. Will remove it after I get a thorough fix.
2021-02-19 10:01:49 -08:00
Benjamin Kramer 59f442e6bb [LV] Fold single-use variable into assert. NFC. 2021-02-19 18:11:39 +01:00
Nikita Popov 71a8e4e7d6 [MemCopyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
Florian Hahn edc92a1c42
[LV] Remove VPCallback.
Now that all state for generated instructions is managed directly in
VPTransformState, VPCallBack is no longer needed. This patch updates the
last use of `getOrCreateScalarValue` to instead manage the value
directly in VPTransformState and removes VPCallback.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95383
2021-02-19 12:50:41 +00:00
Nikita Popov 2f17ed294f [DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993
2021-02-19 12:35:40 +01:00
Nikita Popov 370addb996 [IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992
2021-02-19 11:56:01 +01:00
Djordje Todorovic 1a2b3536ef Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info"
As discussed on the RFC [0], I am sharing the set of patches that
    enables checking of original Debug Info metadata preservation in
    optimizations. The proof-of-concept/proposal can be found at [1].

    The implementation from the [1] was full of duplicated code,
    so this set of patches tries to merge this approach into the existing
    debugify utility.

    For example, the utility pass in the original-debuginfo-check
    mode could be invoked as follows:

      $ opt -verify-debuginfo-preserve -pass-to-test sample.ll

    Since this is very initial stage of the implementation,
    there is a space for improvements such as:
      - Add support for the new pass manager
      - Add support for metadata other than DILocations and DISubprograms

    [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
    [1] https://github.com/djolertrk/llvm-di-checker

    Differential Revision: https://reviews.llvm.org/D82545

The test that was failing is now forced to use the old PM.
2021-02-18 23:29:22 -08:00
Xun Li 3bf8f162a0 [Coroutine] Relax CoroElide musttail check
As discussed in D94834, we don't really need to do complicated analysis. It's safe to just drop the tail call attribute.

Differential Revision: https://reviews.llvm.org/D96926
2021-02-18 19:36:11 -08:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Jianzhou Zhao 7e658b2fdc [dfsan] Instrument origin variable and function definitions
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D96977
2021-02-18 23:50:05 +00:00
Hongtao Yu e87b1b1d4e [CSSPGO] Use callsite sample counts to annotate indirect call sites.
With CSSPGO all indirect call targets are counted torwards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling.

I'm also cleaning up the orginal code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was disgarded.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D96990
2021-02-18 14:52:34 -08:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Ta-Wei Tu f70cdc5b5c [NPM] Properly reset parent loop after loop passes
This fixes https://bugs.llvm.org/show_bug.cgi?id=49185

When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`.
If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes.
However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes.

This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`.

Reviewed By: asbirlea, ychen

Differential Revision: https://reviews.llvm.org/D96727
2021-02-19 02:50:53 +08:00
Jianzhou Zhao 406dc54903 [dfsan] Refactor defining TLS variables
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96941
2021-02-18 18:04:21 +00:00
Jianzhou Zhao 2e6cd338c6 [dfsan] Refactor runtime functions checking
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96940
2021-02-18 18:01:46 +00:00
Philip Reames 8666463889 [instcombine] Exploit UB implied by nofree attributes
This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred.

Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug.

Differential Revision: https://reviews.llvm.org/D96349
2021-02-18 08:34:22 -08:00
Djordje Todorovic c1e23894fc Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info"
This reverts rG8ee7c7e02953.
One test is failing, I'll reland this as soon as possible.
2021-02-18 02:04:27 -08:00
Djordje Todorovic 8ee7c7e029 [Debugify] Make the debugify aware of the original (-g) Debug Info
As discussed on the RFC [0], I am sharing the set of patches that
enables checking of original Debug Info metadata preservation in
optimizations. The proof-of-concept/proposal can be found at [1].

The implementation from the [1] was full of duplicated code,
so this set of patches tries to merge this approach into the existing
debugify utility.

For example, the utility pass in the original-debuginfo-check
mode could be invoked as follows:

  $ opt -verify-debuginfo-preserve -pass-to-test sample.ll

Since this is very initial stage of the implementation,
there is a space for improvements such as:
  - Add support for the new pass manager
  - Add support for metadata other than DILocations and DISubprograms

[0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
[1] https://github.com/djolertrk/llvm-di-checker

Differential Revision: https://reviews.llvm.org/D82545
2021-02-18 01:52:16 -08:00
Joseph Huber c3a3d20093 [LV] Add analysis remark for mixed precision conversions
Floating point conversions inside vectorized loops have performance
implications but are very subtle. The user could specify a floating
point constant, or call a function without realizing that it will
force a change in the vector width. An example of this behaviour is
seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate
when this happens becuase it is most likely unintended behaviour.

This patch adds a simple check for this behaviour by following floating
point stores in the original loop and checking if a floating point
conversion operation occurs.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D95539
2021-02-17 21:37:08 -05:00
Teresa Johnson d55d46f43b [WPD] Add an optional checking mode for debugging devirtualization
This adds an internal option -wholeprogramdevirt-check which if enabled
will guard each devirtualization with a runtime check against the
expected target, and an invocation of a debug trap if the check fails.
This is useful for debugging WPD failures involving undefined behavior
(e.g. casting to another class type not in the inheritance chain).

Differential Revision: https://reviews.llvm.org/D95969
2021-02-17 16:46:15 -08:00
Rong Xu 7397905ab0 [SampleFDO] Third Try: Refactor SampleProfile.cpp
Apply the patch for the third time after fixing buildbot failures.

Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
(4) Add inline keyword to avoid duplicated symbols -- they will
be removed later when the class is changed to a template.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-17 15:31:50 -08:00
Teresa Johnson 3c4c205060 [WPD][lld] Test handling of vtable definition from shared libraries
Adds a lld test for a case that the handling added for dynamically
exported symbols in 1487747e99 already
fixes. Because isExportDynamic returns true when the symbol is
SharedKind with default visibility, it will treat as dynamically
exported and block devirtualization when the definition of a vtable
comes from a shared library. This is desireable as it is dangerous to
devirtualize in that case, since there could be hidden overrides in the
shared library. Typically that happens when the shared library header
contains available externally definitions, which applications can
override. An example is std::error_category, which is overridden in LLVM
and causing failures after a self build with WPD enabled, because
libstdc++ contains hidden overrides of the virtual base class methods.

The regular LTO case in the new test already worked, but there are
2 fixes in this patch needed for the index-only case and the hybrid
LTO case. For the index-only case, WPD should not simply ignore
available externally vtables. A follow on fix will be made to clang to
emit type metadata for those vtables, which the new test is modeling.
For the hybrid case, we need to ensure when the module is split that any
llvm.*used globals are cloned to the regular LTO split module so
available externally vtable definitions are not prematurely deleted.

Another follow on fix will add the equivalent gold test, which requires
a small fix to the plugin to treat symbols in dynamic libraries the same
way lld already is.

Differential Revision: https://reviews.llvm.org/D96721
2021-02-17 12:49:24 -08:00
Vedant Kumar c28622fbf3 Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp"
Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455"

This reverts commit c73cbf218a.

Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)"

This reverts commit a23e6b321c.

Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp"

This reverts commit 6fd5ccff72.

Still seeing link failures when building llc (or other tools), due to
the new SampleProfileLoaderBaseImpl.h containing definitions that get
duplicated across multiple TU's.

```
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const*)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const*) const' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const*)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const*, llvm::BasicBlock const*>)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
```
2021-02-17 10:22:24 -08:00
William S. Moses 40862b1a74 [SROA] Propagate correct TBAA/TBAA Struct offsets
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.

Differential Revision: https://reviews.llvm.org/D95826
2021-02-17 11:59:00 -05:00
Ta-Wei Tu 0eeaec2a6d [NFC] Refactor LoopInterchange into a loop-nest pass
This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change.
Changes that are not loop-nest related are split to D96650.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D96644
2021-02-18 00:55:38 +08:00
Sjoerd Meijer f78aa8b2c2 [LSR] Add a flag that overrides the target's preferred addressing mode
This adds a new flag -lsr-preferred-addressing-mode to override the target's
preferred addressing mode. It replaces flag -lsr-backedge-indexing, which is
equivalent to preindexed addressing that is one of the options that
-lsr-preferred-addressing-mode accepts.

Differential Revision: https://reviews.llvm.org/D96855
2021-02-17 16:50:21 +00:00
Sanjay Patel 85294703a7 [InstCombine] fold fcmp-of-copysign idiom
As discussed in:
https://llvm.org/PR49179
...this pattern shows up in library code.
There are several potential generalizations as noted,
but we need to be careful that we get FP special-values
right, and it's not clear how much variation we should
expect to see from this exact idiom.
2021-02-17 10:32:33 -05:00
Sjoerd Meijer 5a641cf194 Follow up of rGdea4a63e6359, which committed a slightly different version than
intended.
2021-02-17 10:07:26 +00:00
Sjoerd Meijer dea4a63e63 [LSR] Cleanup of getPreferredAddresingMode. NFC.
This is a follow up D96600 and cleans up most calls to
getPreferredAddresingMode. I.e., we really don't need to query the same things
again and again, but get the preferred addressing mode once for each loop. So
this should be a lot friendlier for compile times, especially if we start
implementing getPreferredAddresingMode.

Differential Revision: https://reviews.llvm.org/D96772
2021-02-17 09:45:29 +00:00
Rong Xu 6fd5ccff72 [SampleFDO] Reapply: Refactor SampleProfile.cpp
Reapply patch after fixing buildbot failure.
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-16 16:43:21 -08:00
Mehdi Amini c761fe77bd Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp"
This reverts commit 310b35304c.
The build is broken with -DBUILD_SHARED_LIBS=ON :

lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const*, llvm::ProfileSummaryInfo*, bool)':
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const'
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const'
...
2021-02-16 22:11:42 +00:00
Rong Xu 310b35304c [SampleFDO][NFC] Refactor SampleProfile.cpp
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-16 11:18:21 -08:00
Arnold Schwaighofer 627cfd4394 [coro async] Don't promote allocas to the frame or rewrite swifterror if there are no suspend points
Also don't call function to update the call graph if there are no
clones. The function will fail.

rdar://74277860

Differential Revision: https://reviews.llvm.org/D96620
2021-02-16 09:05:38 -08:00
Ta-Wei Tu 6b612a7baf [NFC][LoopInterchange] Explicitly pass both `InnerLoop` and `OuterLoop` to `processLoop`
This is a split patch of D96644.

Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`.
Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96650
2021-02-16 22:17:44 +08:00
Florian Hahn f64c626069
[VPlan] Remove unused Phi member from VPWidenPHIRecipe (NFC).
The member is not needed any longer after recent changes.
2021-02-16 13:53:06 +00:00
Kerry McLaughlin ba1e150d03 [SVE] Add support for scalable vectorization of loops with int/fast FP reductions
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Reviewed By: david-arm, fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D95245
2021-02-16 13:50:06 +00:00
Sander de Smalen 00fe10c6a6 [SCEVExpander] Migrate costAndCollectOperands to use InstructionCost.
This patch changes costAndCollectOperands to use InstructionCost for
accumulated cost values.

isHighCostExpansion will return true if the cost has exceeded the budget.

Reviewed By: CarolineConcatto, ctetreau

Differential Revision: https://reviews.llvm.org/D92238
2021-02-16 09:27:34 +00:00
Florian Hahn 54a14c264a
[VPlan] Manage scalarized values using VPValues.
This patch updates codegen to use VPValues to manage the generated
scalarized instructions.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D92285
2021-02-16 09:04:10 +00:00
Akira Hatanaka 32dc79c5ef [ObjC][ARC] Do not perform code motion on precise release calls
This fixes a bug where an object can get deallocated before reaching the
end of its full formal lifetime.

rdar://72110887
rdar://74123176
2021-02-15 17:39:37 -08:00
Duncan P. N. Exon Smith 22a52dfddc TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto)
This commit fixes how metadata is handled in CloneModule to be sound,
and improves how it's handled in CloneFunctionInto (although the latter
is still awkward when called within a module).

Ruiling Song pointed out in PR48841 that CloneModule was changed to
unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in
fa35c1f80f for clarity). This flag papered
over a crash caused by other various changes made to CloneFunctionInto
over the past few years that made it unsound to use cloning between
different modules.

(This commit partially addresses PR48841, fixing the repro from
preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode
became unsound in df763188c9 and this
commit does not address that regression.)

RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use,
avoiding unnecessary clones of all referenced metadata when linking
between modules (with IRMover, the source module is discarded after
linking). It never makes sense to use when you're not discarding the
source. This commit drops its incorrect use in CloneModule.

Sadly, the right thing to do with metadata when cloning a function is
complicated, and this patch doesn't totally fix it.

The first problem is that there are two different types of referenceable
metadata and it's not obvious what to with one of them when remapping.

- `!0 = !{!1}` is metadata's version of a constant. Programatically it's
  called "uniqued" (probably a better term would be "constant") because,
  like `ConstantArray`, it's stored in uniquing tables. Once it's
  constructed, it's illegal to change its arguments.
- `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal
  to change the operands after construction.

What should be done with distinct metadata when cloning functions within
the same module?

- Should new, cloned nodes be created?
- Should all references point to the same, old nodes?

The answer depends on whether that metadata is effectively owned by a
function.

And that's the second problem. Referenceable metadata's ownership model
is not clear or explicit. Technically, it's all stored on an
LLVMContext. However, any metadata that is `distinct`, that transitively
references a `distinct` node, or that transitively references a
GlobalValue is specific to a Module and is effectively owned by it. More
specifically, some metadata is effectively owned by a specific Function
within a module.

Effectively function-local metadata was introduced somewhere around
c10d0e5ccd, which made it illegal for two
functions to share a DISubprogram attachment.

When cloning a function within a module, you need to clone the
function-local debug info and suppress cloning of global debug info (the
status quo suppresses cloning some global debug info but not all). When
cloning a function to a new/different module, you need to clone all of
the debug info.

Here's what I think we should do (eventually? soon? not this patch
though):
- Distinguish explicitly (somehow) between pure constant metadata owned
  by the LLVMContext, global metadata owned by the Module, and local
  metadata owned by a GlobalValue (such as a function).
- Update CloneFunctionInto to trigger cloning of all "local" metadata
  (only), perhaps by adding a bit to RemapFlag. Alternatively, split
  out a separate function CloneFunctionMetadataInto to prime the
  metadata map that callers are updated to call ahead of time as
  appropriate.

Here's the somewhat more isolated fix in this patch:
- Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to
  an enum called `CloneFunctionChangeType` that is one of
  LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule.
- The code maintaining the "functions uniquely own subprograms"
  invariant is now only active in the first two cases, where a function
  is being cloned within a single module. That's necessary because this
  code inhibits cloning of (some) "global" metadata that's effectively
  owned by the module.
- The code maintaining the "all compile units must be explicitly
  referenced by !llvm.dbg.cu" invariant is now only active in the
  DifferentModule case, where a function is being cloned into a new
  module in isolation.
- CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create
  uses LocalChangeOnly, since fa635d730f
  only set `ModuleLevelChanges` to trigger cloning of local metadata.
- CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs
  and special handling of !llvm.dbg.cu.
- Fixed some outdated header docs and left a couple of FIXMEs.

Differential Revision: https://reviews.llvm.org/D96531
2021-02-15 11:56:00 -08:00
Sjoerd Meijer 357237e93e Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit effc3b0799, with the build
problem fixed.
2021-02-15 11:33:00 +00:00
Max Kazantsev e3c759bd58 [LoopLoadElim] Pass ScalarEvolution in old pass manager. PR49141
Loop canonicalization may end up deleting blocks from CFG. And
Scalar Evolution may still keep cached referenced to those blocks
unless updated properly.
2021-02-15 18:08:23 +07:00
Sjoerd Meijer effc3b0799 Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit cd6de0e8de.
2021-02-15 11:01:23 +00:00
Sjoerd Meijer cd6de0e8de [TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode
This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into
getPreferredAddressingMode() so that we have one interface to steer LSR in
generating the preferred addressing mode.

Differential Revision: https://reviews.llvm.org/D96600
2021-02-15 10:44:15 +00:00
Florian Hahn 3df5d5aace [ConstraintElimination] Fix variables used for pattern matching.
Re-using the matched variable in the pattern does not work as expected.
This patch fixes that by introducing a new variable for the 2nd level
match.
2021-02-14 18:42:37 +00:00
Kazu Hirata 910e2d1e57 [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Sanjay Patel b40fde062c [InstCombine] fold fdiv with pow divisor (PR49147)
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.

This is part of solving:
https://llvm.org/PR49147
(The powi variant potentially has a different constraint.)

Differential Revision: https://reviews.llvm.org/D96648
2021-02-14 08:07:36 -05:00
Juneyoung Lee ed253ef772 [LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask
This patch fixes pr48832 by correctly generating the mask when a poison value is involved.

Consider this CFG (which is a part of the input):

```
for.body:                                         ; preds = %for.cond
  br i1 true, label %cond.false, label %land.rhs

land.rhs:                                         ; preds = %for.body
  br i1 poison, label %cond.end, label %cond.false

cond.false:                                       ; preds = %for.body, %land.rhs
  br label %cond.end

cond.end:                                         ; preds = %land.rhs, %cond.false
  %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ]

```

The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead.
The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation.

SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026).

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D95217
2021-02-14 21:12:34 +09:00
Teresa Johnson a80232bd5f [LTT] Address post-review comments (NFC)
Implement some post-review cleanup suggestions for D96083.
2021-02-13 15:52:59 -08:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
Wei Wang 80dc0661bd [LTO] Perform DSOLocal propagation in combined index
Perform DSOLocal propagation within summary list of every GV. This
avoids the repeated query of this information during function
importing.

Differential Revision: https://reviews.llvm.org/D96398
2021-02-12 22:58:26 -08:00
Arnold Schwaighofer e760ec2a01 [coro] Add support for polymorphic return typed coro.suspend.async
This allows for suspend point specific resume function types.

Return values from a suspend point can therefore be modelled as
arguments to the resume function. Allowing for directly passed return
types.

Differential Revision: https://reviews.llvm.org/D96136
2021-02-12 10:08:00 -08:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Kerry McLaughlin fea06efe7c [SVE][LoopVectorize] Support for vectorization of loops with function calls
Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs
and adds some simple tests for loops containing a function for which
there is a vectorized variant available.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96356
2021-02-12 13:47:43 +00:00
Sanjay Patel 79b1b4a581 [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
Florian Hahn 85fe5c9345
[VPlan] Make VPRecipeBase inherit from VPUser directly (NFC).
The individual recipes have been updated to manage their operands using
VPUser a while back. Now that the transition is done, we can instead
make VPRecipeBase a VPUser and get rid of the toVPUser helper.
2021-02-12 13:06:58 +00:00
David Sherwood 01b87444cb [NFC][Analysis] Change struct VecDesc to use ElementCount
This patch changes the VecDesc struct to use ElementCount
instead of an unsigned VF value, in preparation for
future work that adds support for vectorized versions of
math functions using scalable vectors. Since all I'm doing
in this patch is switching the type I believe it's a
non-functional change. I changed getWidestVF to now return
both the widest fixed-width and scalable VF values, but
currently the widest scalable value will be zero.

Differential Revision: https://reviews.llvm.org/D96011
2021-02-12 11:07:58 +00:00
David Sherwood 9700228abc [Analysis] Change VFABI::mangleTLIVectorName to use ElementCount
Adds support for mangling TLI vector names for scalable vectors.

Differential Revision: https://reviews.llvm.org/D96338
2021-02-12 09:38:12 +00:00
Kazu Hirata 9dc62d1dc1 [PGO] Drop unnecessary const from return types (NFC) 2021-02-11 23:31:29 -08:00
Hongtao Yu de40f6d623 [CSSPGO] Process functions in a top-down order on a dynamic call graph.
Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining.

The profile top-down order for SCC is also extended to support non-CS profiles.

Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95988
2021-02-11 12:36:59 -08:00
Michael Kruse 606aa622b2 Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache"
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).

It caused indeterminism in the output, such that e.g. the
polly-x86_64-linux buildbot failed accasionally.
2021-02-11 12:17:38 -06:00
Sander de Smalen 703130fb01 [TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount
This will be needed in the loop-vectorizer where the minimum VF
requested may be a scalable VF. getMinimumVF now takes an additional
operand 'IsScalableVF' that indicates whether a scalable VF is required.

Reviewed By: kparzysz, rampitec

Differential Revision: https://reviews.llvm.org/D96020
2021-02-11 09:08:48 +00:00
Markus Lavin 9498315c9b Expand masked mem intrinsics correctly wrt big-endian
Need to take endianness into account when doing vector to scalar casts
such as %bc = bitcast <8 x i1> %v to i8
Companion commit for https://reviews.llvm.org/D94867
Upload in response to
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html
Attempting to document the actual memory layout rules for vectors in
https://reviews.llvm.org/D94964

Differential Revision: https://reviews.llvm.org/D94765
2021-02-11 08:59:52 +00:00
Sander de Smalen be9bbb57f4 [LoopVectorize] NFC: Change selectVectorizationFactor to work on ElementCount.
This patch is NFC and changes occurrences of `unsigned Width`
and `unsigned i` to work on type ElementCount instead.

This patch is a preparatory patch with the ultimate goal of making
`computeMaxVF()` return both a max fixed VF and a max scalable VF,
so that `selectVectorizationFactor()` can pick the most cost-effective
vectorization factor.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96019
2021-02-11 08:47:59 +00:00
Kazu Hirata d12a0f4fc0 [GCOV] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-02-10 20:01:18 -08:00
Duncan P. N. Exon Smith fa35c1f80f ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC
Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and
`MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more
precisely describe its effect and clarify the header documentation.

Found this while helping to investigate PR48841, which pointed out an
unsound use of the flag in `CloneModule()`. For now I've just added a
FIXME there, but I'm hopeful that the new (more precise) name will
prevent other similar errors.
2021-02-10 16:53:21 -08:00
Benjamin Kramer 8fb4a4f7bb [SampleFDO] Silence -Wnon-virtual-dtor warning
There's no polymorphic deletion happening here.
2021-02-10 23:37:15 +01:00
Rong Xu db0d7d0ba9 [SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen
Break SampleProfileLoader into to a base and a derived class.
Base class (SampleProfileLoaderBaseImpl) includes the common
code for IR and MachineIR (CodeGen) sample loader.
It will be templatelized in the later patch.

Inline and Probe related code will remain in the derived class of
SampleProfileLoader and stays in SampleProfile.cpp.

We need to refactor some functions:
(1) getInstWeight() to enable the code sharing -- put the core into
getInstWeightImpl().
(2) emitAnnotation() and propagateWeights() to carve out the code
specific to SampleProfileLoader.
(3) make getInstWeight() and findFunctionSamples() virtual and override
in SampleProfileLoader as they need to access the fields in the derived
class.

Differential Revision: https://reviews.llvm.org/D95832
2021-02-10 13:29:15 -08:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Adrian Prantl 19fc8eede4 Add missing nullptr check.
salvageDebugInfoImpl() may fail and return a nullptr.
2021-02-10 12:15:24 -08:00
Sanjay Patel 6e2053983e [InstCombine] fold lshr(mul X, SplatC), C2
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.

I did not see a way to make Alive generalize the bit width
condition for even-number-of-bits only, but an example of
the proof is:
  Name: i32
  Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
  %m = mul nuw i32 %x, C1
  %t = lshr i32 %m, C2
  =>
  %t = and i32 %x, C1 - 2

  Name: i14
  %m = mul nuw i14 %x, 129
  %t = lshr i14 %m, 7
  =>
  %t = and i14 %x, 127

https://rise4fun.com/Alive/e52
2021-02-10 15:02:31 -05:00
Sander de Smalen 9db6e97a86 [LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount.
This patch is NFC and changes occurrences of `unsigned MaxVectorSize`
to work on type ElementCount.

This patch is a preparatory patch with the ultimate goal of making
`computeMaxVF()` return both a max fixed VF and a max scalable VF,
so that `selectVectorizationFactor()` can pick the most cost-effective
vectorization factor.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D96018
2021-02-10 08:52:10 +00:00
Tyker 5652e192fc Revert "[InstCombine] convert assumes to operand bundles"
This reverts commit 5eb2e994f9.
2021-02-10 01:32:00 +01:00
Florian Hahn fd8afa41eb
[VPlan] Use VPUser to manage CondBit
VP blocks keep track of a condition, which is a VPValue. This patch
updates VPBlockBase to manage the value using VPUser, so
replaceAllUsesWith properly updates the condition bit as well.

This is required to enable VP2VP transformations and it helps with
simplifying some of the code required to manage condition bits.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95382
2021-02-09 21:53:50 +00:00
Johannes Doerfert 81429abd83 [Attributor][FIX] Do not create UB by introducing a `noundef undef`
This was reported as PR49104. The reproducer uses varargs but the issue
is the same, we know an argument is dead but can't change the signature
for some reason. The PR49104 situation was: We are in an CG-SCC
traversal and we remove all the uses of an argument and proof it thereby
dead. However, if we do not remove the argument, via signature rewrite,
we need to ensure that the `undef` we introduce at the call site doesn't
clash with a `noundef` attribute.
2021-02-09 13:02:38 -06:00
Tyker 5eb2e994f9 [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-09 19:33:53 +01:00
Jianzhou Zhao 9887fdebd6 [dfsan] Refactor loadShadow
To simplify the review of https://reviews.llvm.org/D95835.

Reviewed-by: gbalats, morehouse

Differential Revision: https://reviews.llvm.org/D96180
2021-02-09 17:21:41 +00:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Chuanqi Xu 88d7876e1e [NFC] [Coroutine] Remove Unused Variables 2021-02-09 15:55:06 +08:00
Kazu Hirata 302313a264 [Transforms] Use range-based for loops (NFC) 2021-02-08 22:33:53 -08:00
Kazu Hirata de6c49ae31 [Transforms/Utils] Drop unnecessary const from a return type (NFC)
Identified with const-return-type.
2021-02-08 22:33:49 -08:00
Jinsong Ji 9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f.

This expose a failure in test-suite build on PowerPC,
revert to unblock buildbot first,
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
Arthur Eubanks 0eda454796 [SimpleLoopUnswitch] Don't non-trivially unswitch loops that are unsafe to clone
Non-trivial unswitching can clone loops.

The legacy -loop-unswitch pass also checks for this.

Fixes PR49085.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D96288
2021-02-08 13:19:24 -08:00
Jianzhou Zhao 64b448b983 [dfsan] Refactor visitCallBase
To simplify the review of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96177
2021-02-08 19:55:18 +00:00
Florian Hahn 68dc90b347
[ConstraintElimination] Decompose a few more GEP indices.
This patch adds handling for zero-extended GEP indices.
2021-02-08 18:06:38 +00:00
Florian Hahn 1f1f037ed3
[ConstraintElimination] Improve index handing during constraint building.
This patch improves the index management during constraint building.
Previously, the code rejected constraints which used values that were not
part of Value2Index, but after combining the coefficients of the new
indices were 0 (if ShouldAdd was 0).

In those cases, no new indices need to be added. Instead of adding to
Value2Index directly, add new indices to the NewIndices map. The caller
can then check if it needs to add any new indices.

This enables checking constraints like `a + x <= a + n` to `x <= n`,
even if there is no constraint for `a` directly.
2021-02-08 13:05:13 +00:00
Florian Hahn ca268ed285
[ConstraintElimination] Decompose zext for unsigned compares.
For unsigned compares, zext should be a no-op and we can add the
extended value to the constraint system.
2021-02-07 20:53:06 +00:00
Florian Hahn 3bb6dc0b26
[LV] Replace some uses of VectorLoopValueMap with VPTransformState (NFC)
This patch updates some places where VectorLoopValueMap is accessed
directly to instead go through VPTransformState.

As we move towards managing created values exclusively in VPTransformState,
this ensures the use always can fetch the correct value.

This is in preparation for D92285, which switches to managing scalarized
values through VPValues.

In the future, the various fix* functions should be moved directly into
the VPlan codegen stage.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95757
2021-02-07 18:28:21 +00:00
Kazu Hirata be23012d5a [Transforms/Utils] Use range-based for loops (NFC) 2021-02-07 09:49:36 -08:00
Sanjay Patel 6fd91be354 [Reassociate] allow or->add with shl operands
As discussed in:
https://llvm.org/PR49055

We invert instcombine's add->or transform here
because it makes it easier to identify factorization
transforms like the mul in the motivating test.

This extends the logic added with:
https://reviews.llvm.org/rG70472f3
https://reviews.llvm.org/rG93f3d7f

(I intentionally kept the formatting fix in this patch
to provide more context about the calling logic.)
2021-02-07 09:45:19 -05:00
Florian Hahn 853c52c988
[ConstraintElimination] Require GEPs to be inbounds for decomposition.
During decomposition, we assume the no-wrap properties of inbound GEPs.
Make sure the decomposed GEP is actually inbounds.
2021-02-07 11:08:53 +00:00
Johannes Doerfert b7d870eae7 [AssumptionCache] Avoid dangling llvm.assume calls in the cache
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.

Fixes PR49043.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96168
2021-02-06 12:18:39 -06:00
Teresa Johnson 3a27933ec2 [LTT] Don't attempt to lower type tests used only by assumes
Type tests used only by assumes were original for devirtualization, but
are meant to be kept through the first invocation of LTT so that they
can be used for additional optimization. In the regular LTO case where
the IR is analyzed we may find a resolution for the type test and end up
rewriting the associated vtable global, which can have implications on
section splitting. Simply ignore these type tests.

Fixes PR48245.

Differential Revision: https://reviews.llvm.org/D96083
2021-02-06 09:02:10 -08:00
Sander de Smalen 79a6cfc29e NFC: Migrate LoopIdiomRecognize to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
2021-02-06 14:39:19 +00:00
Sander de Smalen ae27274b2f NFC: Migrate LoopFlatten to work on InstructionCost.
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96029
2021-02-06 11:47:04 +00:00
Kazu Hirata ea3175c15b [Transforms/Instrumentation] Use range-based for loops (NFC) 2021-02-05 21:02:08 -08:00
Sidharth Baveja 22ebbc4765 LoopUnrollAndJam] Only allow loops with single exit(ing) blocks
Summary:
This resolves an issue posted on Bugzilla. https://bugs.llvm.org/show_bug.cgi?id=48764
In this issue, the loop had  multiple exit blocks, which resulted in the
function getExitBlock to return a nullptr, which resulted in hitting the assert.
This patch ensures that loops which only have one exit block as allowed to be
unrolled and jammed.

Reviewed By: Whitney, Meinersbur, dmgreen

Differential Revision: https://reviews.llvm.org/D95806
2021-02-05 16:10:53 +00:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Arnold Schwaighofer 8a7f5ad0fd We can only move static allocas into the resume entry points
Dynamic allocas that still exist have been verified to be only used
'locally' not accross a suspend point.

rdar://73903220

Differential Revision: https://reviews.llvm.org/D96071
2021-02-05 06:06:10 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Adrian Kuegel 7fe41ac3df Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit 3e5ce49e53.

Tests started failing on PPC, for example:
http://lab.llvm.org:8011/#/builders/105/builds/5569
2021-02-05 12:51:03 +01:00
Simon Pilgrim f7d07dbb29 IROutliner.cpp - fix Wdocumentation warning. NFCI.
Remove duplicate param
2021-02-05 11:38:09 +00:00
Simon Pilgrim 476b912e7c SampleProfile.cpp - fix Wdocumentation warning. NFCI.
Remove duplicate param
2021-02-05 11:31:17 +00:00
Simon Pilgrim 89edda7084 IROutliner.cpp - fix Wdocumentation warnings. NFCI. 2021-02-05 11:21:00 +00:00
David Green 502a67dd7f [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-05 09:34:24 +00:00
Kazu Hirata fb74e1e78a [Transforms/Scalar] Use range-based for loops (NFC) 2021-02-04 21:18:05 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Philip Reames 3e5ce49e53 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-02-04 17:28:30 -08:00
Richard Smith ab243efb26 Don't infer attributes on '::operator new'.
These attributes were all incorrect or inappropriate for LLVM to infer:
- inaccessiblememonly is generally wrong; user replacement operator new
  can access memory that's visible to the caller, as can a new_handler
  function.
- willreturn is generally wrong; a custom new_handler is not guaranteed
  to terminate.
- noalias is inappropriate: Clang has a flag to determine whether this
  attribute should be present and adds it itself when appropriate.
- noundef and nonnull on the return value should be specified by the
  frontend on all 'operator new' functions if we want them, not here.

In any case, inferring attributes on functions declared 'nobuiltin' (as
these are when Clang emits them) seems questionable.
2021-02-04 13:59:49 -08:00
Richard Smith 1484ad4137 Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete"
Several of the new attributes here were incorrect, and even the ones
that are generally correct were being added even to nobuiltin calls.

This reverts commit bb3f169b59.
2021-02-04 13:59:49 -08:00
Sander de Smalen 75b2555d6e NFC: Migrate LoopUnrollPass to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D95817
2021-02-04 14:05:40 +00:00
Florian Hahn 703f6a6828
[ConstraintElimination] Support conditions from loop preheaders
This patch extends the condition collection logic to allow adding
conditions from pre-headers to loop headers, by allowing cases where the
target block dominates some of its predecessors.
2021-02-04 13:58:32 +00:00
Chuanqi Xu 9511fa2dda [NFC][Coroutine] Remove redundant comment
The functionallity in the TODO was added before:
https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b
2021-02-04 12:54:30 +08:00