Commit Graph

5435 Commits

Author SHA1 Message Date
Dávid Bolvanský d828281e78 [AlwaysInliner] Respect noinline call site attribute
```
always_inline foo() { }

bar () {

noinline foo();
}
```

We should prefer call site attribute over attribute on decl. This is fix for AlwaysInliner, similar fix is needed for normal Inliner (follow up).

Related to https://reviews.llvm.org/D119061

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D119553
2022-02-11 19:23:11 +01:00
Nikita Popov e24067819f [ArgPromotion] Protect harder against recursive promotion (PR42028)
In addition to the self-recursion check, also check whether there
is more than one node in the SCC, which implies that there is a
larger cycle. I believe checking SCC structure (rather than
something like norecurse) is the right thing to do here, because
this is specifically about preventing infinite loops over the SCC.

Fixes https://github.com/llvm/llvm-project/issues/42028.

Differential Revision: https://reviews.llvm.org/D119418
2022-02-11 09:30:39 +01:00
Teresa Johnson dd3f483335 [ThinLTO][WPD] LICM set lookup (NFC)
Minor efficiency fix. There is no reason to perform the same set lookup
repeatedly in the inner loop as it is invariant there.

Differential Revision: https://reviews.llvm.org/D119474
2022-02-10 13:16:31 -08:00
Johannes Doerfert dd75c0ea64 [Attributor][NFC] Expose new API in AAPointerInfo
New users might want to check bins without a load or store instruction
at hand. Since we use those instructions only to find the offset and
size of the access anyway, we can expose an offset and size interface
to the outside world as well.

This commit mainly moves code around and exposes a class (OffsetAndSize)
as well as a method forallInterferingAccesses in AAPointerInfo.

Differential Revision: https://reviews.llvm.org/D119249
2022-02-10 13:52:24 -06:00
Johannes Doerfert d1387a26a5 [Attributor][FIX] Reachability needs to account for readonly callees
The oversight caused us to ignore call sites that are effectively dead
when we computed reachability (or more precise the call edges of a
function). The problem is that loads in the readonly callee might depend
on stores prior to the callee. If we do not track the call edge we
mistakenly assumed the store before the call cannot reach the load.
The problem is nicely visible in:
  `llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll`

Caused by D118673.

Fixes https://github.com/llvm/llvm-project/issues/53726
2022-02-10 13:52:24 -06:00
Johannes Doerfert e39b419312 [Attributor][FIX] Honor alloca address space in AAPrivatizablePtr
When we privatize a pointer (~argument promotion) we introduce new
private allocas as replacement. These need to be placed in the alloca
address space as later passes cannot properly deal with them otherwise.

Fixes https://github.com/llvm/llvm-project/issues/53725
2022-02-10 13:52:24 -06:00
Nikita Popov 8018d6be34 [ArgPromotion] Transfer metadata to promoted loads
Also transfer selected non-AA metadata to the promoted load.
Only metadata from guaranteed to execute loads is transferred.
2022-02-10 11:28:07 +01:00
Nikita Popov 68c1eeb4ba [ArgPromotion] Make implementation offset based
This rewrites ArgPromotion to be based on offsets rather than GEP
structure. We inspect all loads at constant offsets and remember
which types are loaded at which offsets. Then we promote based on
those types.

This generalizes ArgPromotion to work with bitcasted loads, and
is compatible with opaque pointers.

This patch also fixes incorrect handling of alignment during
argument promotion. Previously, the implementation only checked
that the pointer is dereferenceable, but was happy to speculate
overaligned loads. (I would have fixed this separately in advance,
but I found this hard to do with the previous implementation
approach).

Differential Revision: https://reviews.llvm.org/D118685
2022-02-09 09:35:01 +01:00
Sylvestre Ledru f2c2e924e7 Fix a typo (occured => occurred)
Reported:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005195
2022-02-08 21:35:26 +01:00
Joseph Huber caf7f05c1c [Attributor] Emit fixed-point remark on function list
This patch replaces the function we emit the remark on when we run into
the fix-point limit. Previously we got a function to emit a remark on
from the worklist's associated function. However, the worklist may not
always have an associated function in the case of global variables.
Replace this with the function set, and if there are no functions don't
emit the remark.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119248
2022-02-08 12:10:21 -05:00
Nikita Popov b896334834 [ArgPromotion] Check dereferenceability on argument as well
Before walking all the callers, check whether we have a
dereferenceable attribute directly on the argument.

Also make it clearer that the code currently does not treat
alignment correctly.
2022-02-08 10:29:51 +01:00
Johannes Doerfert dd101c808b [Attributor][FIX] Do not use assumed information for UB detection
The helper `Attributor::checkForAllReturnedValuesAndReturnInsts`
simplifies the returned value optimistically. In `AAUndefinedBehavior`
we cannot use such optimistic values when deducing UB. As a result, we
assumed UB for the return value of a function because we initially
(=optimistically) thought the function return is `undef`. While we later
adjusted this properly, the `AAUndefinedBehavior` was under the
impression the return value is "known" (=fix) and could never change.

To correct this we use `Attributor::checkForAllInstructions` and then
manually to perform simplification of the return value, only allowing
known values to be used. This actually matches the other UB deductions.

Fixes #53647
2022-02-07 20:19:19 -06:00
Kazu Hirata 3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
Kazu Hirata cb13ebbf46 [Transforms] Use default member initialization in AAIsDeadCallSiteReturned (NFC) 2022-02-05 21:39:25 -08:00
Hongtao Yu dee058c670 [CSSPGO] Turn on ext-tsp by default for CSSPGO.
I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D119048
2022-02-04 19:46:44 -08:00
Joseph Huber 6b78526b1b [OpenMP] Emit remark on the captured call instead of the variable
Changes the remark to emit on the function call that captures the globalized
variable instead of the globalized variable itself. The user should be able to
see which variable it was in the argument list of the function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106980
2022-02-04 17:50:53 -05:00
Benjamin Kramer 85243124cf Tweak some uses of std::iota to skip initializing the underlying storage. NFCI. 2022-02-04 17:00:50 +01:00
Kazu Hirata 3710078ceb [SampleProfile] Reduce indentation with an early return (NFC) 2022-02-03 12:22:23 -08:00
Alexandros Lamprineas 438a81a284 [Function Specialisation] Fix use after free
This is a fix for a use-after-free found by the address sanitizer when
compiling GCC: https://github.com/llvm/llvm-project/issues/52821

The Function Specialization pass may remove instructions, cached
inside the PredicateBase class, which are later being dereferenced
from the SCCPInstVisitor class. To prevent the dangling references
I am lazily deleting the dead instructions after the Solver has run.

Differential Revision: https://reviews.llvm.org/D118591
2022-02-02 16:32:10 +00:00
serge-sans-paille e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest change below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k lines less to process is no that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Fangrui Song 30e8f83c84 [GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible
Generalize D99629 for ELF. A default visibility non-local symbol is preemptible
in a -shared link. `isInterposable` is an insufficient condition.

Moreover, a non-preemptible alias may be referenced in a sub constant expression
which intends to lower to a PC-relative relocation. Replacing the alias with a
preemptible aliasee may introduce a linker error.

Respect dso_preemptable and suppress optimization to fix the abose issues. With
the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic`
compile.
```
int aliasee;
extern int alias __attribute__((alias("aliasee"), visibility("hidden")));
void foo() { alias = 345; } // intended to access the local copy
```

While here, refine the condition for the alias as well.

For some binary formats like COFF, `isInterposable` is a sufficient condition.
But I think canonicalization for the changed case has little advantage, so I
don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or
`getPICLevel/getPIELevel` complexity.

For instrumentations, it's recommended not to create aliases that refer to
globals that have a weak linkage or is preemptible. However, the following is
supported and the IR needs to handle such cases.
```
int aliasee __attribute__((weak));
extern int alias __attribute__((alias("aliasee")));
```

There are other places where GlobalAlias isInterposable usage may need to be
fixed.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107249
2022-02-01 10:41:16 -08:00
Nikita Popov 1652c3b80c [GlobalOpt] Avoid early exit before dead constant check
In a similar vein to 236fbf571d,
make sure we don't early-exit before the dead constant check.
2022-02-01 15:57:19 +01:00
Nikita Popov 79179a378b [ArgPromotion] Use range-based for loop (NFC) 2022-02-01 10:34:14 +01:00
Johannes Doerfert 3b8ffe668d [Attributor][FIX] Relax assertion in IRPosition::verify
A call base can be a floating value if we talk about the instruction and
not the return value. This distinction was not made before but is
important for liveness, e.g., a call site return value might be unused
(=dead) but the call site is not.
2022-02-01 02:25:44 -06:00
Johannes Doerfert a265cf22af [Attributor] Introduce the `AA::isPotentiallyReachable` helper APIs
To make usage easier (compared to the many reachability related AAs),
this patch introduces a helper API, `AA::isPotentiallyReachable`, which
performs all the necessary steps. It also does the "backwards"
reachability (see D106720) as that simplifies the AA a lot (backwards
queries were somewhat different from the other query resolvers), and
ensures we use cached values in every stage.

To test inter-procedural reachability in a reasonable way this patch
includes an extension to `AAPointerInfo::forallInterferingWrites`.
Basically, we can exclude writes if they cannot reach a load "during the
lifetime" of the allocation. That is, we need to go up the call graph to
determine reachability until we can determine the allocation would be
dead in the caller. This leads to new constant propagations (through
memory) in `value-simplify-pointer-info-gpu.ll`.

Note: The new code contains plenty debug output to determine how
reachability queries are resolved.

Parts extracted from D110078.

Differential Revision: https://reviews.llvm.org/D118673
2022-02-01 01:40:45 -06:00
Johannes Doerfert b51b83f68e [Attributor] Introduce the concept of query AAs
D106720 introduced features that did not work properly as we could add
new queries after a fixpoint was reached and which could not be answered
by the information gathered up to the fixpoint alone.

As an alternative to D110078, which forced eager computation where we
want to continue to be lazy, this patch fixes the problem.

QueryAAs are AAs that allow lazy queries during their lifetime. They are
never fixed if they have no outstanding dependences and always run as
part of the updates in an iteration. To determine if we are done, all
query AAs are asked if they received new queries, if not, we only need
to consider updated AAs, as before. If new queries are present we go for
another iteration.

Differential Revision: https://reviews.llvm.org/D118669
2022-02-01 01:40:44 -06:00
Kuter Dinel b2d1ae0611 [Attributor] AAFunctionReachability, Instruction reachability.
This patch implement instruction reachability for AAFunctionReachability
attribute. It is used to tell if a certain instruction can reach a function
transitively.

NOTE: I created a new commit based of D106720 and set the author back to
      Kuter. Other metadata, etc. is wrong. I also addressed the
      remaining review comments and fixed the unit test.

Differential Revision: https://reviews.llvm.org/D106720
2022-02-01 01:40:44 -06:00
Johannes Doerfert ac3ec22df9 [Attributor] Use AAFunctionReachability to determine AANoRecurse
We missed out on AANoRecurse in the module pass because we had no call
graph. With AAFunctionReachability we can simply ask if the function may
reach itself.

Differential Revision: https://reviews.llvm.org/D110099
2022-02-01 01:40:44 -06:00
Johannes Doerfert d1186ce7a9 [Attributor] Make interprocedural value explicit in genericValueTraversal
genericValueTraversal can look through arguments and allow value
simplification across function boundaries. In fact, the latter already
happened unchecked. With this change we allow the user of
genericValueTraversal to opt-out of interprocedural traversal if
required. We explicitly look through arguments now which helps to do
various things, incl. the propagation of constants into OpenMP parallel
regions (on the host).
2022-02-01 01:40:44 -06:00
Johannes Doerfert a1db0e523d [Attributor][FIX] Liveness handling in the isAssumedDead helpers
This fixes a conceptual problem with our AAIsDead usage which conflated
call site liveness with call site return value liveness. Without the
fix tests would obviously miscompile as we make genericValueTraversal
more powerful (in a follow up). The effects on the tests are mixed but
mostly marginal. The most prominent one is the lack of `noreturn` for
functions. The reason is that we make entire blocks live at the same
time (for time reasons). Now that we actually look at the block
liveness, which we need to do, the return instructions are live and
will survive. As an example,  `noreturn_async.ll` has been modified
to retain the `noreturn` even with block granularity. We could address
this easily but there is little need in practice.
2022-02-01 01:18:52 -06:00
Johannes Doerfert 0f471710f8 [Attributor] Use edge liveness rather than block liveness
We moved to the edge API a while back, not all uses were adjusted.
Edge liveness is more precise.
2022-02-01 01:18:51 -06:00
Johannes Doerfert 53b6753bdd [Attributor][FIX] Address two oversights in AAIsDead
No tests as these were found browsing the code and I'm not sure how to
test them properly.
2022-02-01 01:18:51 -06:00
Johannes Doerfert cfabffb034 [Attributor][NFCI] Improve debug diagnostic 2022-02-01 01:18:51 -06:00
Johannes Doerfert adf0d57f15 [Attributor] Provide convenient helpers for isAssumedRead{None,Only}
We have two attributes that can answer readnone queries. While there is
a dependence between them, it seems best to not force the users to know
what AA to ask. The helpers also allow to check for readonly nicely.

Test changes show where we now deduce readnone but haven't before,
mostly because we only asked AAMemoryBehavior and not AAMemoryLocation.
AANoAlias has not been ported to the new API yet.
2022-02-01 01:18:51 -06:00
Johannes Doerfert e140d51319 [Attributor] Use CFG reasoning to filter potentially interfering writes
Since D104432 we can look through memory by analyzing all writes that
might interfere with a load. This patch provides some logic to exclude
writes that cannot interfere with a location, due to CFG reasoning.
We make sure to avoid multi-thread write-read situations properly while
we ignore writes that cannot reach a load or writes that will be
overwritten before the load is reached.

Differential Revision: https://reviews.llvm.org/D106397
2022-02-01 01:18:51 -06:00
Johannes Doerfert 191fa419a6 [Attributor][NFC] Make debug output more useful and concise 2022-02-01 01:18:51 -06:00
Johannes Doerfert 3f0e670498 [Attributor][NFCI] Expose some nosync reasoning to outside users.
No-sync is a property that we need in more places as complex
transformations emerge. To simplify the query we provide an
`AA::isNoSyncInst` helper now and expose two existing helpers through
the `AANoSync` class.
2022-02-01 01:07:50 -06:00
Johannes Doerfert a5b6aef24e [Attributor][NFCI] Remove anonymous namespaces
The namespaces made it more complicate to implement static helpers,
among other things. We should not need them at all.
2022-02-01 01:07:50 -06:00
Johannes Doerfert 3c8a4c6f47 [OpenMP] Eliminate redundant barriers in the same block
Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and
bugs introduced later by me.

This patch allows us to remove redundant barriers if they are part
of a "consecutive" pair of barriers in a basic block with no impacted
memory effect (read or write) in-between them. Memory accesses to
local (=thread private) or constant memory are allowed to appear.
Technically we could also allow any other memory that is not used to
share information between threads, e.g., the result of a malloc that
is also not captured. However, it will be easier to do more reasoning
once the code is put into an AA. That will also allow us to look through
phis/selects reasonably. At that point we should also deal with calls,
barriers in different blocks, and other complexities.

Differential Revision: https://reviews.llvm.org/D118002
2022-02-01 01:07:50 -06:00
Johannes Doerfert 989674f110 [OpenMP] Ensure to remove noinline from all runtime functions eventually
We used to remove noinline from known OpenMP runtime functions (which
are declared in OMPKinds.td). Now we remove noinline from all functions
with the proper prefixes: __kmpc, _ZN4_OMP (= namespace omp), omp_
2022-02-01 01:07:50 -06:00
Fangrui Song 7aaf024dac [BitcodeWriter] Fix cases of some functions
`WriteIndexToFile` is used by external projects so I do not touch it.
2022-01-31 16:46:11 -08:00
Andrew Litteken 3785c1d055 [IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.

This also adds an additional command line flag debug option to disable outlining intrinsics.

Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109450
2022-01-28 13:52:21 -06:00
Nikita Popov cf0357a545 [BasicBlockUtils] Fix typo in API name (NFC)
detatch -> detach. As this requires touching all uses, also
lower-case it in accordance with the style guide.
2022-01-28 16:32:13 +01:00
Nikita Popov 0ebbf3435f [ArgPromotion] Don't assume all entry block instrs are executed
We should abort this walk if we hit any instruction that is not
guaranteed to transfer.
2022-01-28 16:08:42 +01:00
Nikita Popov 8b36c437df [ArgPromotion] Make areFunctionArgsABICompatible() static (NFC)
This function used to be shared with the Attributor, but can now
be made private.
2022-01-28 15:26:36 +01:00
Nikita Popov 9e7a2bfcf7 [OpenMPOpt] Add const qualifier (NFC)
Make it clear that this large lambda does not modify the vector.
2022-01-26 10:35:57 +01:00
Giorgis Georgakoudis 7cb4c26173 [OMPIRBuilder] Generate aggregate argument for parallel region outlined functions
Summary:
This patch modifies code generation in OpenMPIRBuilder to pass arguments
to the parallel region outlined function in an aggregate (struct),
besides the global_tid and bound_tid arguments. It depends on the
updated CodeExtractor (see D96854) for support. It mirrors functionality
of Clang codegen (see D102107).

Differential Revision: https://reviews.llvm.org/D110114
2022-01-25 20:53:45 -05:00
Andrew Litteken ba79295c48 [NFC][IROutliner] fix namespace and unused variable 2022-01-25 18:41:30 -06:00
Andrew Litteken e8f4e41b6b [IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region.
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are not conflicts to handle with respect to predecessors no longer contained in the function.

Recommit of 515eec3553

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106997
2022-01-25 18:25:50 -06:00
Andrew Litteken e50b217b4e Revert "[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region."
This reverts commit 515eec3553.

By mistake, commit message was not complete.
2022-01-25 18:24:19 -06:00
Andrew Litteken 515eec3553 [IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region. 2022-01-25 18:20:10 -06:00
Andrew Litteken 9c2daf648c Revert "[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions"
This reverts commit 8de76bd569.

Reverting due to failure of different-intrinsics.ll on lld-x86_64-win buildbot.
2022-01-25 18:19:33 -06:00
Dávid Bolvanský fe30370b00 Reland "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360)" 2022-01-26 01:11:06 +01:00
Andrew Litteken 8de76bd569 [IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.

This also adds an additional command line flag debug option to disable outlining intrinsics.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109450
2022-01-25 17:06:09 -06:00
Dávid Bolvanský 90f185c964 Revert "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360)"
This reverts commit ceec438368. Clang tests fail.
2022-01-25 23:13:46 +01:00
Dávid Bolvanský ceec438368 [AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360)
Problem: Migration to new PM broke flatten attribute.

This is one use case why LLVM should support inlining call-site with alwaysinline.  The flatten attribute is nowdays broken, so we should either land patch like this one or remove everything related to  flatten attribute from Clang.

Second use case is something like "per call site inlining intrinsics" to control inlining even more; mentioned in
https://lists.llvm.org/pipermail/cfe-dev/2018-September/059232.html

Fixes https://github.com/llvm/llvm-project/issues/53360

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D117965
2022-01-25 22:55:30 +01:00
Andrew Litteken f5f377d1fc [IRSim][IROutliner] Adding support for recognizing and outlining indirect function calls, and function calls with different names, but the same type
The outliner currently requires that function calls not be indirect calls, and have that the function name, and function type must match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions.

There are also debugging flags added to enforce the old behaviors where indirect calls not be allowed, and to enforce the old rule that function calls names must also match.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109448
2022-01-25 15:19:28 -06:00
Andrew Litteken dcc3e728ca [IROutliner] Allowing Phi Nodes in exit blocks
In addition to having multiple exit locations, there can be multiple blocks leading to the same exit location, which results in a potential phi node. If we find that multiple blocks within the region branch to the same block outside the region, resulting in a phi node, the code extractor pulls this phi node into the function and uses it as an output.

We make sure that this phi node is given an output slot, and that the two values are removed from the outputs if they are not used anywhere else outside of the region. Across the extracted regions, the phi nodes are combined into a single block for each potential output block, similar to the previous patch.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106995
2022-01-25 11:33:53 -06:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Joseph Huber 5eb49009eb [OpenMP] Add more identifier to created shared globals
Currenly we push some variables to a global constant containing shared
memory as an optimization. This generated constant had internal linkage
and should not have collided with any known identifiers in the
translation unit. However, there have been observed cases of this
optimiztaion unintentionally colliding with undocumented PTX
identifiers. This patch adds a suffix to the created globals to
hopefully bypass this.

Depends on D118059

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D118068
2022-01-24 20:37:54 -05:00
Joseph Huber 06cfdd5224 [OpenMP][Fix] Properly inherit calling convention
Previously in OpenMPOpt we did not correctly inherit the calling
convention of the callee when creating new OpenMP runtime calls. This
created issues when the calling convention was changed during
`GlobalOpt` but a new call was creating without the correct calling
convention. This lead to the call being replaced with a poison value in
`InstCombine` due to undefined behaviour and causing large portions of
the program to be incorrectly eliminated. This patch correctly inherits
the existing calling convention from the callee.

Reviewed By: tianshilei1992, jdoerfert

Differential Revision: https://reviews.llvm.org/D118059
2022-01-24 20:37:52 -05:00
Nikita Popov 67346b43e0 [Attributor] Use MemoryLocation to get pointer operand and accessed type (NFCI)
This relies on existing APIs and avoids accessing the pointer
element type. The alternative would be to extend getPointerOperand()
to also return the accessed type, but I figured going through
MemoryLocation would be cleaner.

Differential Revision: https://reviews.llvm.org/D117868
2022-01-24 10:10:13 +01:00
Nikita Popov e7762653d3 [Attributor] Avoid some pointer element type accesses 2022-01-21 11:20:10 +01:00
Johannes Doerfert 37e0c58559 [Attributor][FIX] AAValueConstantRange should not loop unconstrained
The old method to avoid unconstrained expansion of the constant range in
a loop did not work as soon as there were multiple instructions in
between the phi and its input. We now take a generic approach and limit
the number of updates as a fallback. The old method is kept as it
catches "the common case" early.
2022-01-20 18:07:04 -06:00
Johannes Doerfert 7bf9065ad7 [Attributor][NFC] Clang format 2022-01-20 18:06:53 -06:00
Sjoerd Meijer fabf1de132 [FuncSpec] Add a reference, and some other clarifying comments. NFC. 2022-01-20 17:01:08 +00:00
Johannes Doerfert b4a7559844 [OpenMP][FIX] Replace ICVs only with values valid at the getter position
While we might know the value if an ICV at a getter position it is not
always clear that we can simply use it. Verify the value is valid first
to avoid invalid IR.

Fixes #53300.
2022-01-19 18:40:13 -06:00
Eli Friedman 86cdff0e21 [OpenMPOpt] Use SetVector to store list of kernels.
Fixes test failures on reverse-iteration buildbot.
2022-01-19 13:55:32 -08:00
Wenlei He 7cca13bc3a [PartialInline] Bail out on asm-goto/callbr
Fixing ICE when partial inline tries to deal with blockaddress uses of function which is typical for asm-goto/callbr. We ran into this with PGO multi-region partial inline.

Differential Revision: https://reviews.llvm.org/D117509
2022-01-19 10:57:57 -08:00
Nikita Popov a115bbea9b [Attributor] Remove notional overindexing check
AAPointerInfo currently bails on constant expression GEPs with
notional overindexing. I don't think this is necessary, as the
following code handling GEPOperator will deal with arbitrary
indices appropriately.

Differential Revision: https://reviews.llvm.org/D117203
2022-01-19 11:30:04 +01:00
Mircea Trofin 3e8553aab4 [mlgo][inline] Improve global state tracking
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.

Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.

This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D115847
2022-01-18 17:45:34 +00:00
Philip Reames 26049b8ce3 [GlobalOpt] Generalize malloc-to-global for any allocation function
We can generalize the malloc-to-global transform for other allocation functions which are both a) removable, and b) have a known initialization value.

One subtlety that I want to point out - mostly because I hadn't realized it was true until I took a closer look - is that the existing code doesn't prove that initialization/malloc happens only once. The initialization function can be called multiple times. This is correct without special handling for malloc as undef can map to any value previously written, but a non-undef initializing allocation it means we may end up memseting the new global repeatedly. In particular, this means it's not legal to fold the memset into the initializer of the global.

Differential Revision: https://reviews.llvm.org/D117503
2022-01-17 15:06:23 -08:00
Nikita Popov 12bee2c054 [GlobalOpt] Drop an incorrect check
This was a last-minute addition to D117249, and of course I ended
up inverting the condition in a way that caused an uninitialized
memory read.

I've dropped it entirely, as I don't think we actually care whether
the size is zero or not here. The previous code wasn't checking
this either.
2022-01-17 10:10:56 +01:00
Nikita Popov 499f1ca79f [GlobalOpt] Use generic type when converting malloc to global
The malloc to global transform currently determines the type of the
global by looking at bitcasts of the malloc. This is limited (the
transform fails if there are multiple different types) and
incompatible with opaque pointers.

My initial approach was to construct an appropriate struct type
based on usage in loads/stores. What this patch does instead is
to always create an [i8 x AllocSize] global, without trying to
guess types at all.

This does mean that other transforms that require a certain global
type may break. I fixed two of these in D117034 and D117223, which
I believe should be sufficient to avoid regressions. In particular,
the global SRA change should end up splitting the global into
naturally-typed sub-globals, at which point all other optimizations
should work.

Differential Revision: https://reviews.llvm.org/D117092
2022-01-17 09:55:33 +01:00
Nikita Popov 4796b4ae7b [GlobalOpt] Make global SRA offset based
Currently global SRA uses the GEP structure to determine how to
split the global. This patch instead analyses the loads and stores
that are performed on the global, and collects which types are used
at which offset, and then splits the global according to those.

This is both more general, and works fine with opaque pointers.
This is also closer to how ordinary SROA is performed.

Differential Revision: https://reviews.llvm.org/D117223
2022-01-17 09:28:36 +01:00
Bryce Wilson 28b6e2cb3d
[Attributor] [NFC] Use canonical variable name
Differential Revision: https://reviews.llvm.org/D117241
2022-01-13 23:06:00 -08:00
Philip Reames 5d5d4d94f0 [Attributor] Generalize heap to stack to any allocator with relevant properties
This completes removal of the isXLike queries, and depends on a whole series of earlier patches which have already landed.

Differential Revision: https://reviews.llvm.org/D117242
2022-01-13 15:33:24 -08:00
Philip Reames cf66f01ec1 [Attributor] Share code for abstract interpretation of allocation sizes with getObjectSize [NFC-ish]
The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during it's abstract interpretation, and allows us to have one copy of the code.

Note this is not NFC for two reasons:
* The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space.
* I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that.

Differential Revision: https://reviews.llvm.org/D117241
2022-01-13 15:33:24 -08:00
Arthur Eubanks 9a0fe1b0fc [Inline] Attempt to delete any discardable if unused functions
Previously we limited ourselves to only internal/private functions. We
can also delete linkonce_odr functions.

Minor compile time wins:
https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions

Major memory wins on tramp3d:
https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss

Relanding with fix for compile times D117236.

Reviewed By: nikic, mtrofin

Differential Revision: https://reviews.llvm.org/D115545
2022-01-13 14:48:38 -08:00
Arthur Eubanks 757e044dce [Inliner] Don't removeDeadConstantUsers() when checking if a function is dead
If a function has many uses, this can take a good chunk of compile times.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117236
2022-01-13 14:29:45 -08:00
Nikita Popov 1cbb456123 [GlobalOpt] Fix global to select transform under opaque pointers
We need to check that the load/store type is also the same, as this
is no longer implicitly checked through the pointer type.
2022-01-13 11:13:06 +01:00
James Y Knight 55fcbf0a84 Revert "[Inline] Attempt to delete any discardable if unused functions"
Somehow this ends up causing an infinite loop in the inliner.

This reverts commit d5be48c66d.
2022-01-13 03:06:47 +00:00
Philip Reames 9979299705 [Attributor] Simplify how we handle required alignment during heap-to-stack [NFC]
The existing code duplicated the same concern in two places, and (weirdly) changed the inference of the allocation size based on whether we could meet the alignment requirement.  Instead, just directly check the allocation requirement.
2022-01-12 17:34:17 -08:00
Philip Reames d1f4c6a611 [Attributor] Generalize calloc handling in heap-to-stack for any init value [NFC]
Rewrite the calloc specific handling in heap-to-stack to allow arbitrary init values.  The basic problem being solved is that if an allocation is initilized to anything other than zero, this must be explicitly done for the formed alloca as well.

This covers the calloc case today, but once a couple of earlier guards are removed in this code, downstream allocators with other init values could also be handled.

Inspired by discussion on D116971
2022-01-12 16:58:39 -08:00
Philip Reames 8e76720cf2 [Attributor] Reuse object size evaluation code [NFC] 2022-01-12 16:58:39 -08:00
Philip Reames db57065b36 [Attributor] Use getAllocAlignment where possible [NFC]
Inspired by D116971.
2022-01-12 16:58:39 -08:00
Arthur Eubanks fe827a93f6 [ModuleInliner] Properly delete dead functions
Followup to D116964 where we only did this in the CGSCC inliner.
Fixes leaks reported in D116964.
2022-01-12 09:57:43 -08:00
Arthur Eubanks d5be48c66d [Inline] Attempt to delete any discardable if unused functions
Previously we limited ourselves to only internal/private functions. We
can also delete linkonce_odr functions.

Minor compile time wins:
https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions

Major memory wins on tramp3d:
https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss

Reviewed By: nikic, mtrofin

Differential Revision: https://reviews.llvm.org/D115545
2022-01-12 08:36:04 -08:00
Nikita Popov 5642ce5ac2 [GlobalOpt] Drop redundant setExternallyInitialized() call (NFC)
This is part of copyAttributesFrom().
2022-01-12 09:42:58 +01:00
Nikita Popov f3e87176e1 [GlobalOpt] Support "stored once" optimization for different types
GlobalOpt can optimize a global with undef initializer and a single
store to put the stored value into the initializer instead. Currently,
this requires the type of the global and the store to match.

This patch extends support to cases with different types (but same
size), in which case we create a new global to replace the old one.

Differential Revision: https://reviews.llvm.org/D117034
2022-01-12 09:39:31 +01:00
Mircea Trofin 248d55af3e [NFC][MLGO] Use LazyCallGraph::Node to track functions.
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the Module-wide performance of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)

Differential Revision: https://reviews.llvm.org/D116964
2022-01-11 19:23:47 -08:00
Johannes Doerfert 7b39dccbe4 [Attributor][FIX] Ensure "IsExact" is false for non-exact accesses
If we look at potentially interfering accesses we need to ensure the
"IsExact" flag is set appropriately. Accesses that have an "unknown"
size or offset cannot be exact matches and we missed to flag that.

Error and test reported by Serguei N. Dmitriev.
2022-01-10 10:09:36 -06:00
Serge Guelton d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using and std::map<SmallString, SmallString> for target dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Nikita Popov 92d55e7336 [MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall()
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.

We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.

Differential Revision: https://reviews.llvm.org/D116800
2022-01-10 09:18:15 +01:00
Johannes Doerfert 4e8a02e7f4 [Attributor][FIX] Remove assumption that doesn't have to hold
There is no guarantee we strip all GEPOperators and the conservative
handling doesn't even require us to.
2022-01-09 13:15:53 -06:00
Johannes Doerfert 6c745e04fa [Attributor][FIX] Ensure order for multiple references into map
If we have multiple references into a map we need to ensure the ones
created late do not invalidate the ones created early. To do that we
need to make sure all but the first are not modifying the map, hence
for them the keys have to be present already.

Fixes #52875.
2022-01-08 16:59:21 -06:00
Simon Pilgrim 274359cf09 [OpenMPOpt] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr. NFC 2022-01-08 13:47:35 +00:00
Kazu Hirata b932bdf59f [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-07 17:45:09 -08:00
Arthur Eubanks f96ab6cc1b Revert "[Inline] Attempt to delete any discardable if unused functions"
This reverts commit 335a3163aa.

Causes crashes when building llvm-test-suite's kc under ReleaseLTO-g.
2022-01-07 13:12:40 -08:00
Arthur Eubanks 335a3163aa [Inline] Attempt to delete any discardable if unused functions
Previously we limited ourselves to only internal/private functions. We
can also delete linkonce_odr functions.

Minor compile time wins:
https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions

Major memory wins on tramp3d:
https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D115545
2022-01-07 11:05:26 -08:00
Philip Reames 6b0ff0969d Extract utility function for checking initial value of allocation [NFC, try 2]
This is a reoccuring pattern, we can consolidate three copies into one.  The main motivation is to reduce usages of isMallocLike.

The original commit (which was quickly reverted) didn't account for the allocation function could be an invoke, test coverage for that case added in this commit.
2022-01-07 08:44:08 -08:00
Philip Reames a3573f203e Fix a bug in 67a3331e (cast instead of dyn_cast)
The original commit was expected to be NFC, but I didn't account for the fact that invokes could be considered allocation functions.  Interestingly, only one builder caught the problem.
2022-01-07 08:25:02 -08:00
Kazu Hirata 2aed08131d [llvm] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-07 00:39:14 -08:00
Nikita Popov c8189da201 [ModuleUtils] Remove dead arg from filterDeadComdatFunctions() (NFC)
The module argument is no longer used.
2022-01-07 09:12:16 +01:00
Philip Reames c6a0c1585a Revert "Extract utility function for checking initial value of allocation [NFC]"
This reverts commit 9ce30fe86f.  Appears to be causing a problem on a buildbot, revert while investigating.

https://green.lab.llvm.org/green//job/clang-stage1-RA/26818/consoleFull#-1502953973d489585b-5106-414a-ac11-3ff90657619c
2022-01-06 19:05:51 -08:00
Philip Reames 9ce30fe86f Extract utility function for checking initial value of allocation [NFC]
This is a reoccuring pattern, we can consolidate three copies into one.  The main motivation is to reduce usages of isMallocLike.
2022-01-06 18:02:14 -08:00
Philip Reames 7052670e96 Move getMallocAllocatedType and getMallocArraySize to GlobalOpt [NFC]
These are implementation details of the global-opt transform and not easily reuseable, so remove them from the analysis header.
2022-01-06 18:02:13 -08:00
Philip Reames 67a3331e4f Inline extractMallocCall to sole use and delete [NFC] 2022-01-06 18:02:13 -08:00
Philip Reames c16fd6a376 Rename doesNotReadMemory to onlyWritesMemory globally [NFC]
The naming has come up as a source of confusion in several recent reviews.  onlyWritesMemory is consist with onlyReadsMemory which we use for the corresponding readonly case as well.
2022-01-05 08:52:55 -08:00
Nikita Popov 99c6b12b92 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.

This is part of D115924.
2022-01-05 12:30:46 +01:00
Sjoerd Meijer e550dfa4a6 Silence a few unused variable warnings. NFC. 2022-01-05 09:15:07 +00:00
Philip Reames 0b09313cd5 [funcattrs] Infer writeonly argument attribute [part 2]
This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from series of previously landed NFCs), this is pretty straight forward.

One thing to note is that we can not infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred.

Differential Revision: https://reviews.llvm.org/D115003
2022-01-04 09:07:54 -08:00
serge-sans-paille 9290ccc3c1 Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.

Differential Revision: https://reviews.llvm.org/D116110
2022-01-04 15:37:46 +01:00
Nikita Popov bbeaf2aac6 [GlobalOpt][Evaluator] Rewrite global ctor evaluation (fixes PR51879)
Global ctor evaluation currently models memory as a map from Constant*
to Constant*. For this to be correct, it is required that there is
only a single Constant* referencing a given memory location. The
Evaluator tries to ensure this by imposing certain limitations that
could result in ambiguities (by limiting types, casts and GEP formats),
but ultimately still fails, as can be seen in PR51879. The approach
is fundamentally fragile and will get more so with opaque pointers.

My original thought was to instead store memory for each global as an
offset => value representation. However, we also need to make sure
that we can actually rematerialize the modified global initializer
into a Constant in the end, which may not be possible if we allow
arbitrary writes.

What this patch does instead is to represent globals as a MutableValue,
which is either a Constant* or a MutableAggregate*. The mutable
aggregate exists to allow efficient mutation of individual aggregate
elements, as mutating an element on a Constant would require interning
a new constant. When a write to the Constant* is made, it is converted
into a MutableAggregate* as needed.

I believe this should make the evaluator more robust, compatible
with opaque pointers, and a bit simpler as well.

Fixes https://github.com/llvm/llvm-project/issues/51221.

Differential Revision: https://reviews.llvm.org/D115530
2022-01-04 09:30:54 +01:00
Kazu Hirata e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Kazu Hirata fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Fangrui Song b69fe48ccf [IROutliner] Move global namespace cl::opt inside llvm:: 2021-12-30 01:12:55 -08:00
Johannes Doerfert 944aa0421c Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 73ece231ee and
reapplies 7bfcdbcbf3 with mlir changes.
Also reverts commit 423ba12971 and
includes the unit test changes of
16da214004.
2021-12-29 01:10:38 -06:00
Mehdi Amini 73ece231ee Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 7bfcdbcbf3.
Broke MLIR build
2021-12-29 06:57:36 +00:00
Johannes Doerfert 5602c866c0 [Attributor] Look through allocated heap memory
AAPointerInfo, and thereby other places, can look already through
internal global and stack memory. This patch enables them to look
through heap memory returned by functions with a `noalias` return.

In the future we can look through `noalias` arguments as well but that
will require AAIsDead to learn that such memory can be inspected by the
caller later on. We also need teach AAPointerInfo about dominance to
actually deal with memory that might not be `null` or `undef`
initialized. D106397 is a first step in that direction already.

Reviewed By: kuter

Differential Revision: https://reviews.llvm.org/D109170
2021-12-29 00:21:36 -06:00
Johannes Doerfert 3e0c512ce6 [OpenMP] Simplify all stores in the device code
Similar to loads, we want to be aggressive when it comes to store
simplification. Not everything in LLVM handles dead stores well when
address space casts are involved, we can simply ask the Attributor to do
it for us though.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109998
2021-12-29 00:19:38 -06:00
Johannes Doerfert 7bfcdbcbf3 [OpenMP][NFCI] Embed the source location string size in the ident_t
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.

Differential Revision: https://reviews.llvm.org/D113126
2021-12-28 23:53:29 -06:00
Johannes Doerfert 6e2fcf8513 [Attributor][FIX] Ensure store uses are correlated with reloads
While we skipped uses in stores if we can find all copies of the value
when the memory is loaded, we did not correlate the use in the store
with the use in the load. So far this lead to less precise results in the
offset calculations which prevented deductions. With the new
EquivalentUseCB callback argument the user of checkForAllUses can be
informed of the correlation and act on it appropriately.

Differential Revision: https://reviews.llvm.org/D109662
2021-12-28 23:53:29 -06:00
Johannes Doerfert 9f04a0ea43 [OpenMP][FIX] Make AAExecutionDomain deterministic 2021-12-28 23:53:29 -06:00
Johannes Doerfert ba70f3a5d9 [OpenMP][FIX] Make heap2shared deterministic
Issue #52875 reported non-determinism, this is the first step to avoid
it. We iterate over MallocCalls so we should keep the order stable.
2021-12-28 23:53:28 -06:00
Johannes Doerfert 7de5da2a67 [OpenMP][NFC] Move address space enum into OMPConstants header 2021-12-28 23:53:28 -06:00
Joseph Huber 6e220296d7 [OpenMP] Use alignment information in HeapToShared
This patch uses the return alignment attribute now present in the
`__kmpc_alloc_shared` runtime call to set the alignment of the shared
memory global created to replace it.

Depends on D115971

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D116319
2021-12-27 16:58:27 -05:00
Joseph Huber 38fc89623b [Attributor][Fix] Add alignment return attribute to HeapToStack
This patch changes the HeapToStack optimization to attach the return alignment
attribute information to the created alloca instruction. This would cause
problems when replacing the heap allocation with an alloca did not respect the
alignment of the original heap allocation, which would typically be aligned on
an 8 or 16 byte boundary. Malloc calls now contain alignment attributes,
so we can use that information here.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115888
2021-12-27 16:58:23 -05:00
Nikita Popov 69ffc3cee9 [Attributor] Directly call areTypesABICompatible() hook
Instead of using the ArgumentPromotion implementation, we now walk
call sites using checkForAllCallSites() and directly call
areTypesABICompatible() using the replacement types. I believe
that resolves the TODO in the code.

Differential Revision: https://reviews.llvm.org/D116033
2021-12-24 09:20:31 +01:00
Philip Reames ee5d5e19f9 [funcattrs] Use callsite param attributes from indirect calls when inferring access attributes
Arguments to an indirect call is by definition outside the SCC, but there's no reason we can't use locally defined facts on the call site. This also has the nice effect of further simplifying the code.

Differential Revision: https://reviews.llvm.org/D116118
2021-12-22 18:21:59 -08:00
Nikita Popov f5ac23b5ae [ArgPromotion][TTI] Pass types to ABI compatibility hook
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.

This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.

This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).

Differential Revision: https://reviews.llvm.org/D116031
2021-12-22 09:37:51 +01:00
minglotus-6 9c49f8d705 [LTO][WPD] Ignore unreachable function by analyzing IR.
In regular LTO, analyze IR and discard unreachable functions when finding virtual call targets.

Differential Revision: https://reviews.llvm.org/D116056
2021-12-21 18:13:03 +00:00
Philip Reames b7b308c50a [funcattrs] Infer access attributes for vararg arguments
This change allows us to infer access attributes (readnone, readonly) on arguments passed to vararg functions. Since there isn't a formal argument corresponding to the parameter, they'll never be considered part of the speculative SCC, but they can still benefit from attributes on the call site or the callee function.

The main motivation here is just to simplify the code, and remove some special casing. Previously, an indirect vararg call could return more precise results than an direct vararg call which is just weird.

Differential Revision: https://reviews.llvm.org/D115964
2021-12-21 09:34:14 -08:00
Philip Reames 1fee7195c9 [funcattrs] Fix incorrect readnone/readonly inference on captured arguments
This fixes a bug where we would infer readnone/readonly for a function which passed a value to a function which could capture it. With the value captured in memory, the function could reload the value from memory after the call, and write to it. Inferring the argument as readnone or readonly is unsound.

@jdoerfert apparently noticed this about two years ago, and tests were checked in with 76467c4, but the issue appears to have never gotten fixed.

Since this seems like this issue should break everything, let me explain why the case is actually fairly narrow. The main inference loop over the argument SCCs only analyzes nocapture arguments. As such, we can only hit this when construction the partial SCCs. Due to that restriction, we can only hit this when we have either a) a function declaration with a manually annotated argument, or b) an immediately self recursive call.

It's also worth highlighting that we do have cases we can infer readonly/readnone on a capturing argument validly. The easiest example is a function which simply returns its argument without ever accessing it.

Differential Revision: https://reviews.llvm.org/D115961
2021-12-21 09:34:14 -08:00
Sjoerd Meijer 9e3ae8d296 [FuncSpec] Rename internal option. NFC.
Rename option MaxConstantsThreshold to MaxClonesThreshold. Not only is this
more descriptive, this is also in preparation of introducing another threshold
to analyse more than just 1 constant argument as we currently do, and to better
distinguish these options/thresholds.
2021-12-21 11:02:01 +00:00
Nikita Popov 2926d6d335 [ConstantFold][GlobalOpt] Don't create x86_mmx null value
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
2021-12-21 09:11:41 +01:00
Sami Tolvanen 5dc8aaac39 [llvm][IR] Add no_cfi constant
With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces
function references with CFI jump table references, which is a problem
for low-level code that needs the address of the actual function body.

For example, in the Linux kernel, the code that sets up interrupt
handlers needs to take the address of the interrupt handler function
instead of the CFI jump table, as the jump table may not even be mapped
into memory when an interrupt is triggered.

This change adds the no_cfi constant type, which wraps function
references in a value that LowerTypeTestsModule::replaceCfiUses does not
replace.

Link: https://github.com/ClangBuiltLinux/linux/issues/1353

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D108478
2021-12-20 12:55:32 -08:00
Nikita Popov aeb36ae0f4 Revert "[ConstantFolding] Unify handling of load from uniform value"
This reverts commit 9fd4f80e33.

This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
2021-12-18 20:46:52 +01:00
Kazu Hirata fee57711fe Use DenseMap::lookup (NFC) 2021-12-17 18:19:25 -08:00
Kazu Hirata 2b7be47b22 [llvm] Strip redundant lambda (NFC) 2021-12-17 10:51:40 -08:00
Philip Reames 4c9e31a481 [funcattrs] Use early return to clarify code in determinePointerAccessAttrs [NFC]
Instead of having the speculative path be the untaken path in the branch, explicitly have it return.  This does require tail duplicating one call, but the resulting code is shorter and easier to understand.  Also rewrite the condition using appropriate accessors.
2021-12-17 10:00:36 -08:00
Philip Reames 54ee8bb73a [funcattrs] Use getDataOperandNo where appropriate [NFC]
We'd manually duplicated the same logic and assertions; we can use the utility instead.
2021-12-17 09:35:29 -08:00
Philip Reames 33cbaab141 [funcattrs] Consistently treat calling a function pointer as a non-capturing read
We were being wildly inconsistent about what memory access was implied by an indirect function call. Depending on the call site attributes, you could get anything from a read, to unknown, to none at all. (The last was a miscompile.)

We were also always traversing the uses of a readonly indirect call. This is entirely unneeded as the indirect call does not capture. The callee might capture itself internally, but that has no implications for this caller. (See the nice explanation in the CaptureTracking comments if that case is confusing.)

Note that elsewhere in the same file, we were correctly computing the nocapture attribute for indirect calls. The changed case only resulted in conservatism when computing memory attributes if say the return value was written to.

Differential Revision: https://reviews.llvm.org/D115916
2021-12-17 09:02:03 -08:00
Nikita Popov 9fd4f80e33 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.

Differential Revision: https://reviews.llvm.org/D115924
2021-12-17 17:05:06 +01:00
Sjoerd Meijer b7b61fe091 [FuncSpec] Create helper to update state. NFC.
This creates a helper function updateSpecializedFuncs and is a NFC just to make
the function that drives the transformation easier to read.
2021-12-17 12:14:33 +00:00
Sjoerd Meijer 78a392cf9f [FuncSpec] Respect MaxConstantsThreshold
This is a follow up of D115458 and truncates the worklist of actual arguments
that can be specialised to 'MaxConstantsThreshold' candidates if
MaxConstantsThreshold was exceeded. Thus, this changes the behaviour of option
-func-specialization-max-constants. Before it didn't specialise at all when
this threshold was exceeded, but now it specialises up to MaxConstantsThreshold
candidates from the sorted worklist.

Differential Revision: https://reviews.llvm.org/D115509
2021-12-17 09:25:45 +00:00
Sjoerd Meijer 89bcfd1632 Recommit "[FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates."
Replaced llvm:sort with llvm::stable_sort, this was failing on the bot with
expensive checks enabled.
2021-12-17 09:02:51 +00:00
Sjoerd Meijer 5b139a583d Revert "[FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates."
This reverts commit 20b03d6536.

This shows some failed tests on a bot with expensive checks enabled that I need
to look at.
2021-12-16 12:56:11 +00:00
Sjoerd Meijer 20b03d6536 [FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates.
This mostly is the same code that is refactored to decouple the cost and
benefit analysis. The biggest change is top-level function specializeFunctions
that now drives the transformation more like this:

  specializeFunctions() {
    Cost = getSpecializationCost(F);
    calculateGains(F, Cost);
    specializeFunction(F);
  }

while this is just a restructuring, it helps the functional change in
calculateGains. I.e., we now sort the candidates based on the expected
specialisation gain, which we didn't do before. For this, a book keeping struct
ArgInfo was introduced. If we have a list of N candidates, but we only want
specialise less than N as set by option -func-specialization-max-constants, we
sort the list and discard the candidates that give the least benefit.

Given a formal argument, this change results in selecting the best actual
argument(s). This is NFC'ish in that this shouldn't change the current output
(hence no test change here), but in follow ups starting with D115509, it
should and I want to go one step further and compare all functions and all
arguments, which will mostly build on top of this refactoring and change.

Differential Revision: https://reviews.llvm.org/D115458
2021-12-16 11:55:37 +00:00
minglotus-6 4ab5527c15 [ThinLTO] Ignore unreachable virtual functions in WPD in thin LTO.
Differential Revision: https://reviews.llvm.org/D115648
2021-12-16 02:24:20 +00:00
Hongtao Yu d7b7b64914 [CSSPGO] Warn instead of error out for modules that are not probed.
Modules that are not compiled with pseudo probe enabled can still be compiled with a sample profile input, such as in LTO postlink where other modules are probed. Since the profile is unrelated to the current modules, we should warn instead of error out the compilation.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115642
2021-12-14 16:38:50 -08:00
Hongtao Yu 5740bb801a [CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.

For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts,  since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.

In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.

A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes,  a nested profile shown as below can also have a CFG checksum.

```

main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```

Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
 - Changes in the sample loader to support probe-based nested profile.

I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115205
2021-12-14 14:40:25 -08:00
Mingming Liu 09a704c5ef [LTO] Ignore unreachable virtual functions in WPD in hybrid LTO.
Differential Revision: https://reviews.llvm.org/D115492
2021-12-14 20:18:04 +00:00
Kazu Hirata 36b8a4f9f3 [llvm] Use llvm::is_contained (NFC) 2021-12-11 11:42:09 -08:00
Sami Tolvanen 9a74c753fe [ThinLTO][MC] Use conditional assignments for promotion aliases
Inline assembly refererences to static functions with ThinLTO+CFI were
fixed in D104058 by creating aliases for promoted functions. Creating
the aliases unconditionally resulted in an unexpected size increase in
a Chrome helper binary:

https://bugs.chromium.org/p/chromium/issues/detail?id=1261715

This is caused by the compiler being unable to drop unused code now
referenced by the alias in module-level inline assembly. This change
adds a .set_conditional assembly extension, which emits an assignment
only if the target symbol is also emitted, avoiding phantom references
to functions that could have otherwise been dropped.

This is an alternative to the solution proposed in D112761.

Reviewed By: pcc, nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D113613
2021-12-10 12:21:37 -08:00
Arthur Eubanks 1172712f46 [NFC] Replace some deprecated getAlignment() calls with getAlign()
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D115370
2021-12-09 08:43:19 -08:00
Nikita Popov a3a478be40 [Inliner] Add debug message for history skip (NFC) 2021-12-09 15:11:56 +01:00
Joseph Huber 744aa09f52 [OpenMP] Make reduction functions SPMD compatible
Reduction functions were guarded before which was wrong, these are SPMD
compatible.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115159
2021-12-06 12:32:02 -05:00
Joseph Huber 9ea5b97203 [OpenMP][FIX] Invalidate the SPMDCompatibilityTracker explicitly
Before SPMDzation it was sufficient to add an incompatible instruction
to the SPMDCompatibilityTracker. However, now adding instructions means
they need guarding. As calls cannot be guarded in general we need to
explicitly prevent SPMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115158
2021-12-06 12:31:57 -05:00
Matt Arsenault a25111c9e2 Attributor: Fix typo in function name 2021-12-04 11:25:22 -05:00
Philip Reames 7b54de5fef [funcattrs] Fix a bug in recently introduced writeonly argument inference
This fixes a bug in 740057d.  There's two ways to describe the issue:
* One caller hasn't yet proven nocapture on the argument.  Given that, the inference routine is responsible for bailing out on a potential capture.
* Even if we know the argument is nocapture, the access inference needs to traverse the exact set of users the capture tracking would (or exit conservatively).  Even if capture tracking can prove a store is non-capturing (e.g. to a local alloc which doesn't escape), we still need to track the copy of the pointer to see if it's later reloaded and accessed again.

Note that all the test changes except the newly added ones appear to be false negatives.  That is, cases where we could prove writeonly, but the current code isn't strong enough.  That's why I didn't spot this originally.
2021-12-03 08:57:15 -08:00
Hongtao Yu 4e24ca1cdc [CSSPGO] Turn on Profi by default
As titled.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D115011
2021-12-02 17:56:38 -08:00
Philip Reames 740057d185 [funcattrs] Infer writeonly argument attribute
This change extends the current logic for inferring readonly and readnone argument attributes to also infer writeonly.

This change is deliberately minimal; there's a couple of areas for follow up.
* I left out all call handling and thus any benefit from the SCC walk. When examining the test changes, I realized the existing code is imprecise, and am going to fix that in it's own revision before adding in the writeonly handling. (Mostly because updating the tests is hard when I, the human, can't figure out whether the result is correct.)
* I left out handling for storing a value (as opposed to storing to a pointer). This should benefit readonly/readnone as well, and applies to a bunch of other instructions. Seemed worth having as a separate review.

Differential Revision: https://reviews.llvm.org/D114963
2021-12-02 13:04:09 -08:00
spupyrev 7cc2493daa profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

UPD Dec 1st 2021:
- synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`;
- added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-12-01 15:30:38 -08:00
Nikita Popov 8d1759c404 [GlobalOpt] Simplify CleanupConstantGlobalUsers()
This bases the CleanupConstantGlobalUsers() implementation around
the ConstantFoldLoadFromConst() API. The general approach is that
we discover all users while looking through casts, and then
constant fold loads and drop stores and memintrinsics.

This avoids special cases and limitations in the previous
implementation, which is also incompatible with opaque pointers.
The result is a bit more powerful than before, because we now use
more general load folding logic which can for example look through
pointer bitcasts between different sizes. This is where the test
changes come from, as we now fold more loads and can thus remove
more globals.

Differential Revision: https://reviews.llvm.org/D114889
2021-12-01 21:06:25 +01:00
Joseph Huber 058c312a44 [OpenMP][FIX] SPMDzation guarding needs to account for all reaching kernels
If two reaching kernels disagree on the execution mode we cannot guard a
function right now. Ensure we do not as we otherwise will cause a
deadlock.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D114866
2021-12-01 11:44:32 -05:00
Hongtao Yu bf317f6698 [CSSPGO] Sorting nodes in a cycle of profiled call graph.
For nodes that are in a cycle of a profiled call graph, the current order the underlying scc_iter computes purely depends on how those nodes are reached from outside the SCC and inside the SCC, based on the Tarjan algorithm. This does not honor profile edge hotness, thus does not gurantee hot callsites to be inlined prior to cold callsites. To mitigate that, I'm adding an extra sorter on top of scc_iter to sort scc functions in the order of callsite hotness, instead of changing the internal of scc_iter.

Sorting on callsite hotness can be optimally based on detecting cycles on a directed call graph, i.e, to remove the coldest edge until a cycle is broken. However, detecting cycles isn't cheap. I'm using an MST-based approach which is faster and appear to deliver some performance wins.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D114204
2021-11-30 09:01:08 -08:00
Joseph Huber 7986a5f23e [OpenMP] Add RTL function to externalization RAII
This patch adds the `__kmpc_get_warp_size` OpenMP RTL function to the
externalization RAII struct. This was getting optimized out and then
being replaced with an undefined value once added back in, causing bugs
for complex reductions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D114802
2021-11-30 10:19:06 -05:00
Mehdi Amini 1392b654ff Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit 884b6dd311.
The windows build is broken with a linker error.
2021-11-23 20:10:36 +00:00
spupyrev 884b6dd311 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 11:02:40 -08:00
Zarko Todorovski 0d3add216f [llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts
Reworded some comments and asserts to avoid usage of `sanity check/test`

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114372
2021-11-23 13:22:55 -05:00
Philip Reames 065f777d27 Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit b00fc19822.  This change fails to build (link) on ubuntu x86,
2021-11-23 09:18:28 -08:00
spupyrev b00fc19822 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 09:08:30 -08:00
Arthur Eubanks e3e25b5112 [NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.

We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.

The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.

Heavily inspired by D98103.

Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113947
2021-11-17 09:06:46 -08:00
Hongtao Yu 042cefd2b5 [CSSPGO] Fix a hash code truncating issue in ContextTrieNode.
std::hash returns a 64bit hash code while previously we were using only lower 32 bits which caused hash collision for large workloads.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D113688
2021-11-16 11:01:52 -08:00
Arthur Eubanks 19867de9e7 [NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.

So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.

This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.

However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.

Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).

Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss

D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation.  D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113304
2021-11-15 14:44:53 -08:00
Kazu Hirata feb40a3a47 [llvm] Use range-based for loops with instructions (NFC) 2021-11-14 19:40:48 -08:00
Kazu Hirata d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Mircea Trofin a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Kazu Hirata 7379736774 [llvm] Use range-based for loops with User::operands (NFC) 2021-11-14 09:32:38 -08:00
Kazu Hirata 098e935174 [llvm] Use range-based for loops with CallBase::args (NFC) 2021-11-14 09:32:36 -08:00
Kazu Hirata 7505b7045f [llvm] Use GetElementPtrInst::indices (NFC) 2021-11-13 21:43:28 -08:00
Joel E. Denny c9dfe322ee [OpenMP] Fix main thread barrier for Pascal and amdgpu
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602
2021-11-12 11:18:45 -05:00
Joseph Huber e52937eba0 [OpenMP] Use AAAssumptionInfo to get assumptions in OpenMPOpt
This patch uses the abstract attributor introduced in D111054 to get the
assumption values instead of the `hasAssumption` function. This also
calls it so assumption information should propagate throug the device
where applicabile.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111445
2021-11-09 17:39:21 -05:00
Joseph Huber b8a825b483 [Attributor] Introduce AAAssumptionInfo to propagate assumptions
This patch introduces a new abstract attributor instance that propagates
assumption information from functions. Conceptually, if a function is
only called by functions that have certain assumptions, then we can
apply the same assumptions to that function. This problem is similar to
calculating the dominator set, but the assumptions are merged instead of
nodes.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111054
2021-11-09 17:39:18 -05:00
Liqiang Tao 6cad45d5c6 [llvm][Inline] Add a module level inliner
Add module level inliner, which is a minimum viable product at this point.
Also add some tests for it.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-August/152297.html

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D106448
2021-11-09 11:03:29 +08:00
Arthur Eubanks 28a06a1b87 [NFC][FuncAttrs] Keep track of modified functions
This is in preparation for only invalidating analyses on changed
functions.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113303
2021-11-08 15:04:56 -08:00
Kazu Hirata 0d182d9d1e [Transforms] Use make_early_inc_range (NFC) 2021-11-07 17:03:15 -08:00
Benjamin Kramer 9b8b16457c Put implementation details into anonymous namespaces. NFCI. 2021-11-07 15:18:30 +01:00
Kazu Hirata 87e53a0ad8 [llvm] Use make_early_inc_range (NFC) 2021-11-05 19:39:07 -07:00
Sjoerd Meijer 3fd1902ad8 [FuncSpec] Enable it only with -O3
Function specialisation was running at all optimisation levels (if enabled on
the command line, it is not on by default). That was an oversight and not
something we want to do. Function specialisation duplicates functions when it
triggers, so the backend is processing more functions/instructions resulting in
compile-time increases, which seems more appropriate with -O3 and inline with
GCC. Please note that since function specialisation is not enabled by default,
this didn't require updating any pass manager tests.

Differential Revision: https://reviews.llvm.org/D112129
2021-11-04 13:59:00 +00:00
Tim Northover 3d39612b3d Coroutines: don't infer function attrs before lowering
Coroutines have weird semantics that don't quite match normal LLVM functions,
so trying to infer even simple attributes based on thier contents can go wrong.
2021-11-04 10:24:28 +00:00
Arthur Eubanks 88052fc362 [ArgPromo] Preserve FunctionAnalysisManagerCGSCCProxy
We already make sure to properly clear analyses for deleted functions.

This makes investigating some future potential compile time improvements easier.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113032
2021-11-03 14:56:58 -07:00
Johannes Doerfert d61aac76bf [OpenMP][FIX] Do not signal SPMD-mode but then keep generic-mode
If we assume SPMD-mode during the fixpoint iteration we have to execute
the kernel in SPMD-mode. If we change our mind during manifest there is
the chance of a mismatch between the simplification, e.g., of
`__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem
was introduced in D109438.

This patch is compromise to resolve the problem purely in OpenMP-opt
while trying to keep the benefits of D109438 around. This might not
always work, see `get_hardware_num_threads_in_block_fold` but it often
does. At the same time we do keep value specialization and execution
mode in sync.

Proper solutions to this problem should be considered. I believe a new
execution mode is the easiest way forward (Singleton-SPMD).
Alternatively, SPMD-mode execution can be used with a way to provide a
new thread_limit (here 1) to the runtime. This is more general and could
be useful if we see `num_threads` clauses or workshared loops with small
trip counts in the kernel. In either proposal we need to disable the
guarding for the kernel (which was the motivation for D109438).

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D112894
2021-11-02 23:22:04 -05:00
Johannes Doerfert 73720c8059 [OpenMP][FIX] Introduce and use a simple generic-mode barrier
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112893
2021-11-02 23:22:01 -05:00
Johannes Doerfert e6e440ae5f [OpenMP][FIX] Ensure guarding uses proper global name
Global symbols cannot have any name so we need to sanitize the string
first. Also remove an assertion that is not actually necessary nor
true in general.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D112892
2021-11-02 23:21:53 -05:00
Kazu Hirata 1b108ab975 [Transforms] Use make_early_inc_range (NFC) 2021-11-02 18:13:23 -07:00
Kazu Hirata c714da2ceb [Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-10-31 07:57:32 -07:00
Kazu Hirata c8b1ed5fb2 [clang, llvm] Use Optional::getValueOr (NFC) 2021-10-30 19:00:21 -07:00
Roman Lebedev 25043c8276
[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.

There may be some more places where it could be used,
but these are the most obvious ones.
2021-10-30 17:50:06 +03:00
modimo 5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
modimo 51ce567b38 [SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect
Replay in sample profiling needs to be asked on candidates that may not have counts or below the threshold. If replay is in effect for a function make sure these are captured and also imported during thinLTO.

Testing:
ninja check-all

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112033
2021-10-29 12:04:52 -07:00
Jay Foad 1b758925ad [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.

Differential Revision: https://reviews.llvm.org/D112791
2021-10-29 15:02:58 +01:00
Yuanfang Chen c18ed69873 [Internalize] Preserve __stack_chk_fail in Internalizer correctly
Move the section collecting `AlwaysPreserved` up before any
`maybeInternalize` is called. Otherwise, functions in `AlwaysPreserved` (in this case, `__stack_chk_fail`)
are not preserved.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D112684
2021-10-28 11:22:26 -07:00
Johannes Doerfert acf3093117 [Attributor][FIX] Do not ignore memory writes in AAMemoryBehavior
Even if we look for `nocapture` we need to bail on escaping pointers.
The crucial thing is that we might not look at a big enough scope when
we derive the memory behavior. Thus, it might be `nocapture` in a larger
context while it is "captured" in a smaller context.
2021-10-27 21:04:32 -05:00
Johannes Doerfert 734f91441d [Attributor][NFC] Improve debug messages 2021-10-27 21:04:31 -05:00
Johannes Doerfert 8a4551b893 [Attributor][FIX] Use right address space to avoid assertion
When we strip and accumulate constant offsets we need to pick the right
address space such that the offset APInt has the right bit width.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112544
2021-10-27 18:22:37 -05:00
Nick Desaulniers 3ccd041af9 [LowerTypeTests] Emit cfi_jt aliases regardless of function export
A constant complaint we get is that the __typeid__ symbols in the CFI
jump tables causes confusing stack traces in applications. Emit the more
readable cfi_jt aliases regardless of function export (LTO vs Thin LTO).

Reviewed By: pcc, tejohnson

Differential Revision: https://reviews.llvm.org/D107934
2021-10-27 11:36:26 -07:00
Arthur Eubanks 4a9db7367d [AlwaysInliner] Invalidate analyses when we delete functions
Fixes PR52292.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D112473
2021-10-25 13:36:32 -07:00
Nikita Popov 75384ecdf8 [InstSimplify] Refactor invariant.group load folding
Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.

Differential Revision: https://reviews.llvm.org/D112382
2021-10-25 10:56:25 +02:00
Kazu Hirata 9800731367 [Target, Transforms] Use predecessors instead of pred_begin and pred_end (NFC) 2021-10-24 17:35:35 -07:00
Nikita Popov 5bb7562962 [Attributor] Generalize GEP construction
Make use of the getGEPIndicesForOffset() helper for creating GEPs.
This handles arrays as well, uses correct GEP index types and
reduces code duplication.

Differential Revision: https://reviews.llvm.org/D112263
2021-10-22 18:30:43 +02:00
Itay Bookstein 08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
modimo 313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Sjoerd Meijer 67a58fa3a6 [FuncSpec] Don't run the solver if there's nothing to do
Even if there are no interesting functions, the SCCP solver would still run
before bailing. Now bail earlier, avoid running the solver for nothing.

Differential Revision: https://reviews.llvm.org/D111645
2021-10-13 19:05:19 +01:00
Mircea Trofin ea4a6c8426 [Inline] Make sure the InlineAdvisor is correctly cleared.
If another inlining session came after a ModuleInlinerWrapperPass, the
advisor alanysis would still be cached, but its Result would be cleared.
We need to clear both.

This addresses PR52118

Differential Revision: https://reviews.llvm.org/D111586
2021-10-12 10:42:41 -07:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simpliciation
- Allow empty loop deletion by treating probe intrinsic isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Sjoerd Meijer fc0fa85171 [FuncSpec] Allow ConstExprs that are function pointers
This is a follow up of D110529 that disallowed constexprs. That change
introduced a regression as this also disallowed constexprs that are function
pointers, which is actually one of the motivating use cases that we do want to
support.

Differential Revision: https://reviews.llvm.org/D111567
2021-10-12 11:44:26 +01:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
Kuba Mracek 7329abf2f8 [GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots
Differential Revision: https://reviews.llvm.org/D109114
2021-10-06 15:55:55 -07:00
Simon Pilgrim 21661607ca [llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
2021-10-06 12:04:30 +01:00
Mircea Trofin 7d541eb4d4 [inliner] Mandatory inlining decisions produce remarks
This also removes the need to disable the mandatory inlining phase in
tests.

In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.

Differential Revision: https://reviews.llvm.org/D110891
2021-10-05 14:01:25 -07:00
wlei fb29d812e4 [CSSPGO] Rename the field of SampleContextFrame
Differential Revision: https://reviews.llvm.org/D110980
2021-10-04 19:06:59 -07:00
Joseph Huber f074a6a041 [OpenMP] Add options to change Attributor max iterations in OpenMPOpt
This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110749
2021-10-04 09:39:04 -04:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Kazu Hirata 4f0225f6d2 [Transforms] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-01 09:57:40 -07:00
Joseph Huber c11ebfea6d [OpenMP][NFC] Fix linting messages in OpenMPOpt
Summary:
This patch addresses some linting messages I keep getting in my editor
when working on OpenMPOpt.
2021-09-29 16:07:33 -04:00
Joseph Huber 87ce7e65f2 [OpenMP] Add missing distribute definitions to AAKernelInfo
Summary:
The RTL functions added in https://reviews.llvm.org/D110429 were
mistakenly left out from the list of safe runtime calls in AAKernelInfo.
This patch adds them in.
2021-09-29 16:06:34 -04:00
modimo 20faf78919 [ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation
Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.

This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:
1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time.

Implementation-wise this adds the following summary function attributes:
1. noUnwind: function is noUnwind
2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions)
3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)

Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D36850
2021-09-27 12:28:07 -07:00
Sjoerd Meijer eba76056a3 [FuncSpec] Don't specialise (or crash) on poison or constexpr values
Function specialization was crashing on poison values and constexpr values.
The problem is that these values are not added to the solver, so it crashes
when a lookup is performed for these values. This fixes that by not
specialising on these values. For poison that is obvious, but for constexpr
this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but
specialising on constexpr values wasn't done very intentionally, and need some
more work and tests if we wanted to support this.

As a follow up, we need to look if the solver should exit more gracefully and
return a "don't know", or that it should really support these constexprs.

This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600).

Differential Revision: https://reviews.llvm.org/D110529
2021-09-27 14:58:53 +01:00
Teresa Johnson 96cb97c453 [ThinLTO] Update combined index for SamplePGO indirect calls to locals
In ThinLTO for locals we normally compute the GUID from the name after
prepending the source path to get a unique global id. SamplePGO indirect
call profiles contain the target GUID without this uniquification,
however (unless compiling with -funique-internal-linkage-names).

In order to correctly handle the call edges added to the combined index
for these indirect calls, during importing and bitcode writing we
consult a map of original to full GUID to identify the actual callee.
However, for a large application this was consuming a lot of compile
time as we need to do this repeatedly (especially during importing where
we may traverse call edges multiple times).

To fix this implement a suggestion in one of the FIXME comments, and
actually modify the call edges during a single traversal after the index
is built to perform the fixups once. I combined this fixup with the dead
code analysis performed on the index in order to avoid adding an
additional walk of the index. The dead code analysis is the first
analysis performed on the index.

This reduced the time required for a large thin link with SamplePGO by
about 20%.

No new test added, but I confirmed that there are existing tests that
will fail when no fixup is performed.

Differential Revision: https://reviews.llvm.org/D110374
2021-09-24 12:29:49 -07:00
Bjorn Pettersson c5e0313e44 [ModuleInlinerWrapperPass] Do some naive printing of wrapped pipeline with -print-pipeline-passes
Bisecting and reducing opt pipelines that includes the
ModuleInlinerWrapperPass has turned out to be a bit problematic.
This is far from perfect (it still lacks information about inline
advisor params etc.), but it should give some kind of hint to what
the wrapped pipeline looks like when using -print-pipeline-passes.

Reviewed By: aeubanks, mtrofin

Differential Revision: https://reviews.llvm.org/D109878
2021-09-23 09:54:42 +02:00
Johannes Doerfert c6457dcae8 [OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state
This patch fixes a problem when the AAKernelInfo state was invalidated,
e.g., due to `optnone` for a kernel, but not all parts indicated the
invalidation properly. We further eliminate most full state
invalidations as they should never be necessary.

Differential Revision: https://reviews.llvm.org/D109468
2021-09-23 00:04:30 -05:00
Johannes Doerfert 0a16c56010 [OpenMP][NFC] Improve debug output 2021-09-23 00:04:29 -05:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
Joseph Huber 1cf86df883 [OpenMP] Make sure the Thread ID function is not removed
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
2021-09-22 10:13:18 -04:00
Florian Hahn a7c6471a85
[Passes] Run vector-combine early with -fenable-matrix.
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.

This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
CLang, like

    using matrix_t = double __attribute__((matrix_type(15, 15)));

    void foo(unsigned i, matrix_t &A, matrix_t &B) {
      for (unsigned j = 0; j < 4; ++j)
        for (unsigned k = 0; k < i; k++)
          B[k][j] -= A[k][j] * B[i][j];
    }

https://clang.godbolt.org/z/6dKxK1Ed7

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D102496
2021-09-22 12:48:32 +01:00
Wenlei He 5f187f0afa [SamplePGO] Add switch to honor zero count on block level as accurate
Add a new LLVM switch `-profile-sample-block-accurate` to trust zero block counts for branches. Currently we leave out such zero counts when annotating branch weight metadata, which would lead to weights being considered as unknown.

Differential Revision: https://reviews.llvm.org/D110117
2021-09-21 17:06:37 -07:00
Bjorn Pettersson c8cb7f611f [NewPM] Make InlinerPass (aka 'inline') a parameterized pass
In default pipelines the ModuleInlinerWrapperPass is adding the
InlinerPass to the pipeline twice, once due to MandatoryFirst (passing
true in the ctor) and then a second time with false as argument.

To make it possible to bisect and reduce opt test cases for this
part of the pipeline we need to be able to choose between the two
different variants of the InlinerPass when running opt. This patch is
changing 'inline' to a CGSCC_PASS_WITH_PARAMS in the PassRegistry,
making it possible run opt with both -passes=cgscc(inline) and
-passes=cgscc(inline<only-mandatory>).

Reviewed By: aeubanks, mtrofin

Differential Revision: https://reviews.llvm.org/D109877
2021-09-20 12:52:52 +02:00
Joseph Huber 27905eeb89 [Attributor] Change AAExecutionDomain to check intrinsic edges
The AAExecutionDomain instance checks if a BB is executed by the main
thread only. Currently, this only checks the `__kmpc_kernel_init` call
for generic regions to indicate the path taken by the main thread. In
the new runtime, we want to be able to detect basic blocks even in SPMD
mode. For this we enable it to check thread-ID intrinsics being compared
to zero as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109849
2021-09-17 19:51:38 -04:00
Simon Pilgrim 77f6c0bcaa Fix Wdocumentation warnings. NFCI.
Fix parameter name typos and drop returns statements from void functions
2021-09-17 12:45:56 +01:00
Sjoerd Meijer 97cc678cc4 [FuncSpec] Specialising on addresses of const global values.
This introduces an option to allow specialising on the address of global
values. This option is off by default because it is likely not that profitable
to do so and needs more investigation. Before, we were specialising on addresses
and thus this changes the default behaviour.

Differential Revision: https://reviews.llvm.org/D109775
2021-09-17 08:07:05 +01:00
Christudasan Devadasan 167ff5280d [GlobalOpt] Do not shrink global to bool for an unfavorable AS
Do not call `TryToShrinkGlobalToBoolean` for address spaces
that don't allow initializers. It inserts an initializer value
while shrinking to bool. Used the target hook introduced with
D109337 to skip this call for the restricted address spaces.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109823
2021-09-16 23:13:30 -04:00
Nikita Popov 0fc624f029 [IR] Return AAMDNodes from Instruction::getMetadata() (NFC)
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.

Differential Revision: https://reviews.llvm.org/D109852
2021-09-16 21:06:57 +02:00
Kazu Hirata 24c8eaec94 [Transforms] Use make_early_inc_range (NFC) 2021-09-15 19:55:24 -07:00
Markus Lavin 1ac209ed76 [NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.

Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
2021-09-15 08:34:04 +02:00
Matt Arsenault fdd9761dd1 Attributor: Fix crash on undef in !callees 2021-09-14 19:49:34 -04:00
Kazu Hirata d9e46beace [IPO] Use make_early_inc_range (NFC) 2021-09-14 08:59:36 -07:00
Jon Chesterfield 71052ea1e3 [openmp] Apply code change from D109500 2021-09-13 18:33:53 +01:00
Jon Chesterfield bfcf979978 Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu"
This reverts commit d5c049a3f6.
Going to re-commit it in pieces for easier application to 13
2021-09-13 18:25:07 +01:00
dpalermo d5c049a3f6 [openmp] Fix 51647, corrupt bitcode on amdgpu
Patch by @dpalermo

The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader.

This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use.

This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space.

Reviewed By: ronlieb, jdoerfert, dpalermo-phab

Differential Revision: https://reviews.llvm.org/D109500
2021-09-13 15:24:48 +01:00
Kuter Dinel 9a193bdc81 [Attributor][FIX] AACallEdges, fix propagation error.
This patch fixes a error made in 2cc6f7c8e1. That patch
added a call site position but there was a small error with the way
the presence of a unknown call edge was being propagated from call site
to function. This patch fixes that error. This error was effecting some
AMDGPU tests.
2021-09-13 03:45:26 +03:00
Kuter Dinel 66a0b3464c [Attributor] AAFunctionReachability, Handle CallBase Reachability.
This patch makes it possible to query callbase reachability
(Can a callbase reach a function Fn transitively).
The patch moves the reachability query handling logic to a member class,
this class will have more users within the AA once we add other function
reachability queries.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106402
2021-09-13 01:35:44 +03:00
Kuter Dinel 2cc6f7c8e1 [Attributor] Create a call site position for AACalledges
This patch adds a call site position for AACallEdges, this
allows us to ask questions about which functions a specific
`CallBase` might call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106208
2021-09-13 01:17:05 +03:00
Kazu Hirata e030d31fda [GlobalOpt] Use make_early_inc_range (NFC) 2021-09-11 07:23:22 -07:00
Joseph Huber 7eb899cbcd [OpenMP] Add more verbose remarks for runtime folding
We peform runtime folding, but do not currently emit remarks when it is
performed. This is because it comes from the runtime library and is
beyond the users control. However, people may still wish to view  this
and similar information easily, so we can enable this behaviour using a
special flag to enable verbose remarks.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109627
2021-09-10 17:36:06 -04:00
Johannes Doerfert 99ea8ac9f1 Reapply "[OpenMP] Group side-effects to improve guarding efficiency"
This reapplies ca134c3963, effectively
reverting commit d2f206e0af.

Minor test changes to make the test pass.
2021-09-10 15:22:57 -05:00
Johannes Doerfert c09fbbdcfb Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals""
This reapplies commit 7dbba3376f, or, put
differently, this reverts commit d9a8d20827.

The test now requires the amdgpu and nvptx backend explicitly as it
won't work without properly.
2021-09-10 15:22:56 -05:00
Joseph Huber 9e2fc0ba37 [OpenMP] Check OpenMP assumptions on call-sites as well
This patch adds functionality to check assumption attributes on call
sites as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109376
2021-09-10 14:52:47 -04:00
Johannes Doerfert d2f206e0af Revert "[OpenMP] Group side-effects to improve guarding efficiency"
This reverts commit ca134c3963.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:24:00 -05:00
Johannes Doerfert d9a8d20827 Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"
This reverts commit 7dbba3376f.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:23:08 -05:00
Johannes Doerfert 7dbba3376f [GlobalOpt][FIX] Do not embed initializers into AS!=0 globals
Not all address spaces support initializers for globals and we can
therefore not set them without checking if they are allowed. This
patch adds a hook into TTI to check if an AS allows non-undef
initializers. We disable it for all but address space 0 by default,
NVPTX and AMDGPU targets allow all but address space 3.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109337
2021-09-10 12:08:50 -05:00
Johannes Doerfert ca134c3963 [OpenMP] Group side-effects to improve guarding efficiency
When we guard side-effects as part of SPMDzation we do it for
consecutive instructions that need guarding. This patch will try to
reorder guarded side-effects in a block to decrease the number of
guarded regions we need. It does not use any smarts, e.g., alias
analysis, to move side-effects over non-interfering reads. Instead,
it only moves side-effects downwards to the next guarded side-effect
if there was nothing in between that could have possibly be affected.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D109070
2021-09-10 12:08:48 -05:00
Sjoerd Meijer 4f9217c519 [FuncSpec] Don't specialise call sites that have the MinSize attribute set
The MinSize attribute can be attached to both the callee and the caller
in the callsite. Function specialisation was already skipped for function
declarations (callees) with MinSize. This also skips specialisations for
the callsite when it has MinSize set.

Differential Revision: https://reviews.llvm.org/D109441
2021-09-10 09:01:45 +01:00
Kazu Hirata 92c9ff6d5f [IR, Transforms] Use arg_empty (NFC) 2021-09-09 08:50:10 -07:00
Sjoerd Meijer ecff9e3da5 [FuncSpec] Fixed minor formatting issues. NFC. 2021-09-09 10:36:54 +01:00
Andrew Litteken 0087bb4a9a [IROutliner] Using canonical values to find corresponding values. (NFC)
D104143 introduced canonical value numbering between regions, which allows for the easy identification of items across a region, eliminating the need in the outliner to create parallel lists of instructions for each region, and replace output values in a less convoluted way.

Additionally, in a future commit, the output values will not necessarily be recorded values from the region itself, it could be a combination value where the actual value being output is a PHINode instead.  This new method allows us to handle the replacement of the output value to the stored value with the corresponding item in the same place for both normal output values, and PHINode outputs instead of handling the different types of outputs in different locations.

Reviewers: paquette, roelofs

Differential Revision: https://reviews.llvm.org/D108656
2021-09-08 11:36:05 -07:00
Joseph Huber 6b9a3ec3a2 [OpenMP] Do not SPMDize generic regions with no parallel
This patch changes SPMDization to not trigger for regions with no
parallelism. Otherwise, this will introduce unnecessary barriers that
will slow the single-threaded region down.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109438
2021-09-08 14:33:15 -04:00
Benjamin Kramer 373b7622c1 [IROutliner] Remove unused variable. NFC. 2021-09-08 18:33:41 +02:00
Andrew Litteken c172f1ad39 [IROutliner] Adding supports for multiple exits
When we start outlining across branches, there is the possibility that we will have two different blocks with different output locations, or a single branch that goes to two blocks outside of the region that is being outlined. While the CodeExtractor provides most of the mechanisms by using the return value of the extracted function as the input to a switch statement to correctly branch to the correct location, we need special handling for different output schemas to each location.

This is done by repeating the existing storing scheme for each different exit block. We have a map from the return values used, to the basic block that is used to store the outputs for that particular exit block within the outlined function. Then if needed, we create a switch statement for each return block to branch to the correct set of stored outputs.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106993
2021-09-08 08:58:07 -07:00
Sjoerd Meijer 88a2031207 [FuncSpec] Fix typo in option description. NFC. 2021-09-08 12:58:46 +01:00
Andrew Litteken 81d3ac0cf2 [IROutliner] Adding outlining for single entry/single exit multiblock regions
Using the similarity found from the IRSimilarity Identifier, we take regions with structural similarity, and deduplicate them into a separate function. The Code Extractor is able to provide most of this functionality.

For simplicity, we start by only outlining regions with a single entry and single exit branch, this reduces the complexity in handling phi nodes outside the region, and handling many sets of outputs for each of the different exit blocks.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D106990
2021-09-07 08:51:54 -07:00
Andrew Litteken bd4b1b5f6d [IRSim] Adding support for recognizing branch similarity
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.

This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.

We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.

Reviewers: paquette, yroux

Differential Revision: https://reviews.llvm.org/D106989
2021-09-06 11:55:38 -07:00
Michael Kruse 650bbc5620 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.

Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-04 19:18:58 -05:00
Kazu Hirata bb51f76fb1 [ForceFunctionAttrs] Add const (NFC) 2021-09-03 22:29:58 -07:00
Wenlei He 054487c5b2 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 17:29:26 -07:00
Kevin Athey 04ed6e7afc Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing"
This reverts commit a2768b4732.

Breaks sanitizer-x86_64-linux-fast buildbot:
https://lab.llvm.org/buildbot/#/builders/5/builds/11334

Log snippet:
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 11';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 14';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fcd534a209a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: Transforms/SampleProfile/early-inline.ll
  LLVM :: Transforms/SampleProfile/inline-cold.ll
2021-09-02 14:48:31 -07:00
Wenlei He f7fff46acc [CSSPGO] Allow inlining recursive call for preinliner
When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.

However, it turns out that we use InlineCost::Never for both illeagle inlining and some of the "not-so-beneficial" inlining.

The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner.

Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.

This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.

With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks)

Differential Revision: https://reviews.llvm.org/D109104
2021-09-02 11:24:27 -07:00
Wenlei He a2768b4732 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 08:24:06 -07:00
Wenlei He c000b8bd5c [CSSPGO] Use preinliner decision by default when available
For CSSPGO, turn on `sample-profile-use-preinliner` by default. This simplifies the use of llvm-profgen preinliner as it's now simply driven by ContextShouldBeInlined flag for each context profile without needing extra compiler switch.

Note that llvm-profgen's preinliner is still off by default, under switch `csspgo-preinliner`.

Differential Revision: https://reviews.llvm.org/D109111
2021-09-01 23:45:38 -07:00
Arthur Eubanks 52e6d70c40 [NFC] Use newly introduced *AtIndex methods
Introduced in D108788. These are clearer.
2021-09-01 11:18:41 -07:00
Hongtao Yu dde162d8a5 [CSSPGO] Fix an access violation due to invalided std::vector pointer invalidation.
std::vector pointers can be invalided while growing. Using std::list instead.
2021-09-01 10:24:17 -07:00
Hongtao Yu 7ca8030030 [CSSPGO] Enable loading MD5 CS profile.
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.

There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D108342
2021-09-01 09:19:47 -07:00
Joseph Huber 29a74a3915 [OpenMP] Add an option to always inline OpenMP device functions.
Performance on GPU targets can be highly variable, sometimes inlining
everything hurts performance and sometimes it greatly improves it. Add
an option to toggle this behaviour to better investigate it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109014
2021-08-31 18:48:30 -04:00
Kuba Mracek 4c066bd08b [GlobalDCE] Handle relative pointers in VFE (for Swift vtables)
To support Virtual Function Elimination to Swift, this PR adds support for Swift
vtables which contain "relative pointers" instead of direct pointer references.
These are in the form of:

@symbol = ... {
  i32 trunc (i64 sub (i64 ptrtoint (<type> @target to i64), i64 ptrtoint (... @symbol to i64)) to i32)
}

The PR extends GlobalDCE's way of looking up a vtable offset into a dependency
to be able to see through this expression and find the target symbol.

Differential Revision: https://reviews.llvm.org/D107645
2021-08-31 07:07:22 -07:00
Hongtao Yu b9db70369b [CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.

A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is  time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.

When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.

Testing
This is no-diff change in terms of code quality and profile content (for text profile).

For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.

The compile time of ads is dropped by 25%.

Differential Revision: https://reviews.llvm.org/D107299
2021-08-30 20:09:29 -07:00
Andrew Litteken c58d4c4bd3 [IROutliner] Changing outliner to prioritize reductions on assembly rather than IR instruction
Currently, the IROutliner uses a simple metric to outline the largest amount
of IR possible to outline first if it fits the cost model. This is model
loses out on smaller blocks of code that have higher reductions in cost that
are contained within larger blocks of IR.

This reverses the order, where we calculate all of the costs first, and then
reorder and extract items based on the calculated results.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106440
2021-08-30 13:43:08 -07:00
Andrew Litteken f564299fe9 [IROutliner] Ensure instructions at end of candidate are excluded
Occasionally instructions are between the last instruction in a region,
and the following instruction as identified by the Candidate.  This
adds an extra check right before splitting a candidate that excludes the region from being split/checked for outlining to remove errors.

Tests Added:
Tranforms/IROuutliner/outlining-extra-bitcasts.ll

Reviewer: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D104142
2021-08-30 09:30:26 -07:00
Andrew Litteken 063af63b96 [IRSim][IROutliner] Canonicalizing commutative value numbering between similarity sections.
When the initial relationship between two pairs of values between
similar sections is ambiguous to commutativity, arguments to the
outlined functions can be passed in such that the order is incorrect,
causing miscompilations.  This adds a canonical mapping to each
similarity section, so that we can maintain the relationship of global
value numbering from one section to another.

Added Tests:
Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering

Reviewers: jroelofs, jpaquette, yroux

Differential Revision: https://reviews.llvm.org/D104143
2021-08-27 15:02:56 -07:00
Johannes Doerfert 56e372b56e [Attributor][NFC] Silence unused variable warning 2021-08-27 16:38:13 -05:00
Johannes Doerfert e05940de2a [Attributor][FIX] Recursion via memory needs to be tracked explicitly
Recursion can happen when we see a PHI use the second time or when we
look at a store value operand use again. We already visited the
potential copies and doing so again will just cause endless looping.

Reviewed By: kuter

Differential Revision: https://reviews.llvm.org/D108190
2021-08-27 13:12:13 -05:00
Johannes Doerfert caa3b28260 [Attributor][FIX] Do not treat byval args as local memory (for now)
For now we do should not treat byval arguments as local copies performed
on the call edge, though, in general we should. To make that happen we
need to teach various passes, e.g., DSE, about the copy effect of a
byval. That would also allow us to mark functions only accessing byval
arguments as readnone again, atguably their acceses have no effect
outside of the function, like accesses to allocas.

Reviewed By: kuter

Differential Revision: https://reviews.llvm.org/D108140
2021-08-27 13:12:11 -05:00
Sanjay Patel 416a119f9e [GlobalOpt] don't hoist constant expressions that can trap
We try to forward a stored-once-constant-value from one global access
to another, but that's not safe if the constant value is an expression
that can trap.

The tests are reduced from the miscompile examples in:
https://llvm.org/PR47578

Differential Revision: https://reviews.llvm.org/D108771
2021-08-27 08:10:20 -04:00
Wenlei He a45d72e024 [CSSPGO] Add switch for sample loader to honor global pre-inliner decision from llvm-profgen
The change adds a switch to allow sample loader to use global pre-inliner's decision instead. The pre-inliner in llvm-profgen makes inline decision globally based on whole program profile and function byte size as cost proxy.

Since pre-inliner also adjusts/merges context profile based on its inline decision, honoring its inline decision in sample loader would lead to better post-inline profile quality especially for thinlto where cross module profile merging isn't possible without pre-inliner.

Minor fix in profile reader is also included. When pre-inliner is use, we now also turn off the default merging and trimming logic unless it's explicitly asked.

Differential Revision: https://reviews.llvm.org/D108677
2021-08-25 17:20:15 -07:00
Wenlei He a6f15e9a49 [CSSPGO] Use probe inline tree to track zero size fully optimized context for pre-inliner
This is a follow up diff for BinarySizeContextTracker to track zero size for fully optimized inlinee. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However by traversing the inlined probe forest, we know what're original inlinees regardless of optimization. If a context show up in inlined probes, but not during symbolization, we know that it's fully optimized away hence its size is zero instead of unknown. It should provide more accurate size cost estimation for pre-inliner to make better inline decisions in llvm-profgen.

Differential Revision: https://reviews.llvm.org/D108350
2021-08-25 09:01:11 -07:00
Vyacheslav Zakharin 2e192ab1f4 [CodeExtractor] Preserve topological order for the return blocks.
Differential Revision: https://reviews.llvm.org/D108673
2021-08-25 08:09:01 -07:00
Shimin Cui cea5ab090b [GlobalOpt] Fix the assert for null check of global value
This is to fix the reported assert - https://bugs.llvm.org/show_bug.cgi?id=51608.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D108674
2021-08-24 20:47:33 -04:00
Chuanqi Xu 2556f58148 [FuncSpec] Don't specialize function which are easy to inline
It would waste time to specialize a function which would inline finally.
This patch did two things:

- Don't specialize functions which are always-inline.
- Don't spescialize functions whose lines of code are less than threshold
(100 by default).

For spec2017int, this patch could reduce the number of specialized
functions by 33%. Then the compile time didn't increase for every
benchmark.

Reviewed By: SjoerdMeijer, xbolva00, snehasish

Differential Revision: https://reviews.llvm.org/D107897
2021-08-23 19:20:21 +08:00
Wenlei He eca03d2768 [CSSPGO] Track and use context-sensitive post-optimization function size to drive global pre-inliner in llvm-profgen
This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions.

To do this, BinarySizeContextTracker is introduced to track function byte size under different inline context during disassembling. In preinliner, we can not query context byte size under switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep size of functions under different context (callee as parent, caller as child), and it can give best/longest possible matching context size for given input context.

The new size cost is off by default. There're a few TODOs that needs to addressed: 1) avoid dangling string from `Offset2LocStackMap`, which will be addressed in split context work; 2) using inlinee's entry probe to make sure we have correct zero size for inlinee that's completely optimized away after inlining. Some tuning is also needed.

Differential Revision: https://reviews.llvm.org/D108180
2021-08-18 22:50:57 -07:00
Rong Xu 5fdaaf7fd8 [SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loader
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.

To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
     -disable-ra-fsprofile-loader=false \
     -disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.

Differential Revision: https://reviews.llvm.org/D107878
2021-08-18 18:37:35 -07:00
Arthur Eubanks 7557d6c896 [NFC] Cleanup calls to CallBase::getAttribute() 2021-08-18 09:39:33 -07:00
Joseph Huber 13d8f000d7 [OpenMP][NFC] Improve debug message for shared memory
Summary:
Make the debug message for HeapToShared more helpful by showing the
actual call.
2021-08-18 11:56:09 -04:00
Joseph Huber 58f9326487 [OpenMP] Change AAKernelInfo to ignore non-kernels
Currently, AAKernelInfo will fail on an assertion if we attempt to run
it on a kernel without the init / deinit runtime calls. However, this
occurs for global constructors on the device. This will cause OpenMPOpt
to crash whenever global constructors are present. This patch removes
this assertion and just gives up instead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108258
2021-08-18 11:24:29 -04:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Arthur Eubanks ad727ab7d9 [NFC] Migrate some callers away from Function/AttributeLists methods that take an index
These methods can be confusing.
2021-08-17 21:05:40 -07:00
Arthur Eubanks 46cf82532c [NFC] Replace Function handling of attributes with less confusing calls
To avoid magic constants and confusing indexes.
2021-08-17 21:05:40 -07:00
Arthur Eubanks 16890e0040 [GlobalOpt] Check stored once value's type before setting global initializer
In the provided test case, we were trying to set the global's
initializer to `i32* null` when the global's value type was `@0`.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108232
2021-08-17 14:34:29 -07:00
Joseph Huber 339aa76526 [OpenMP][NFC] Add option to print module after OpenMPOpt for debugging
This patch adds an extra option to print the module after running one of
the OpenMPOpt passes if debugging is enabled. This makes it much easier
to inspect the effects of this pass when doing debugging.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108146
2021-08-17 12:46:10 -04:00
Nikita Popov 3c503ba06a [FunctionImport] Fix build with old mingw (NFC)
std::errc::operation_not_supported is not universally supported.
Make use of LLVM's errc interoperability header, which lists
known-good errc values.
2021-08-15 15:47:59 +02:00
Arthur Eubanks d7593ebaee [NFC] Clean up users of AttributeList::hasAttribute()
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().

Add hasRetAttr() since it was missing from AttributeList.
2021-08-13 11:59:18 -07:00
Arthur Eubanks 80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Giorgis Georgakoudis 60e643fe05 [OpenMP][Fix] Fix disable spmdization option
Besides SPMDization, other analysis and optimization for original, frontend-generated SPMD regions uses information from the AAKernelInfoFunction attribute. This fix makes sure disabling SPMDization through the corresponding option applies only to generic mode regions, which should not be SPMDized, while it leaves unaffected the attribute state of original SPMD regions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108001
2021-08-12 17:59:14 -07:00
Johannes Doerfert 4e7d7cae67 [Attributor][FIX] Do not try to rewrite functions with casted call sites
If we cast a function at the call site it is hard(er) to get the rewrite
correct, let's not attempt it for now.

Fixes PR51448.
2021-08-12 10:39:53 -05:00
Johannes Doerfert 5f543919b2 [Attributor][FIX] Guard constant casts with type size checks 2021-08-12 10:39:53 -05:00
Johannes Doerfert a420f80bf1 [Attributor] Do not delete volatile stores to null/undef
See D106309.

Differential Revision: https://reviews.llvm.org/D107906
2021-08-12 10:39:52 -05:00
Liqiang Tao 422fc5603a [llvm][Inline] Refactor out InlineOrder
Move InlineOrder to separated file.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D107831
2021-08-12 22:19:53 +08:00
Johannes Doerfert fc32a5c87d [Attributor][NFC] Try to make the windows build bots happy
Failed for some reason, potentially because of the inner type
declaration in combination with the `using`. This might help.

Failure:
https://lab.llvm.org/buildbot/#/builders/127/builds/15432
2021-08-11 01:11:37 -05:00
Johannes Doerfert e7e3585cde [Attributor][FIX] Handle recurrences (PHIs) in AAPointerInfo explicitly
PHI nodes are not pass through but change their value, we have to
account for that to avoid missing stores.

Follow up for D107798 to fix PR51249 for good.

Differential Revision: https://reviews.llvm.org/D107808
2021-08-11 00:49:54 -05:00
Johannes Doerfert 96da6dd6ba [Attributor][FIX] Only avoid visiting PHI uses multiple times (PR51249)
AAPointerInfoFloating needs to visit all uses and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurs endlessly. We now allow recursion for non-phi uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.

This replaces the first attempt D107579.

Differential Revision: https://reviews.llvm.org/D107798
2021-08-11 00:49:54 -05:00
Johannes Doerfert e0c5d83a92 [OpenMP][FIX] Disabled optimizations have to be made known
To avoid simplification with wrong constants we need to make sure we
know that we won't perform specific optimizations based on the users
request. The non-SPMDzation and non-CustomStateMachine flags did only
prevent the final transformation but allowed to value simplification
to go ahead.

Differential Revision: https://reviews.llvm.org/D107862
2021-08-11 00:49:53 -05:00
Chuanqi Xu 0fd03feb4b [FuncSpec] Return changed if function is changed by tryToReplaceWithConstant
The may get changed before specialization by RunSCCPSolver. In other
words, the pass may change the function without specialization happens.
Add test and comment to reveal this.
And it may return No Changed if the function get changed by
RunSCCPSolver before the specialization. It looks like a potential bug.

Test Plan: check-all

Reviewed By: https://reviews.llvm.org/D107622

Differential Revision: https://reviews.llvm.org/D107622
2021-08-06 17:00:17 +08:00
Chuanqi Xu 62fc3e0ad6 [NFC] [FuncSpec] Remove unused variables in isArgumentInteresting 2021-08-06 16:38:20 +08:00
Chuanqi Xu cc3f40bb41 [FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish)
Noticed that the computation for function specialization cost of a
function wouldn't change during the traversal of the arguments for the
function. We could hoist the computation out of the traversal. I
observed about ~1% improvement on compile time for spec2017. But I guess
it may not be precise. This should be NFC and fine.

Reviewed By: Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D107621
2021-08-06 15:43:05 +08:00
Chuanqi Xu 82ca845b47 [NFC] [FuncSpec] Update the Todo list for recursive functions
Now the recursive functions may get specialized many times when
`func-specialization-max-iters` increases. See discussion in
https://reviews.llvm.org/D106426 for details.
2021-08-06 14:43:17 +08:00
Giorgis Georgakoudis 29a3e3dd7b [OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, thus must be not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892
2021-08-04 11:49:24 -07:00
Sjoerd Meijer 30fbb06979 [FuncSpec] Support specialising recursive functions
This adds support for specialising recursive functions. For example:

    int Global = 1;
    void recursiveFunc(int *arg) {
      if (*arg < 4) {
        print(*arg);
        recursiveFunc(*arg + 1);
      }
    }
    void main() {
      recursiveFunc(&Global);
    }

After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:

    void main() {
      print(1);
      print(2);
      print(3);
    }

To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
  function specialisation, which is necessary for the next iteration to
  recognise the constant values and trigger.

Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.

Differential Revision: https://reviews.llvm.org/D106426
2021-08-04 08:07:04 +01:00
Shimin Cui 2d9759c790 [GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc
Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may not be true any more (as seen in the test case), and we miss the code to handle this, This is to fix that.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107397
2021-08-03 19:22:53 -04:00
Sami Tolvanen 7ce1c4da77 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Relands 700d07f8ce with -msvc targets
fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-08-03 11:35:30 -07:00
Shimin Cui 7ce98cf56e [GlobalOpt] Fix the assert for stored once non-pointer to global address
This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107302
2021-08-02 19:23:29 -04:00
Shimin Cui 732b05555c [GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc
I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106589
2021-07-31 18:42:02 -04:00
Joseph Huber cd0dd8ece8 [OpenMP] Adding flags for disabling the following optimizations: Deglobalization SPMDization State machine rewrites Folding
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
 - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
 - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
 - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
 - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106802
2021-07-29 19:28:31 -04:00
Joseph Huber adbaa39dfc [Attributor] Change function internalization to not replace uses in internalized callers
The current implementation of function internalization creats a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of eachother. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931
2021-07-28 18:57:28 -04:00
Jose M Monsalve Diaz 5ab6aedda9 [OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033
2021-07-27 21:47:12 -04:00
Johannes Doerfert 3dca83961c Reapply "[Attributor] Disable simplification AAs if a callback is present""
This reapplies commit cbb709e251 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.

This reverts commit aa27430a62.
2021-07-27 19:14:50 -05:00
Johannes Doerfert aa27430a62 Revert "[Attributor] Disable simplification AAs if a callback is present"
This reverts commit cbb709e251 as it
breaks the tests, which was not supposed to happen. Investigating now.
2021-07-27 18:09:42 -05:00
Johannes Doerfert fd520e75f1 [Attributor] Verify `checkForAllUses` return value properly
Also do not emit more than one remark after Heap2Stack failed.
2021-07-27 17:50:27 -05:00
Johannes Doerfert cbb709e251 [Attributor] Disable simplification AAs if a callback is present
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.
2021-07-27 17:50:26 -05:00
Alexey Zhikhartsev 02077da7e7 Add jump-threading optimization for deterministic finite automata
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205
2021-07-27 14:34:04 -04:00
Johannes Doerfert 70b75f62fc [OpenMP] Try to simplify all loads in device code
Eliminating loads/stores in the device code is worth the extra effort,
especially for the new device runtime.

At the same time we do not compute AAExecutionDomain for non-device code
anymore, there is no point.

Differential Revision: https://reviews.llvm.org/D106845
2021-07-27 01:44:15 -05:00
Johannes Doerfert c55e18824d [Attributor][FIX] Copy all members in the assignment operator
Also improve debug output slightly.
2021-07-27 01:44:13 -05:00
Johannes Doerfert d4bfce5521 [Attributor] Utilize the InstSimplify interface to simplify instructions
When we simplify at least one operand in the Attributor simplification
we can use the InstSimplify to work on the simplified operands. This
allows us to avoid duplication of the logic.

Depends on D106189

Differential Revision: https://reviews.llvm.org/D106190
2021-07-27 00:56:23 -05:00
wlei f0d41b58da [CSSPGO] Tweak ICP threshold in top-down inliner
This change slightly relaxed the current ICP threshold in top-down inliner, specifically always allow one ICP for it. It shows some perf improvements on SPEC and our internal benchmarks. Also renamed the previous flag. We can also try to turn off PGO ICP in the future.

Reviewed By: wenlei, hoy, wmi

Differential Revision: https://reviews.llvm.org/D106588
2021-07-26 21:49:20 -07:00
Johannes Doerfert 25a3130d89 [Local] Do not introduce a new `llvm.trap` before `unreachable`
This is the second attempt to remove the `llvm.trap` insertion after
https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6
reverted the first one. It is not clear what the exact issue was back
then and it might already be gone by now, it has been >5 years after
all.

Replaces D106299.

Differential Revision: https://reviews.llvm.org/D106308
2021-07-26 23:33:36 -05:00
Johannes Doerfert 41bd26dff9 [Attributor] Delete dead stores
D106185 allows us to determine if a store is needed easily. Using that
knowledge we can start to delete dead stores.

In AAIsDead we now track more state as an instruction can be dead (= the
old optimisitc state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we would pretend it was dead
instead of only removablewe we would ignore it when we determine what
values a load can see, so that is not what we want.

Differential Revision: https://reviews.llvm.org/D106188
2021-07-26 23:33:36 -05:00
Johannes Doerfert adddd3dbda [Attributor] Introduce getPotentialCopiesOfStoredValue and use it
This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.

There are two users for the function:
  1) `Attributor::checkForAllUses` which will now ignore the value use
     in a store if all "potential copies" can be identified and instead
     be visited. This allows various AAs, including AAPointerInfo
     itself, to look through memory.
  2) `AANoCapture` which uses a custom use tracking through the
     CaptureTracker interface and therefore needs to be thought
     explicitly.

Differential Revision: https://reviews.llvm.org/D106185
2021-07-26 23:33:36 -05:00
Shilei Tian e97e0a4fad [AbstractAttributor] Fold __kmpc_parallel_level if possible
Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
Note that `__kmpc_parallel_level` doesn't take activeness into consideration,
based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead
of 0, 129, 130, etc. that also indicate activeness.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106154
2021-07-26 22:46:19 -04:00
Johannes Doerfert be2b569646 [OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass
While rewriteDeviceCodeStateMachine should probably be folded into
buildCustomStateMachine, we at least need the optimization to happen.
This was not reliably the case in the CGSCC pass but in the Module pass
it seems to work reliably.

This also ports a test to the new kernel encoding (target_init/deinit),
and makes sure we cannot run the kernel in SPMD mode.

Differential Revision: https://reviews.llvm.org/D106345
2021-07-26 21:26:07 -05:00
Johannes Doerfert e6f3e648c9 [Attributor][FIX] Do not return CHANGED unconditionally
This caused us to rerun AAMemoryBehaviorFloating::updateImpl over and
over again. Unfortunately it turned out to be hard to reproduce the
behavior in a reasonable way.
2021-07-26 21:22:02 -05:00
Johannes Doerfert 8befd05aad [Attributor][FIX] Track change status for AAIsDead properly
If we add a new live edge we need to indicate a change or otherwise the
new live block is not shown to users. Similarly, new known dead ends and
a changed `ToBeExploredFrom` set need to cause us to return CHANGED.
2021-07-26 21:21:59 -05:00
Joseph Huber e757a3b05f [OpenMP][NFC] Remove unncessary capture in RAII struct
Summary:
There was an unnecessary variable assigned to the information cache when we
only need it in the constructor to extract the function declaration.
2021-07-26 15:05:55 -04:00
Joseph Huber 58725c12bb [OpenMP] Introduce RAII to protect certain RTL calls from DCE
This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, because it has internal linkage it can safely be removed. Later when we
try to get an instance to that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106707
2021-07-25 14:15:47 -04:00
Nikita Popov 087a8eea35 [Attributes] Clean up handling of UB implying attributes (NFC)
Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.

Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.
2021-07-25 18:21:13 +02:00
Kuter Dinel 96709823ec [AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997
2021-07-24 06:07:15 +03:00
Kuter Dinel 0cd964ff25 [Attributor][FIX] checkForAllInstructions, correctly handle declarations
checkForAllInstructions was not handling declarations correctly.
It should have been returning false when it gets called on a declaration

The patch also fixes a test case for AAFunctionReachability for it to be able
to pass after the changes to the checkForAllinstructions.

Differential Revision: https://reviews.llvm.org/D106625
2021-07-24 02:21:29 +03:00
Shilei Tian ae69f46867 [AbstractAttributor] Refine logic to indicate pessimistic fixed point when folding `__kmpc_is_spmd_exec_mode`
Since we are using assumed information now, the logic should be refined to avoid
unncessary assertion.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106630
2021-07-23 13:36:47 -04:00
Giorgis Georgakoudis f97de4cb0b [OpenMPOpt] Move dedup runtime calls after init for target regions
Deduplication in OpenMPOpt finds redundant OpenMP runtime calls and replaces them with a single call placed in the earliest safe location in the IR. When deduplication happens in a target region this patch makes sure replacement calls are put after target_init.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106556
2021-07-23 05:54:01 -07:00
Johannes Doerfert 6ca969353c [Attributor] If provided, only look at simplification callbacks not IR
A simplification callback can mean that the IR value is modified beyond
the apparent IR semantics. That is, a `i1 true` could be replaced by an
`i1 false` based on high-level domain-specific information. If a user
provides a simplification callback we will not look at the IR but
instead give up if the callback returns a nullptr.
2021-07-22 23:57:37 -05:00
Giorgis Georgakoudis eaab880e45 [Attributor][Fix] Add overrides for AA2HS analysis 2021-07-22 18:20:14 -07:00
Giorgis Georgakoudis f8c40ed8f8 [OpenMP] Use AAHeapToStack/AAHeapToShared analysis in SPMDization
SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105634
2021-07-22 18:08:37 -07:00
Hongtao Yu ab5ac659c8 [CSSPGO] Fix a typo in SampleContextTracker
Fixing a typo in SampleContextTracker to use debug name when debug linkage name is no present. This should only affect C programs.

Saw 0.6% perf win on Cinder which is mostly C code.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D106599
2021-07-22 16:44:50 -07:00
Shilei Tian 1a7f779022 [OpenMPOpt] Add support for BooleanStateWithSetVector
D101977 added `BooleanStateWithPtrSetVector` to store pointers to a set meanwhile
tracking boolean state. One of the limitation is that it can only store pointer.
We might want it to store other types of values, such as integer for parallel
level. This patch generalizes the idea and create `BooleanStateWithSetVector`.
`BooleanStateWithPtrSetVector` therefore becomes a type alias of `BooleanStateWithSetVector`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106149
2021-07-22 13:12:29 -04:00
Johannes Doerfert 0c0eb76782 [Attributor][FIX] Improve call graph updating
If we remove a non-intrinsic instruction we need to tell the (old) call
graph about it. This caused problems with some features down the line as
they allowed to removed calls more aggressively.
2021-07-22 00:07:56 -05:00
Johannes Doerfert 94d3b59c56 [Attributor][FIX] Do not introduce multiple instances of SSA values
If we have a recursive function we could create multiple instantiations
of an SSA value, one per recursive invocation of the function. This is a
problem as we use SSA value equality in various places. The basic idea
follows from this test:

```
static int r(int c, int *a) {
  int X;
  return c ? r(false, &X) : a == &X;
}

int test(int c) {
  return r(c, undef);
}
```

If we look through the argument `a` we will end up with `X`. Using SSA
value equality we will fold `a == &X` to true and return true even
though it should have been false because `a` and `&X` are from different
instantiations of the function.

Various tests for this  have been placed in value-simplify-instances.ll
and this commit fixes them all by avoiding to produce simplified values
that could be non-unique at runtime. Thus, the result of a simplify
value call will always be unique at runtime or the original value, both
do not allow to accidentally compare two instances of a value with each
other and conclude they are equal statically (pointer equivalence) while
they are unequal at runtime.
2021-07-22 00:07:55 -05:00
Johannes Doerfert c819266ecc [Attributor] Improve the Attributor::getAssumedConstant interface
Similar to Attributor::getAssumedSimplified we need to allow IRPs
directly to get the right simplification callback (and context).
2021-07-22 00:07:55 -05:00
Johannes Doerfert c4b1fe05dd [OpenMP][FIX] Use name + type checks not only name checks for calls
A call that is analyzed in an optimization needs to be verified against
the name and type of the runtime function to avoid that we look at
arguments that do not exist (anymore). This can happen if the signature
was rewritten. Since we will not set RFI.Declaration if the type doesn't
match we can use it (if it's not null) to determine if the signature is
as expected.

Differential Revision: https://reviews.llvm.org/D106341
2021-07-21 22:51:05 -05:00
Johannes Doerfert c7781a0978 [Attributor][NFC] Clang format 2021-07-21 22:51:05 -05:00
Joseph Huber 16206d17cd [OpenMP] Strip NoInline from known OpenMP runtime functions
This patch strips the NoInline attribute from known OpenMP runtime functions.
This is done so that we can denote certain runtime functions as NoInline to
ensure their call sites are intact so they can be checked by OpenMPOpt. We
don't wan't this noinline attribute to remain for any functions after OpenMPOpt
has been run however.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106482
2021-07-21 21:18:26 -04:00
Joseph Huber 196fe994b8 [OpenMP] Fold `__kmpc_is_generic_main_thread_id` if possible
This patch adds the ability to fold `__kmpc_is_generic_main_thread_id` if we
know for a fact that it is executed by the initial thread using
AAExecutionDomain. This combined with folding `__kmpc_is_spmd_exec_mode` will
allow us to fully fold `__kmpc_is_generic_main_thread`.

Depends on D106438 D106437

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106439
2021-07-21 21:18:22 -04:00
Joseph Huber 4a66860424 [OpenMP] Add an option to disable function internalization
Function internalization can sometimes occur in situations where we want to
keep the call sites intact. This patch adds an option to disable function
internalization and prevents the device runtime from being internalized while
creating the bitcode library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106438
2021-07-21 21:18:18 -04:00
Joseph Huber 7d57639264 [OpenMP] Add new execution mode for SPMD execution with Generic semantics
Qualified kernels can be transformed from generic-mode to SPMD mode using an
optimization in OpenMPOpt. This patch introduces a new execution mode to
indicate kernels that have been transformed from generic-mode to SPMD-mode.
These kernels have SPMD-mode execution, but need generic-mode semantics for
scheduling the blocks and threads. Without this far too few blocks will be
scheduled for a generic region as SPMD mode expects the trip count to be
divided by the number of threads.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D106460
2021-07-21 20:57:28 -04:00
Giorgis Georgakoudis 3f71b425b2 [Attributor] Preserve BBs and instructions added in AA manifests
Manifesting AbstractAttributes may add new BBs in the IR. This patch provides an interface to register those BBs in the Attributor so that those BBs and containing instructions are not deleted as dead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106383
2021-07-21 11:27:00 -07:00
Giorgis Georgakoudis b0e06e1fc0 [Attributor][NFC] Modify isAssumedHeapToStack for const argument
There is no need for a non-const argument interface and the const argument modification covers existing and upcoming use cases.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106418
2021-07-21 10:28:21 -07:00
Arthur Eubanks 8bc298d041 [NewPM][Inliner] Check if deleted function is in current SCC
In weird cases, the inliner will inline internal recursive functions,
sometimes causing them to have no more uses, in which case the
inliner will mark the function to be deleted. The function is
actually deleted after the call to
updateCGAndAnalysisManagerForCGSCCPass(). In
updateCGAndAnalysisManagerForCGSCCPass(), UR.UpdatedC may be set to
the SCC containing the function to be deleted. Then the inliner calls
CG.removeDeadFunction() which can cause that SCC to be deleted, even
though it's still stored in UR.UpdatedC.

We could potentially check in the wrappers/pass managers if UR.UpdatedC
is in UR.InvalidatedSCCs before doing anything with it, but it's safer
to do this as close to possible to the call to CG.removeDeadFunction()
to avoid issues with allocating a new SCC in the same address as
the deleted one.

It's hard to find a small test case since we need to have recursive
internal functions be reachable from non-internal functions, yet they
need to become non-recursive and not referenced by other functions when
inlined.

Similar to https://reviews.llvm.org/D106306.

Fixes PR50788.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D106405
2021-07-21 08:47:45 -07:00
Sami Tolvanen e901e581ef Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 700d07f8ce.

Reverting due to a ThinLTO+CFI breakage on -msvc targets.
2021-07-20 13:59:46 -07:00
Fangrui Song 3924877932 [IR] Rename `comdat noduplicates` to `comdat nodeduplicate`
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.

In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics.  The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106319
2021-07-20 12:47:10 -07:00
Nikita Popov 0c794abff1 [ThinTLOBitcodeWriter] Fix unused variable warning (NFC) 2021-07-20 20:19:47 +02:00
Sami Tolvanen 700d07f8ce ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Relands commmit 8e3b5cb39e with arch
specific tests fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-20 10:30:02 -07:00
Giorgis Georgakoudis e8439ec893 [OpenMP] Set RequiresFullRuntime false in SPMDization
SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105556
2021-07-20 09:54:51 -07:00
Shimin Cui 0b043bb39b This patch extends the OptimizeGlobalAddressOfMalloc to handle the null check of global pointer variables. It is disabled with https://reviews.llvm.org/rGb7cd291c1542aee12c9e9fde6c411314a163a8ea. This PR is to reenable it while fixing the original problem reported. The fix is to set the store value correctly when creating store for the new created global init bool symbol.
Reviewed By: efriedma

Differential Revision:  https://reviews.llvm.org/D102711
2021-07-20 12:27:26 -04:00
Kazu Hirata 4a30a5c8d9 [SampleProfile] Remove ProfileIsValid (NFC)
The last use was removed on Jan 22, 2021 in commit
c9cd9a0066.
2021-07-20 08:07:04 -07:00
Johannes Doerfert d62bbbebbf [Attributor] Initialize effectively unused value to appease UBSAN 2021-07-20 09:18:51 -05:00
Johannes Doerfert c66cbee140 [Attributor] Use set vector instead of vector to prevent duplicates 2021-07-20 01:39:34 -05:00
Johannes Doerfert 5957cf9f11 [Attributor] Simplify to values in the genericValueTraversal
We already simplified to a constant, given the new interface we can also
simplify to a generic value.
2021-07-20 01:39:34 -05:00
Johannes Doerfert 5eba7846a5 [Attributor] Use checkForAllUses instead of custom use tracking
AAMemoryBehaviorFloating used a custom use tracking mechanism even
though checkForAllUses exists and is already more powerful. Further,
AAMemoryBehaviorFloating uses AANoCapture to guarantee that there are no
aliases and following the uses is sufficient. This is an OK assumption
if checkForAllUses is used but custom tracking is easily out of sync
with AANoCapture and problems follow.
2021-07-20 01:39:33 -05:00
Johannes Doerfert b96ea6b1fd [Attributor] Ensure to simplify operands in AAValueConstantRange
As with other patches before, the simplification callback interface
requires us to go through the Attributor::getAssumedSimplified API first
before we recurs.

It is unclear if the problem can be explicitly tested with our current
infrastructure.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 5fbb51d8d5 [Attributor] Extend the AAValueSimplify compare simplification logic
We first simplify the operands of a compare and then reason on the
simplified versions, e.g., with AANonNull.

This does improve the simplification capabilities but also fixes a
potential problem that has not yet been observed by simplifying the
operands first.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 9c00aabd60 [Attributor][NFCI] Expose `getAssumedUnderlyingObjects` API 2021-07-20 00:35:13 -05:00
Johannes Doerfert 5e169818fb [Attributor][NFC] Fix function name spelling 2021-07-20 00:35:13 -05:00
Johannes Doerfert 44a9ee170c [Attributor][FIX] Do not simplify byval arguments
A byval argument is a different value in the caller and callee, we
cannot propagate the information as part of AAValueSimplify. Users that
want to deal with byval arguments need to specifically perform the
argument -> call site step. We do not do this for now.
2021-07-19 22:48:51 -05:00
Johannes Doerfert c2281f1565 [Attributor] Introduce AAPointerInfo
This patch introduces AAPointerInfo which tracks the uses of a pointer
and places them in "bins" based on their offset from the base and access
size.

As with other AAs, any pointer can be tracked but it is up to the user
to make sense of the results. The user in this patch is AAValueSimplify
and AAPotentialValues which both utilize AAPointerInfo to determine the
value of a load. For now, this is restricted to loads of allocas and
internal globals. Through the use of AAPointerInfo and the "bins" we can
track struct members separately. The users also know that storing only
zeros (at unknown indices) will result in loading only 0 (from unknown
indices). Other than that, the users are flow and context insensitive
(for now).

To deal with the "bins" more easily, AAPointerInfo provides a
forallInterfearingAccesses that applies a callback on all accesses
that might interfere with a given load or store.

Differential Revision: https://reviews.llvm.org/D104432
2021-07-19 22:48:35 -05:00
Johannes Doerfert 28c78a9e12 [Attributor] Simplify loads
As a first step to simplify loads we only handle `null` and `undef`
underlying objects, as well as objects that have the load as a single user.
Loads of those values can be replaced by the initializer, if any.
Proper reasoning is introduced in a follow up patch

Differential Revision: https://reviews.llvm.org/D103862
2021-07-19 22:47:29 -05:00
Johannes Doerfert 97387fdf6d [OpenMP] Fix carefully track SPMDCompatibilityTracker
We did not properly use SPMDCompatibilityTracker in various places.
This patch makes sure we look at the validity properly and also fix
the state if we can.

Differential Revision: https://reviews.llvm.org/D106085
2021-07-19 22:47:03 -05:00
Shilei Tian d3454ee8d2 [AbstractAttributor] Fix two issues in folding __kmpc_is_spmd_exec_mode
This patch fixed two issues found when folding `__kmpc_is_spmd_exec_mode`:
1. When the reaching kernels are empty, it should not fold to generic mode.
2. When creating AA for the caller when updating information, the dependency
   should be required.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D106209
2021-07-17 13:13:44 -04:00
Wenlei He f9f3c34e0f [CSSPGO] Turn on iterative-BFI for CSSPGO
Iterative-BFI produces better count quality and performance when evaluated on internal benchmarks. Turning it on by default now for CSSPGO. We can consider turn it on by default for AutoFDO as well in the future.

Differential Revision: https://reviews.llvm.org/D106202
2021-07-16 17:35:49 -07:00
Sami Tolvanen 0ad1d9fdf2 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 8e3b5cb39e.

Reverting to investigate test failures.
2021-07-16 14:47:33 -07:00
Sami Tolvanen 8e3b5cb39e ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-16 14:33:34 -07:00
Joseph Huber b910a109f8 [OpenMP][NFC] Update the comment header for optimizations. 2021-07-16 14:13:13 -04:00
Joseph Huber 2c31d5ebfb [OpenMP] Add IDs to OpenMP remarks
This patch adds unique idenfitiers to the existing OpenMP remarks. This makes
it easier to identify the corresponding documentation for each remark that will
be hosted in the OpenMP webpage.

Depends on D105898

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105939
2021-07-16 14:07:03 -04:00