Commit Graph

4786 Commits

Author SHA1 Message Date
Mircea Trofin 4b1c8070bb [NFC][ArgumentPromotion] Clear FAM cached results of erased function.
Not doing it here can lead to subtle bugs - the analysis results are
associated by the Function object's address. Nothing stops the memory
allocator from allocating new functions at the same address.
2021-03-18 09:17:32 -07:00
Wenlei He a5d30421a6 [CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.

For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work:

 - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee.
 - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added.
 - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile).
 - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate.

Differential Revision: https://reviews.llvm.org/D98590
2021-03-15 12:22:15 -07:00
Hongtao Yu beea06c106 [NFC][Inliner] Debugging support to print funtion size after each inlining.
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98439
2021-03-14 22:11:53 -07:00
Chenguang Wang 166620a4f0
[ArgPromotion] Copy additional metadata for loads.
Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D93927
2021-03-14 21:28:14 +00:00
Johannes Doerfert ff256c1376 [Attributor] Derive `willreturn` based on `mustprogress`
Since D86233 we have `mustprogress` which, in combination with
`readonly`, implies `willreturn`. The idea is that every side-effect
has to be modeled as a "write". Consequently, `readonly` means there
is no side-effect, and `mustprogress` guarantees that we cannot "loop"
forever without side-effect.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94125
2021-03-11 23:31:44 -06:00
Nikita Popov 2fe85dd289 [Attributor] Don't access pointer elem type in constructPointer (NFC)
Splitting this out as the change is non-trivial: The way this code
handled pointer types doesn't really make sense, as GEPs can only
apply an offset to the outermost pointer, but can't drill down
into interior pointer types (which would require dereferencing
memory).

Instead give special treatment to the first (pointer) index.
I've hardcoded it to zero as that's the only way the function is
used right now, but handling non-zero indexes would be
straightforward.

The original goal here was to have an element type for CreateGEP.
2021-03-11 21:36:40 +01:00
Wenlei He 051f2c144e [SamplePGO] Skip inlinee profile scaling for sample loader inlining
For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.

Differential Revision: https://reviews.llvm.org/D98187
2021-03-11 10:18:26 -08:00
kuterd d75c9e61a5 [Attributor] Attributor call site specific AAValueConstantRange
This patch makes uses of the context bridges introduced in D83299 to make
AAValueConstantRange call site specific.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83744
2021-03-11 01:19:44 +03:00
Wei Mi ee35784a90 [SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip
it on by default since it is beneficial to have separate sample profiles
for different internal symbols with the same name. As a preparation, we
want to avoid regression caused by the flip.

When we flip -funique-internal-linkage-name on, the profile is collected
from binary built without -funique-internal-linkage-name so it has no uniq
suffix, but the IR in the optimized build contains the suffix. This kind of
mismatch may introduce transient regression.

To avoid such mismatch, we introduce a NameTable section flag indicating
whether there is any name in the profile containing uniq suffix. Compiler
will decide whether to keep uniq suffix during name canonicalization
depending on the NameTable section flag. The flag is only available for
extbinary format. For other formats, by default compiler will keep uniq
suffix so they will only experience transient regression when
-funique-internal-linkage-name is just flipped.

Another type of regression is caused by places where we miss to call
getCanonicalFnName. Those places are fixed.

Differential Revision: https://reviews.llvm.org/D96932
2021-03-09 21:41:40 -08:00
Hongtao Yu 4c3d759d00 [CSSPGO] Always use callsite samples as callsite probe counts.
For CS profile, the callsite count of previously inlined callees is populated with the entry count of the callees. Therefore when trying to get a weight for calliste probe after inlinining, the callsite count should always be used. The same fix has already been made for non-probe case.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98094
2021-03-08 22:52:36 -08:00
Fangrui Song fb2cf0dd60 [FunctionImport] Delete unneeded setLive. NFC
ValueInfo's in Worklist are guaranteed to be live.
2021-03-06 14:09:54 -08:00
Ta-Wei Tu 8a003861a3 [NPM] Add -enable-loopinterchange option to NPM
We have the `enable-loopinterchange` option in legacy pass manager but not in NPM.
Add `LoopInterchange` pass to the optimization pipeline (at the same position as before)
when `enable-loopinterchange` is turned on.

Reviewed By: aeubanks, fhahn

Differential Revision: https://reviews.llvm.org/D98116
2021-03-07 02:39:28 +08:00
William S. Moses d163e75c81 [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-06 12:57:32 -05:00
Michael Kruse b119120673 [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
 * Only the worksharing-loop directive.
 * Recognizes only the nowait clause.
 * No loop nests with more than one loop.
 * Untested with templates, exceptions.
 * Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
 * The distance function: How many loop iterations there will be before entering the loop nest.
 * The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D94973
2021-03-04 22:52:59 -06:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
William S. Moses 2b896e39bf Revert "[Attributor] Enable heap-to-stack of any size"
This reverts commit 51bd42ef9b.
2021-03-04 17:24:56 -05:00
William S. Moses 51bd42ef9b [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-04 17:17:23 -05:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Johannes Doerfert 5b70c12f3e [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert e592dad82e [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c8c93fdf0a [Attributor] Avoid work for GEPs and wait till the users are visited 2021-03-04 00:35:52 -06:00
Johannes Doerfert f3f88287c5 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c14213e030 [Attributor][NFC] Move some trivial checks up 2021-03-04 00:35:52 -06:00
Johannes Doerfert 09c3eebf5f [Attributor] Use sensible initialization in AANoCaptureCallSiteReturned 2021-03-04 00:35:51 -06:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
William S. Moses b077d82b00 [Attributor] Conditinoally delete fns
Allow the attributor to delete functions only if requested

Differential Revision: https://reviews.llvm.org/D97238
2021-02-27 20:37:42 -05:00
Fangrui Song 4d63892acb [SanitizerCoverage] Drop !associated on metadata sections
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions.  After
inlining, such a `__sancov_*` section can be referenced by more than one
functions, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)

If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.

This above reasoning means that `!associated` is not appropriate to be called by
an inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick.  (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)

For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overload should be low.

This patch does not change the object file output for COFF (where `!associated` is ignored).

Reviewed By: morehouse, rnk, vitalybuka

Differential Revision: https://reviews.llvm.org/D97430
2021-02-25 11:59:23 -08:00
Rong Xu 6103b6ad69 [SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class
This patch makes SampleProfileLoaderBaseImpl a template class so it
can be used in CodeGen transformation.

Noticeable changes:
 * use one template parameter and use IRTraits to get other used
   types an type specific functions.
 * remove the temporary "inline" keywords in previous refactor
   patch.
 * change the template function findEquivalencesFor to a regular
   function. This function has a single caller with type of
   PostDominatorTree. It's simpler to use the type directly
   because MachinePostDominatorTree is not a derived type of
   template DominatorTreeBase.

Differential Revision: https://reviews.llvm.org/D96981
2021-02-25 08:26:17 -08:00
Fangrui Song ef312951fd collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128
And delete the SmallPtrSetImpl overload.

While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97257
2021-02-23 16:09:06 -08:00
Fangrui Song ed02f52d28 Fix unstable SmallPtrSet iteration issues due to collectUsedGlobalVariables
While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Depends on D97128 (which added a new SmallVecImpl overload for collectUsedGlobalVariables).

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97139
2021-02-23 16:09:05 -08:00
Fangrui Song 3adb89bb9f [ThinLTO] Make cloneUsedGlobalVariables deterministic
Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements
is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements.

While here, decrease inline element counts from 8 to 4. The number of
`llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full
LTO/hybrid LTO, the number may be large, so we need to be careful.

According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is
good for a large project with WholeProgramDevirt, when available_externally
vtables are placed in the llvm.compiler.used set.

Differential Revision: https://reviews.llvm.org/D97128
2021-02-23 16:09:05 -08:00
Teresa Johnson 0a5949dcfa [WPD] Fix handling of pure virtual base class
The fix in 3c4c205060 caused an assert in
the case of a pure virtual base class. In that case, the vTableFuncs
list on the summary will be empty, so we were hitting the new assert
that the linkage type was not available_externally.

In the case of pure virtual, we do not want to assert, and additionally
need to set VS so that we don't treat it conservatively and quit the
analysis of the type id early.

This exposed a pre-existing issue where we were not updating the vcall
visibility on pure virtual functions when whole program visibility was
specified. We were skipping updating the visibility on any global vars
that didn't have any vTableFuncs, which meant all pure virtual were not
updated, and the later analysis would block any devirtualization of
calls that had a type id used on those pure virtual vtables (see the
handling in the other code modified in this patch). Simply remove that
check. It will mean that we may update the vcall visibility on global
vars that aren't vtables, but that setting is ignored for any global
vars that didn't have type metadata anyway.

Added a new test case that asserted without removing the assert, and
that requires the other fixes in this patch (updateVCallVisibilityInIndex
and not skipping all vtables without virtual funcs) to get a successful
devirtualization with index-only WPD. I added cases to test hybrid and
regular LTO for completeness, although those already worked without the
fixes here.

With this final fix, a clang multistage bootstrap with WPD builds and
runs all tests successfully.

Differential Revision: https://reviews.llvm.org/D97126
2021-02-23 16:07:09 -08:00
Kristina Bessonova e97aab8d15 [ThinLTO] Fix import of multiply defined global variables
Currently, if there is a module that contains a strong definition of
a global variable and a module that has both a weak definition for
the same global and a reference to it, it may result in an undefined symbol error
while linking with ThinLTO.

It happens because:
* the strong definition become internal because it is read-only and can be imported;
* the weak definition gets replaced by a declaration because it's non-prevailing;
* the strong definition failed to be imported because the destination module
  already contains another definition of the global yet this def is non-prevailing.

The patch adds a check to computeImportForReferencedGlobals() that allows
considering a global variable for being imported even if the module contains
a definition of it in the case this def has an interposable linkage type.

Note that currently the check is based only on the linkage type
(and this seems to be enough at the moment), but it might be worth to account
the information whether the def is prevailing or not.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D95943
2021-02-21 18:34:12 +02:00
Teresa Johnson fde55a9c9b [LTO] Fix cloning of llvm*.used when splitting module
Refines the fix in 3c4c205060 to only
put globals whose defs were cloned into the split regular LTO module
on the cloned llvm*.used globals. This avoids an issue where one of the
attached values was a local that was promoted in the original module
after the module was cloned. We only need to have the values defined in
the new module on those globals.

Fixes PR49251.

Differential Revision: https://reviews.llvm.org/D97013
2021-02-20 09:46:43 -08:00
Wei Mi 4ffad1fb48 [SampleFDO] Add PromotedInsns to prevent repeated ICP.
In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
We use 0 count value profile to memorize which target has been promoted
and prevent repeated ICP for the same target, so we delete PromotedInsns.
However, I found the implementation in the patch has some shortcomings
to be fixed otherwise there will still be repeated ICP. So I add
PromotedInsns back temorarily. Will remove it after I get a thorough fix.
2021-02-19 10:01:49 -08:00
Nikita Popov 71a8e4e7d6 [MemCopyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
Nikita Popov 370addb996 [IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992
2021-02-19 11:56:01 +01:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Hongtao Yu e87b1b1d4e [CSSPGO] Use callsite sample counts to annotate indirect call sites.
With CSSPGO all indirect call targets are counted torwards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling.

I'm also cleaning up the orginal code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was disgarded.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D96990
2021-02-18 14:52:34 -08:00
Teresa Johnson d55d46f43b [WPD] Add an optional checking mode for debugging devirtualization
This adds an internal option -wholeprogramdevirt-check which if enabled
will guard each devirtualization with a runtime check against the
expected target, and an invocation of a debug trap if the check fails.
This is useful for debugging WPD failures involving undefined behavior
(e.g. casting to another class type not in the inheritance chain).

Differential Revision: https://reviews.llvm.org/D95969
2021-02-17 16:46:15 -08:00
Rong Xu 7397905ab0 [SampleFDO] Third Try: Refactor SampleProfile.cpp
Apply the patch for the third time after fixing buildbot failures.

Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
(4) Add inline keyword to avoid duplicated symbols -- they will
be removed later when the class is changed to a template.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-17 15:31:50 -08:00
Teresa Johnson 3c4c205060 [WPD][lld] Test handling of vtable definition from shared libraries
Adds a lld test for a case that the handling added for dynamically
exported symbols in 1487747e99 already
fixes. Because isExportDynamic returns true when the symbol is
SharedKind with default visibility, it will treat as dynamically
exported and block devirtualization when the definition of a vtable
comes from a shared library. This is desireable as it is dangerous to
devirtualize in that case, since there could be hidden overrides in the
shared library. Typically that happens when the shared library header
contains available externally definitions, which applications can
override. An example is std::error_category, which is overridden in LLVM
and causing failures after a self build with WPD enabled, because
libstdc++ contains hidden overrides of the virtual base class methods.

The regular LTO case in the new test already worked, but there are
2 fixes in this patch needed for the index-only case and the hybrid
LTO case. For the index-only case, WPD should not simply ignore
available externally vtables. A follow on fix will be made to clang to
emit type metadata for those vtables, which the new test is modeling.
For the hybrid case, we need to ensure when the module is split that any
llvm.*used globals are cloned to the regular LTO split module so
available externally vtable definitions are not prematurely deleted.

Another follow on fix will add the equivalent gold test, which requires
a small fix to the plugin to treat symbols in dynamic libraries the same
way lld already is.

Differential Revision: https://reviews.llvm.org/D96721
2021-02-17 12:49:24 -08:00
Vedant Kumar c28622fbf3 Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp"
Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455"

This reverts commit c73cbf218a.

Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)"

This reverts commit a23e6b321c.

Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp"

This reverts commit 6fd5ccff72.

Still seeing link failures when building llc (or other tools), due to
the new SampleProfileLoaderBaseImpl.h containing definitions that get
duplicated across multiple TU's.

```
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const*)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const*) const' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const*)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const*, llvm::BasicBlock const*>)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
```
2021-02-17 10:22:24 -08:00
Rong Xu 6fd5ccff72 [SampleFDO] Reapply: Refactor SampleProfile.cpp
Reapply patch after fixing buildbot failure.
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-16 16:43:21 -08:00
Mehdi Amini c761fe77bd Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp"
This reverts commit 310b35304c.
The build is broken with -DBUILD_SHARED_LIBS=ON :

lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const*, llvm::ProfileSummaryInfo*, bool)':
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const'
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const'
...
2021-02-16 22:11:42 +00:00
Rong Xu 310b35304c [SampleFDO][NFC] Refactor SampleProfile.cpp
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-16 11:18:21 -08:00
Duncan P. N. Exon Smith 22a52dfddc TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto)
This commit fixes how metadata is handled in CloneModule to be sound,
and improves how it's handled in CloneFunctionInto (although the latter
is still awkward when called within a module).

Ruiling Song pointed out in PR48841 that CloneModule was changed to
unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in
fa35c1f80f for clarity). This flag papered
over a crash caused by other various changes made to CloneFunctionInto
over the past few years that made it unsound to use cloning between
different modules.

(This commit partially addresses PR48841, fixing the repro from
preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode
became unsound in df763188c9 and this
commit does not address that regression.)

RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use,
avoiding unnecessary clones of all referenced metadata when linking
between modules (with IRMover, the source module is discarded after
linking). It never makes sense to use when you're not discarding the
source. This commit drops its incorrect use in CloneModule.

Sadly, the right thing to do with metadata when cloning a function is
complicated, and this patch doesn't totally fix it.

The first problem is that there are two different types of referenceable
metadata and it's not obvious what to with one of them when remapping.

- `!0 = !{!1}` is metadata's version of a constant. Programatically it's
  called "uniqued" (probably a better term would be "constant") because,
  like `ConstantArray`, it's stored in uniquing tables. Once it's
  constructed, it's illegal to change its arguments.
- `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal
  to change the operands after construction.

What should be done with distinct metadata when cloning functions within
the same module?

- Should new, cloned nodes be created?
- Should all references point to the same, old nodes?

The answer depends on whether that metadata is effectively owned by a
function.

And that's the second problem. Referenceable metadata's ownership model
is not clear or explicit. Technically, it's all stored on an
LLVMContext. However, any metadata that is `distinct`, that transitively
references a `distinct` node, or that transitively references a
GlobalValue is specific to a Module and is effectively owned by it. More
specifically, some metadata is effectively owned by a specific Function
within a module.

Effectively function-local metadata was introduced somewhere around
c10d0e5ccd, which made it illegal for two
functions to share a DISubprogram attachment.

When cloning a function within a module, you need to clone the
function-local debug info and suppress cloning of global debug info (the
status quo suppresses cloning some global debug info but not all). When
cloning a function to a new/different module, you need to clone all of
the debug info.

Here's what I think we should do (eventually? soon? not this patch
though):
- Distinguish explicitly (somehow) between pure constant metadata owned
  by the LLVMContext, global metadata owned by the Module, and local
  metadata owned by a GlobalValue (such as a function).
- Update CloneFunctionInto to trigger cloning of all "local" metadata
  (only), perhaps by adding a bit to RemapFlag. Alternatively, split
  out a separate function CloneFunctionMetadataInto to prime the
  metadata map that callers are updated to call ahead of time as
  appropriate.

Here's the somewhat more isolated fix in this patch:
- Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to
  an enum called `CloneFunctionChangeType` that is one of
  LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule.
- The code maintaining the "functions uniquely own subprograms"
  invariant is now only active in the first two cases, where a function
  is being cloned within a single module. That's necessary because this
  code inhibits cloning of (some) "global" metadata that's effectively
  owned by the module.
- The code maintaining the "all compile units must be explicitly
  referenced by !llvm.dbg.cu" invariant is now only active in the
  DifferentModule case, where a function is being cloned into a new
  module in isolation.
- CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create
  uses LocalChangeOnly, since fa635d730f
  only set `ModuleLevelChanges` to trigger cloning of local metadata.
- CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs
  and special handling of !llvm.dbg.cu.
- Fixed some outdated header docs and left a couple of FIXMEs.

Differential Revision: https://reviews.llvm.org/D96531
2021-02-15 11:56:00 -08:00
Kazu Hirata 910e2d1e57 [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Teresa Johnson a80232bd5f [LTT] Address post-review comments (NFC)
Implement some post-review cleanup suggestions for D96083.
2021-02-13 15:52:59 -08:00
Hongtao Yu de40f6d623 [CSSPGO] Process functions in a top-down order on a dynamic call graph.
Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining.

The profile top-down order for SCC is also extended to support non-CS profiles.

Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95988
2021-02-11 12:36:59 -08:00
Benjamin Kramer 8fb4a4f7bb [SampleFDO] Silence -Wnon-virtual-dtor warning
There's no polymorphic deletion happening here.
2021-02-10 23:37:15 +01:00
Rong Xu db0d7d0ba9 [SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen
Break SampleProfileLoader into to a base and a derived class.
Base class (SampleProfileLoaderBaseImpl) includes the common
code for IR and MachineIR (CodeGen) sample loader.
It will be templatelized in the later patch.

Inline and Probe related code will remain in the derived class of
SampleProfileLoader and stays in SampleProfile.cpp.

We need to refactor some functions:
(1) getInstWeight() to enable the code sharing -- put the core into
getInstWeightImpl().
(2) emitAnnotation() and propagateWeights() to carve out the code
specific to SampleProfileLoader.
(3) make getInstWeight() and findFunctionSamples() virtual and override
in SampleProfileLoader as they need to access the fields in the derived
class.

Differential Revision: https://reviews.llvm.org/D95832
2021-02-10 13:29:15 -08:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Johannes Doerfert 81429abd83 [Attributor][FIX] Do not create UB by introducing a `noundef undef`
This was reported as PR49104. The reproducer uses varargs but the issue
is the same, we know an argument is dead but can't change the signature
for some reason. The PR49104 situation was: We are in an CG-SCC
traversal and we remove all the uses of an argument and proof it thereby
dead. However, if we do not remove the argument, via signature rewrite,
we need to ensure that the `undef` we introduce at the call site doesn't
clash with a `noundef` attribute.
2021-02-09 13:02:38 -06:00
Kazu Hirata 302313a264 [Transforms] Use range-based for loops (NFC) 2021-02-08 22:33:53 -08:00
Teresa Johnson 3a27933ec2 [LTT] Don't attempt to lower type tests used only by assumes
Type tests used only by assumes were original for devirtualization, but
are meant to be kept through the first invocation of LTT so that they
can be used for additional optimization. In the regular LTO case where
the IR is analyzed we may find a resolution for the type test and end up
rewriting the associated vtable global, which can have implications on
section splitting. Simply ignore these type tests.

Fixes PR48245.

Differential Revision: https://reviews.llvm.org/D96083
2021-02-06 09:02:10 -08:00
Simon Pilgrim f7d07dbb29 IROutliner.cpp - fix Wdocumentation warning. NFCI.
Remove duplicate param
2021-02-05 11:38:09 +00:00
Simon Pilgrim 476b912e7c SampleProfile.cpp - fix Wdocumentation warning. NFCI.
Remove duplicate param
2021-02-05 11:31:17 +00:00
Simon Pilgrim 89edda7084 IROutliner.cpp - fix Wdocumentation warnings. NFCI. 2021-02-05 11:21:00 +00:00
Kazu Hirata be37475897 [Transforms/IPO] Use range-based for loops (NFC) 2021-02-03 20:41:20 -08:00
Rong Xu b8f13db5b7 [SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker
This patch detaches SampleProfileLoader from class
SampleCoverageTracker. We plan to move SampleProfileLoader
to a template class. This would remain SampleCoverageTracker
as a class.
Also make callsiteIsHot() as a file static function.

Differential Revision: https://reviews.llvm.org/D95823
2021-02-03 11:38:04 -08:00
Hongtao Yu 3d89b3cbec [CSSPGO] Introducing distribution factor for pseudo probe.
Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count.

This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes.

A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead.

Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D93264
2021-02-02 11:55:01 -08:00
Wenlei He 1645f465be [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.

This is resubmit of D95024, with build break and overtighten assertion fixed.

Test Plan:
2021-02-02 07:55:08 -08:00
Adrian Kuegel 48ca6da9d2 Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline"
This reverts commit 9a03058d63.
2021-02-02 11:51:04 +01:00
Adrian Kuegel 3a65ec4bf9 Revert "Fix build break from D95024"
This reverts commit 09cd849fde.
2021-02-02 11:51:04 +01:00
Wenlei He 09cd849fde Fix build break from D95024 2021-02-02 01:01:12 -08:00
Wenlei He 9a03058d63 [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.

Test Plan:

Differential Revision: https://reviews.llvm.org/D95024
2021-02-02 00:34:06 -08:00
Wenlei He 6bae5973c4 [CSSPGO] Call site prioritized inlining for sample PGO
This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`).

Motivation

With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO.
 - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
 - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately.
 - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO.

New FDO Inliner

Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed.
 - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
 - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check.
 - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites.
 - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path.

Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO).

Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO.

Tunings and knobs

I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO.

Results

Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it).

Differential Revision: https://reviews.llvm.org/D94001
2021-02-01 23:46:34 -08:00
Hongtao Yu 224fee8219 [CSSPGO] Tweaking inlining with pseudo probes.
Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites.

Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95791
2021-02-01 13:56:40 -08:00
Kazu Hirata 3d1200b9f6 [llvm] Drop unnecessary const from return types (NFC)
Identified with const-return-type.
2021-01-31 10:23:43 -08:00
Kazu Hirata 8ed1636184 [llvm] Use isa instead of dyn_cast (NFC) 2021-01-29 23:23:37 -08:00
Hongtao Yu 7e99bddfea [CSSPGO] Support of CS profiles in extended binary format.
This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95547
2021-01-27 21:29:46 -08:00
Teresa Johnson 1487747e99 [LTO] Prevent devirtualization for symbols dynamically exported
Identify dynamically exported symbols (--export-dynamic[-symbol=],
--dynamic-list=, or definitions needed to preempt shared objects) and
prevent their LTO visibility from being upgraded.
This helps avoid use of whole program devirtualization when there may
be overrides in dynamic libraries.

Differential Revision: https://reviews.llvm.org/D91583
2021-01-27 15:54:13 -08:00
Fangrui Song 54fb3ca96e [ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags
Imported functions and variable get the visibility from the module supplying the
definition.  However, non-imported definitions do not get the visibility from
(ELF) the most constraining visibility among all modules (Mach-O) the visibility
of the prevailing definition.

This patch

* adds visibility bits to GlobalValueSummary::GVFlags
* computes the result visibility and propagates it to all definitions

Protected/hidden can imply dso_local which can enable some optimizations (this
is stronger than GVFlags::DSOLocal because the implied dso_local can be
leveraged for ELF -shared while default visibility dso_local has to be cleared
for ELF -shared).

Note: we don't have summaries for declarations, so for ELF if a declaration has
the most constraining visibility, the result visibility may not be that one.

Differential Revision: https://reviews.llvm.org/D92900
2021-01-27 10:43:51 -08:00
Bjorn Pettersson a9bd3d37bd [NewPM] Add ExtraVectorizerPasses support
As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also use SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which use the LoopUnswitch pass).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D95457
2021-01-26 22:59:10 +01:00
Florian Hahn 35b3989a30
[Passes] Run peeling as part of simple/full loop unrolling.
Loop peeling removes conditions from loop bodies that become invariant
after a small number of iterations. When triggered, this leads to fewer
compares and possibly PHIs in loop bodies, enabling further
optimizations. The current cost-model of loop peeling should be quite
conservative/safe, i.e. only peel if a condition in the loop becomes
known after peeling.

For example, see PR47671, where loop peeling enables vectorization by
removing a PHI the vectorizer does not understand. Granted, the
loop-vectorizer could also be taught about constant PHIs, but loop
peeling is likely to enable other optimizations as well.

This has an impact on quite a few benchmarks from
MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example

    Same hash: 186 (filtered out)
    Remaining: 51
    Metric: loop-vectorize.LoopsVectorized

    Program                                        base   patch  diff
     test-suite...ve-susan/automotive-susan.test     8.00   9.00 12.5%
     test-suite...nal/skidmarks10/skidmarks.test    35.00  31.00 -11.4%
     test-suite...lications/sqlite3/sqlite3.test    41.00  43.00  4.9%
     test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test    25.00  26.00  4.0%
     test-suite...006/450.soplex/450.soplex.test    88.00  89.00  1.1%
     test-suite...TimberWolfMC/timberwolfmc.test   120.00 119.00 -0.8%
     test-suite.../CINT2006/403.gcc/403.gcc.test   215.00 216.00  0.5%
     test-suite...006/447.dealII/447.dealII.test   957.00 958.00  0.1%
     test-suite...ternal/HMMER/hmmcalibrate.test    75.00  75.00  0.0%

    Same hash: 186 (filtered out)
    Remaining: 51
    Metric: loop-vectorize.LoopsAnalyzed

    Program                                        base    patch   diff
     test-suite...ks/Prolangs-C/agrep/agrep.test   440.00  434.00  -1.4%
     test-suite...nal/skidmarks10/skidmarks.test   312.00  308.00  -1.3%
     test-suite...marks/7zip/7zip-benchmark.test   6399.00 6323.00 -1.2%
     test-suite...lications/minisat/minisat.test   134.00  135.00   0.7%
     test-suite...rks/FreeBench/pifft/pifft.test   295.00  297.00   0.7%
     test-suite...TimberWolfMC/timberwolfmc.test   1879.00 1869.00 -0.5%
     test-suite...pplications/treecc/treecc.test   689.00  691.00   0.3%
     test-suite...T2000/300.twolf/300.twolf.test   1593.00 1597.00  0.3%
     test-suite.../Benchmarks/Bullet/bullet.test   1394.00 1392.00 -0.1%
     test-suite...ications/JM/ldecod/ldecod.test   1431.00 1429.00 -0.1%
     test-suite...6/464.h264ref/464.h264ref.test   2229.00 2230.00  0.0%
     test-suite...lications/sqlite3/sqlite3.test   2590.00 2589.00 -0.0%
     test-suite...ications/JM/lencod/lencod.test   2732.00 2733.00  0.0%
     test-suite...006/453.povray/453.povray.test   3395.00 3394.00 -0.0%

Note the -11% regression in number of loops vectorized for skidmarks. I
suspect this corresponds to the fact that those loops are gone now (see
the reduction in number of loops analyzed by LV).

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88471
2021-01-26 13:52:30 +00:00
modimo ce7f9cdb50 [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks
This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only.

The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining.

In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places.

Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay.

Testing:
ninja check-llvm
newly added test correctly replays CGSCC inlining decisions

Reviewed By: mtrofin, wenlei

Differential Revision: https://reviews.llvm.org/D94334
2021-01-25 15:38:57 -08:00
Richard Smith 925ae8c790 Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV"
This reverts commit 53176c1680, which
introduceed a layering violation. LLVM's IR library can't include
headers from Analysis.
2021-01-25 13:53:38 -08:00
Akira Hatanaka 53176c1680 [ObjC][ARC] Annotate calls with attributes instead of emitting retainRV
or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end annotates calls with attribute "clang.arc.rv"="retain"
  or "clang.arc.rv"="claim", which indicates the call is implicitly
  followed by a marker instruction and a retainRV/claimRV call that
  consumes the call result. This is currently done only when the target
  is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the
  annotated calls in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the annotated
  calls. It doesn't remove the attribute on the call since the backend
  needs it to emit the marker instruction. The retainRV/claimRV calls
  are emitted late in the pipeline to prevent optimization passes from
  transforming the IR in a way that makes it harder for the ARC
  middle-end passes to figure out the def-use relationship between the
  call and the retainRV/claimRV calls (which is the cause of PR31925).

- The function inliner removes the autoreleaseRV call in the callee that
  returns the result if nothing in the callee prevents it from being
  paired up with the calls annotated with "clang.arc.rv"="retain/claim"
  in the caller. If the call is annotated with "claim", a release call
  is inserted since autoreleaseRV+claimRV is equivalent to a release. If
  it cannot find an autoreleaseRV call, it tries to transfer the
  attributes to a function call in the callee. This is important since
  ARC optimizer can remove the autoreleaseRV call returning the callee
  result, which makes it impossible to pair it up with the retainRV or
  claimRV call in the caller. If that fails, it simply emits a retain
  call in the IR if the call is annotated with "retain" and does nothing
  if it's annotated with "claim".

- This patch teaches dead argument elimination pass not to change the
  return type of a function if any of the calls to the function are
  annotated with attribute "clang.arc.rv". This is necessary since the
  pass can incorrectly determine nothing in the IR uses the function
  return, which can happen since the front-end no longer explicitly
  emits retainRV/claimRV calls in the IR, and change its return type to
  'void'.

Future work:

- Use the attribute on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the attributes.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-01-25 11:57:08 -08:00
Wei Mi c9cd9a0066 [SampleFDO] Report error when reading a bad/incompatible profile instead of
turning off SampleFDO silently.

Currently sample loader pass turns off SampleFDO optimization silently when
it sees error in reading the profile. This behavior will defeat the tests
which could have caught those bad/incompatible profile problems. This patch
change the behavior to report error.

Differential Revision: https://reviews.llvm.org/D95269
2021-01-25 10:28:23 -08:00
Kazu Hirata 1238378f18 [llvm] Use pop_back_val (NFC) 2021-01-23 10:56:33 -08:00
Xun Li bd3ca6666d [Inlining] Delete redundant optnone/alwaysinline check
The same check is done in InlineCost: 8b0bd54d0e/llvm/lib/Analysis/InlineCost.cpp (L2537-L2552)
Also, doing a check on the callee here is confusing, because anything that deals with callee should be done in the inner loop where we proecss all calls from the same caller.

Differential Revision: https://reviews.llvm.org/D95186
2021-01-21 18:38:10 -08:00
Nikita Popov 65fd034b95 [FunctionAttrs] Infer willreturn for functions without loops
If a function doesn't contain loops and does not call non-willreturn
functions, then it is willreturn. Loops are detected by checking
for backedges in the function. We don't attempt to handle finite
loops at this point.

Differential Revision: https://reviews.llvm.org/D94633
2021-01-21 20:29:33 +01:00
Kazu Hirata e53472de68 [Transforms] Use llvm::append_range (NFC) 2021-01-20 21:35:54 -08:00
Mircea Trofin ccec2cf1d9 Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor"
This reverts commit d97f776be5.

The original problem was due to build failures in shared lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking
this.
2021-01-20 13:33:43 -08:00
Mircea Trofin 95ce32c787 [NFC] Move ImportedFunctionsInliningStatistics to Analysis
This is related to D94982. We want to call these APIs from the Analysis
component, so we can't leave them under Transforms.

Differential Revision: https://reviews.llvm.org/D95079
2021-01-20 13:18:03 -08:00
Mircea Trofin d97f776be5 Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor"
This reverts commit e8aec763a5.
2021-01-20 11:19:34 -08:00
Mircea Trofin e8aec763a5 [NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor
When using 2 InlinePass instances in the same CGSCC - one for other
mandatory inlinings, the other for the heuristic-driven ones - the order
in which the ImportedFunctionStats would be output-ed would depend on
the destruction order of the inline passes, which is not deterministic.

This patch moves the ImportedFunctionStats responsibility to the
InlineAdvisor to address this problem.

Differential Revision: https://reviews.llvm.org/D94982
2021-01-20 11:07:36 -08:00
David Sherwood 255a507716 [NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp
In places where we call a TTI.getXXCost() function I have changed
the code to use InstructionCost instead of unsigned. This is in
preparation for later on when we will change the TTI interfaces
to return InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D94427
2021-01-20 08:33:59 +00:00
Juneyoung Lee 4479c0c2c0 Allow nonnull/align attribute to accept poison
Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison.
To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null.
This makes many transformations like below legal:

```
%p = gep inbounds %x, 1 ; % p is non-null pointer or poison
call void @f(%p)        ; instcombine converts this to call void @f(nonnull %p)
```

Instead, this semantics makes propagation of `nonnull` to caller illegal.
The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument.
Having `noundef` attribute there re-allows this.

```
define void @f(i8* %p) {       ; functionattr cannot mark %p nonnull here anymore
  call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
  ret void
}
```

Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90529
2021-01-20 11:31:23 +09:00
Wei Mi 21b1ad0340 [SampleFDO] Add the support to split the function profiles with context into
separate sections.

For ThinLTO, all the function profiles without context has been annotated to
outline functions if possible in prelink phase. In postlink phase, profile
annotation in postlink phase is only meaningful for function profile with
context. If the profile is large, it is better to split the profile into two
parts, one with context and one without, so the profile reading in postlink
phase only has to read the part with context. To have the profile splitting,
we extend the ExtBinary format to support different section arrangement. It
will be flexible to add other section layout in the future without the need
to create new class inheriting from ExtBinary class.

Differential Revision: https://reviews.llvm.org/D94435
2021-01-19 15:16:19 -08:00
Florian Hahn 83daa49758
[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.

The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.

This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.

I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.

From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory, has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.

There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have be
vectorized earlier (by scalarzing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).

There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.

I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.

Reviewed By: sanwou01

Differential Revision: https://reviews.llvm.org/D94232
2021-01-19 10:15:29 +00:00
Kazu Hirata 23b0ab2acb [llvm] Use the default value of drop_begin (NFC) 2021-01-18 10:16:36 -08:00
Kazu Hirata 19aacdb715 [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-16 09:40:53 -08:00
Mircea Trofin e8049dc3c8 [NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.

Differential Revision: https://reviews.llvm.org/D94825
2021-01-15 17:59:38 -08:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Kazu Hirata 2efcbe24a7 [llvm] Use llvm::drop_begin (NFC) 2021-01-14 20:30:33 -08:00
Kazu Hirata 125ea20d55 [llvm] Use llvm::stable_sort (NFC) 2021-01-13 19:14:43 -08:00
Kazu Hirata 5c1c39e8d8 [llvm] Use *Set::contains (NFC) 2021-01-13 19:14:41 -08:00
Wei Mi 86341247c4 [NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h
to Pass.h.

In some compiler passes like SampleProfileLoaderPass, we want to know which
LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum
class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and
it also cannot represent phase in full LTO. The patch extends it to include
full LTO phases and move it from PassBuilder.h to Pass.h, then it is much
easier for PassBuilder to communiate with each pass about current LTO phase.

Differential Revision: https://reviews.llvm.org/D94613
2021-01-13 15:55:40 -08:00
Andrew Litteken 05b1a15f70 [IROutliner] Adapting to hoisted bitcasts in CodeExtractor
In commit 700d2417d8 the CodeExtractor
was updated so that bitcasts that have lifetime markers that beginning
outside of the region are deduplicated outside the region and are not
used as an output.  This caused a discrepancy in the IROutliner, where
in these cases there were arguments added to the aggregate function
that were not needed causing assertion errors.

The IROutliner queries the CodeExtractor twice to determine the inputs
and outputs, before and after `findAllocas` is called with the same
ValueSet for the outputs causing the duplication. This has been fixed
with a dummy ValueSet for the first call.

However, the additional bitcasts prevent us from using the same
similarity relationships that were previously defined by the
IR Similarity Analysis Pass. In these cases, we check whether the
initial version of the region being analyzed for outlining is still the
same as it was previously.  If it is not, i.e. because of the additional
bitcast instructions from the CodeExtractor, we discard the region.

Reviewers: yroux

Differential Revision: https://reviews.llvm.org/D94303
2021-01-13 11:10:37 -06:00
Kazu Hirata 12fc9ca3a4 [llvm] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-01-12 21:43:46 -08:00
modimo 2a49b7c64a [Inliner] Change inline remark format and update ReplayInlineAdvisor to use it
This change modifies the source location formatting from:
LineNumber.Discriminator
to:
LineNumber:ColumnNumber.Discriminator

The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff.

The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy.

Testing:
ninja check-llvm

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D94333
2021-01-12 13:43:48 -08:00
Florian Hahn 6cd44b204c
[FunctionAttrs] Derive willreturn for fns with readonly` & `mustprogress`.
Similar to D94125, derive `willreturn` for functions that are `readonly` and
`mustprogress` in FunctionAttrs.

To quote the reasoning from D94125:

    Since D86233 we have `mustprogress` which, in combination with
    `readonly`, implies `willreturn`. The idea is that every side-effect
    has to be modeled as a "write". Consequently, `readonly` means there
    is no side-effect, and `mustprogress` guarantees that we cannot "loop"
    forever without side-effect.

Reviewed By: jdoerfert, nikic

Differential Revision: https://reviews.llvm.org/D94502
2021-01-12 20:02:34 +00:00
Giorgis Georgakoudis 9751705512 [OpenMPOpt][WIP] Expand parallel region merging
The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcasted to the following parallel regions and the
sequential region continuation of the merged region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90909
2021-01-11 08:06:23 -08:00
Florian Hahn c701f85c45
[STLExtras] Use return type from operator* of the wrapped iter.
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.

Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, which slightly simplifies
iterating over ranges while elements are removed.

Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.

This patch also updates a few places to use make use of
make_early_inc_range.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D93992
2021-01-10 14:41:13 +00:00
Kazu Hirata 33bf1cad75 [llvm] Use *Set::contains (NFC) 2021-01-07 20:29:34 -08:00
Arthur Eubanks 7fea561eb1 [CGSCC][Coroutine][NewPM] Properly support function splitting/outlining
Previously when trying to support CoroSplit's function splitting, we
added in a hack that simply added the new function's node into the
original function's SCC (https://reviews.llvm.org/D87798). This is
incorrect since it might be in its own SCC.

Now, more similar to the previous design, we have callers explicitly
notify the LazyCallGraph that a function has been split out from another
one.

In order to properly support CoroSplit, there are two ways functions can
be split out.

One is the normal expected "outlining" of one function into a new one.
The new function may only contain references to other functions that the
original did. The original function must reference the new function. The
new function may reference the original function, which can result in
the new function being in the same SCC as the original function. The
weird case is when the original function indirectly references the new
function, but the new function directly calls the original function,
resulting in the new SCC being a parent of the original function's SCC.
This form of function splitting works with CoroSplit's Switch ABI.

The second way of splitting is more specific to CoroSplit. CoroSplit's
Retcon and Async ABIs split the original function into multiple
functions that all reference each other and are referenced by the
original function. In order to keep the LazyCallGraph in a valid state,
all new functions must be processed together, else some nodes won't be
populated. To keep things simple, this only supports the case where all
new edges are ref edges, and every new function references every other
new function. There can be a reference back from any new function to the
original function, putting all functions in the same RefSCC.

This also adds asserts that all nodes in a (Ref)SCC can reach all other
nodes to prevent future incorrect hacks.

The original hacks in https://reviews.llvm.org/D87798 are no longer
necessary since all new functions should have been registered before
calling updateCGAndAnalysisManagerForPass.

This fixes all coroutine tests when opt's -enable-new-pm is true by
default. This also fixes PR48190, which was likely due to the previous
hack breaking SCC invariants.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93828
2021-01-06 11:19:15 -08:00
Arthur Eubanks 8cf1cc578d [FuncAttrs] Infer noreturn
A function is noreturn if all blocks terminating with a ReturnInst
contain a call to a noreturn function. Skip looking at naked functions
since there may be asm that returns.

This can be further refined in the future by checking unreachable blocks
and taking into account recursion. It looks like the attributor pass
does this, but that is not yet enabled by default.

This seems to help with code size under the new PM since PruneEH does
not run under the new PM, missing opportunities to mark some functions
noreturn, which in turn doesn't allow simplifycfg to clean up dead code.
https://bugs.llvm.org/show_bug.cgi?id=46858.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93946
2021-01-05 13:25:42 -08:00
Florian Hahn c367258b5c
[SimplifyCFG] Enabled hoisting late in LTO pipeline.
bb7d3af113 disabled hoisting in SimplifyCFG by default, but enabled it
late in the pipeline. But it appears as if the LTO pipelines got missed.

This patch adjusts the LTO pipelines to also enable hoisting in the
later stages.

Unfortunately there's no easy way to add a test for the change I think.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D93684
2021-01-04 16:26:58 +00:00
Florian Hahn e0905553b4
[ArgPromotion] Delay dead GEP removal until doPromotion.
Currently ArgPromotion removes dead GEPs as part of the legality check
in isSafeToPromoteArgument. If no promotion happens, this means the pass
claims no modifications happened, even though GEPs were removed.

This patch fixes the issue by delaying removal of dead GEPs until
doPromotion: isSafeToPromoteArgument can simply skips dead GEPs and
the code in doPromotion dealing with GEPs is updated to account for
dead GEPs. Once we committed to promotion, it should be safe to
remove dead GEPs.

Alternatively isSafeToPromoteArgument could return an additional boolean
to indicate whether it made changes, but this is quite cumbersome and
there should be no real benefit of weeding out some dead GEPs here if we
do not perform promotion.

I added a test for the case where dead GEPs need to be removed when
promotion happens in 578c5a0c6e.

Fixes PR47477.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93991
2021-01-04 09:51:20 +00:00
Andrew Litteken 5c951623bc [IROutliner] Refactoring errors in the cost model from past patches.
There were was the reuse of a variable that should not have been
occurred due to confusion during committing patches.
2021-01-04 00:11:18 -06:00
Andrew Litteken 05e6ac4eb8 [IROutliner] Removing a duplicate addition, causing overestimates in IROutliner.
There was an extra addition left over from a previous commit for the
cost model, this removes it.
2021-01-03 23:36:28 -06:00
Kazu Hirata ba82c0b315 [llvm] Call *(Set|Map)::erase directly (NFC)
We can erase an item in a set or map without checking its membership
first.
2021-01-03 09:57:47 -08:00
Kazu Hirata 530c5af6a4 [Transforms] Construct SmallVector with iterator ranges (NFC) 2021-01-02 09:24:17 -08:00
Andrew Litteken 1a9eb19af9 [IROutliner] Adding consistent function attribute merging
When combining extracted functions, they may have different function
attributes. We want to make sure that we do not make any assumptions,
or lose any information. This attempts to make sure that we consolidate
function attributes to their most general case.

Tests:
llvm/test/Transforms/IROutliner/outlining-compatible-and-attribute-transfer.ll
llvm/test/Transforms/IROutliner/outlining-compatible-or-attribute-transfer.ll

Reviewers: jdoefert, paquette

Differential Revision: https://reviews.llvm.org/D87301
2020-12-31 12:30:23 -06:00
Fangrui Song a90b42b0fe [ThinLTO] Default -enable-import-metadata to false
The default value is dependent on `-DLLVM_ENABLE_ASSERTIONS={off,on}` (D22167), which is
error-prone. The few tests checking `!thinlto_src_module` can specify -enable-import-metadata explicitly.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D93959
2020-12-31 10:04:21 -08:00
Yuanfang Chen 277ebe46c6 Fix `LLVM_ENABLE_MODULES=On` build
for commit 480936e741.
2020-12-30 10:54:04 -08:00
Andrew Litteken fe431103b6 [IROutliner] Adding option to enable outlining from linkonceodr functions
There are functions that the linker is able to automatically
deduplicate, we do not outline from these functions by default. This
allows for outlining from those functions.

Tests:
llvm/test/Transforms/IROutliner/outlining-odr.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87309
2020-12-30 12:08:04 -06:00
Andrew Litteken 30feb93036 [IROutliner] Adding support for swift errors in the IROutliner
Since some values can be swift errors, we need to make sure that we
correctly propagate the parameter attributes.

Tests found at:
llvm/test/Transforms/IROutliner/outlining-swift-error.ll

Reviewers: jroelofs, paquette

Recommit of: 71867ed5e6

Differential Revision: https://reviews.llvm.org/D87742
2020-12-30 01:17:27 -06:00
Andrew Litteken eeb99c2ac2 Revert "[IROutliner] Adding support for swift errors"
This reverts commit 71867ed5e6.

Reverting for lack of commit messages.
2020-12-30 01:17:27 -06:00
Andrew Litteken 71867ed5e6 [IROutliner] Adding support for swift errors 2020-12-30 01:14:55 -06:00
Andrew Litteken df4a931c63 [IROutliner] Adding OptRemarks to the IROutliner Pass
This prints OptRemarks at each location where a decision is made to not
outline, or to outline a specific section for the IROutliner pass.

Test:
llvm/test/Transforms/IROutliner/opt-remarks.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87300
2020-12-29 15:52:08 -06:00
Andrew Litteken 6df161a2fb [IROutliner] Adding a cost model, and debug option to turn the model off.
This adds a cost model that takes into account the total number of
machine instructions to be removed from each region, the number of
instructions added by adding a new function with a set of instructions,
and the instructions added by handling arguments.

Tests not adding flags:

llvm/test/Transforms/IROutliner/outlining-cost-model.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87299
2020-12-29 12:43:41 -06:00
Andrew Litteken 1e23802507 [IROutliner] Merging identical output blocks for extracted functions.
Many of the sets of output stores will be the same. When a block is
created, we check if there is an output block with the same set of store
instructions. If there is, we map the output block of the region back
to the block, so that the extra argument controlling the switch
statement can be set to the appropriate block value.

Tests:
- llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87298
2020-12-28 21:01:48 -06:00
Andrew Litteken e6ae623314 [IROutliner] Adding support for consolidating functions with different output arguments.
Certain regions can have values introduced inside the region that are
used outside of the region. These may not be the same for each similar
region, so we must create one over arching set of arguments for the
consolidated function.

We do this by iterating over the outputs for each extracted function,
and creating as many different arguments to encapsulate the different
outputs sets. For each output set, we create a different block with the
necessary stores from the value to the output register. There is then
one switch statement, controlled by an argument to the function, to
differentiate which block to use.

Changed Tests for consistency:
llvm/test/Transforms/IROutliner/extraction.ll
llvm/test/Transforms/IROutliner/illegal-assumes.ll
llvm/test/Transforms/IROutliner/illegal-memcpy.ll
llvm/test/Transforms/IROutliner/illegal-memmove.ll
llvm/test/Transforms/IROutliner/illegal-vaarg.ll

Tests to test new functionality:
llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll
llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll
llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D87296
2020-12-28 16:17:07 -06:00
Kazu Hirata 8299fb8f25 [Transforms] Use llvm::append_range (NFC) 2020-12-27 09:57:29 -08:00
Craig Topper 897990e614 [IROutliner] Use isa instead of dyn_cast where the casted value isn't used. NFC
Fixes unused variable warnings.
2020-12-23 11:40:15 -08:00
Andrew Litteken b1191c8438 [IROutliner] Adding support for elevating constants that are not the same in each region to arguments
When there are constants that have the same structural location, but not
the same value, between different regions, we cannot simply outline the
region. Instead, we find the constants that are not the same in each
location, and promote them to arguments to be passed into the respective
functions. At each call site, we pass the constant in as an argument
regardless of type.

Added/Edited Tests:

llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll
llvm/test/Transforms/IROutliner/outlining-different-constants.ll
llvm/test/Transforms/IROutliner/outlining-different-globals.ll

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D87294
2020-12-23 13:03:05 -06:00
Michael Forster d56982b6f5 Remove unused variables.
Differential Revision: https://reviews.llvm.org/D93635
2020-12-21 16:24:43 +01:00
Andrew Litteken 7c6f28a438 [IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.

We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D86978
2020-12-19 17:34:34 -06:00
Andrew Litteken b8a2b6af37 Revert "[IROutliner] Deduplicating functions that only require inputs."
Missing reviewers and differential revision in commit message.

This reverts commit 5cdc4f57e5.
2020-12-19 17:33:49 -06:00
Andrew Litteken 5cdc4f57e5 [IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.

We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
2020-12-19 17:26:29 -06:00
Andrew Litteken c52bcf3a9b [IRSim][IROutliner] Limit to extracting regions that only require
inputs.

Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch limits the
extracted regions to those that only require inputs, and do not have any
 other special cases.

We test that we do not outline the wrong constants in:
test/Transforms/IROutliner/outliner-different-constants.ll
test/Transforms/IROutliner/outliner-different-globals.ll
test/Transforms/IROutliner/outliner-constant-vs-registers.ll

We test that correctly outline in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, plofti

Differential Revision: https://reviews.llvm.org/D86977
2020-12-19 13:33:54 -06:00
Kazu Hirata 56edfcada9 [Target, Transforms] Use contains (NFC) 2020-12-19 10:43:19 -08:00
Aditya Kumar 1ab4db0f84 [HotColdSplit] Reflect full cost of parameters in split penalty
Make the penalty for splitting a region more accurately reflect the cost
of materializing all of the inputs/outputs to/from the region.

This almost entirely eliminates code growth within functions which
undergo splitting in key internal frameworks, and reduces the size of
those frameworks between 2.6% to 3%.

rdar://49167240

Patch by: Vedant Kumar(@vsk)
Reviewers: hiraditya,rjf,t.p.northover
Reviewed By: hiraditya,rjf

Differential Revision: https://reviews.llvm.org/D59715
2020-12-18 17:06:17 -08:00
Andrew Litteken cea807602a [IRSim][IROutliner] Adding InstVisitor to disallow certain operations.
This adds a custom InstVisitor to return false on instructions that
should not be allowed to be outlined.  These match the illegal
instructions in the IRInstructionMapper with exception of the addition
of the llvm.assume intrinsic.

Tests all the tests marked: illegal-*-.ll with a test for each kind of
instruction that has been marked as illegal.

Reviewers: jroelofs, paquette

Differential Revisions: https://reviews.llvm.org/D86976
2020-12-17 19:33:57 -06:00
Johannes Doerfert 994bb6eb7d [OpenMP][NFC] Provide a new remark and documentation
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D93439
2020-12-17 14:38:26 -06:00
Andrew Litteken dae34463e3 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Recommit of bf899e8913 fixing memory
leaks.

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-12-17 11:27:26 -06:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Hongtao Yu ac068e014b [CSSPGO] Consume pseudo-probe-based AutoFDO profile
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.

Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).

**Adjustment to profile format **

A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.

Differential Revision: https://reviews.llvm.org/D92347
2020-12-16 15:57:18 -08:00
Johannes Doerfert dcaec81211 [OpenMP] Use assumptions during ICV tracking
The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us
to ignore calls that would otherwise prevent ICV tracking.

Once we track more ICVs we might need to distinguish the ones that could
be impacted even with `no_openmp_routines`.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D92050
2020-12-15 16:51:34 -06:00
Johannes Doerfert d08d490a4c [OpenMPOpt][NFC] Clang format 2020-12-15 16:51:34 -06:00
Florian Hahn 7ea3932ab1
[AnnotationRemarks] Also generate annotation remarks when using -O0.
The AnnotationRemarks pass is already run at the end of the module
pipeline. This patch also adds it before bailing out for -O0, so remarks
are also generated with -O0.
2020-12-15 14:46:52 +00:00
Kazu Hirata 215c1b1935 [Transforms] Use is_contained (NFC) 2020-12-12 09:37:49 -08:00
Fangrui Song b5ad32ef5c Migrate deprecated DebugLoc::get to DILocation::get
This migrates all LLVM (except Kaleidoscope and
CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get.

The CodeGen/StackProtector.cpp usage may have a nullptr Scope
and can trigger an assertion failure, so I don't migrate it.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D93087
2020-12-11 12:45:22 -08:00
David Sherwood 9b76160e53 [Support] Introduce a new InstructionCost class
This is the first in a series of patches that attempts to migrate
existing cost instructions to return a new InstructionCost class
in place of a simple integer. This new class is intended to be
as light-weight and simple as possible, with a full range of
arithmetic and comparison operators that largely mirror the same
sets of operations on basic types, such as integers. The main
advantage to using an InstructionCost is that it can encode a
particular cost state in addition to a value. The initial
implementation only has two states - Normal and Invalid - but these
could be expanded over time if necessary. An invalid state can
be used to represent an unknown cost or an instruction that is
prohibitively expensive.

This patch adds the new class and changes the getInstructionCost
interface to return the new class. Other cost functions, such as
getUserCost, etc., will be migrated in future patches as I believe
this to be less disruptive. One benefit of this new class is that
it provides a way to unify many of the magic costs in the codebase
where the cost is set to a deliberately high number to prevent
optimisations taking place, e.g. vectorization. It also provides
a route to represent the extremely high, and unknown, cost of
scalarization of scalable vectors, which is not currently supported.

Differential Revision: https://reviews.llvm.org/D91174
2020-12-11 08:12:54 +00:00
Hongtao Yu 705a4c149d [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 17:29:28 -08:00
Mitch Phillips 7ead5f5aa3 Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06.

Reason: Broke the ASan buildbots:
  http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10 15:53:39 -08:00
Zequan Wu b5216b2950 [PGO] Enable preinline and cleanup when optimize for size
Differential Revision: https://reviews.llvm.org/D91673
2020-12-10 12:29:17 -08:00
Hongtao Yu b035513c06 [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 09:50:08 -08:00
Xun Li 31e60b9133 [coroutine] should disable inline before calling coro split
This is a rework of D85812, which didn't land.
When callee coroutine function is inlined into caller coroutine function before coro-split pass, llvm will emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that coro-early pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute "coroutine.presplit" (it means the function has not been splited) to fix this issue
test plan: check-llvm, check-clang

In D85812, there was suggestions on moving the macros to Attributes.td to avoid circular header dependency issue.
I believe it's not worth doing just to be able to use one constant string in one place.
Today, there are already 3 possible attribute values for "coroutine.presplit": c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)
If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr, just to support this, which I think is an overkill.

Instead, I think the best way to do this is to add an API in Function class that checks whether this function is a coroutine, by checking the attribute by name directly.

Differential Revision: https://reviews.llvm.org/D92706
2020-12-08 08:53:08 -08:00
Florian Hahn 32825e8636
[ConstraintElimination] Tweak placement in pipeline.
This patch adds the ConstraintElimination pass to the LTO pipeline and
also runs it after SCCP in the function simplification pipeline.

This increases the number of cases we can elimination. Pending further
tuning.
2020-12-07 19:08:40 +00:00
Simon Pilgrim 50dd1dba6e [IPO] Fix operator precedence warning. NFCI.
Check the entire assertion condition before && with the message.
2020-12-07 18:23:54 +00:00
Wenlei He 6b989a1710 [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining
This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profiled guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Pacthes for integration with Pseudo-probe coming up soon.

Currently the new infrastructure kick in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO.

**Interface**

There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile.

- Query base profile (`getBaseSamplesFor`)
Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.

- Query context profile (`getContextSamplesFor`)
Context profile is a function's CFG profile for a given calling context. We can query context profile by context string.

- Track inlined context profile (`markContextSamplesInlined`)
When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function.

- Track not-inlined context profile (`promoteMergeContextSamplesTree`)
When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserve the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination.

**Implementation**

Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state.

On top of the above context representation, a custom trie tree is implemented to track and manager context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTireNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merge nodes for subtree if this move encounters existing nodes.

**Integration**

`SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detected input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile query will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`.

Differential Revision: https://reviews.llvm.org/D90125
2020-12-06 11:49:18 -08:00
dfukalov 2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
modimo c1ba991e8d [NFC] Fix typo 2020-12-02 22:23:57 -08:00
Hongtao Yu 24d4291ca7 [CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against  the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
2020-12-02 13:45:20 -08:00
Alex Zinenko 240dd92432 [OpenMPIRBuilder] forward arguments as pointers to outlined function
OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.

Reviewed By: jdoerfert, llitchev

Differential Revision: https://reviews.llvm.org/D92189
2020-12-02 14:59:41 +01:00
Mircea Trofin 5fe10263ab [llvm][inliner] Reuse the inliner pass to implement 'always inliner'
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567
2020-11-30 12:03:39 -08:00
Hongtao Yu 64fa8cce22 [CSSPGO] Pseudo probe instrumentation pass
This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86499
2020-11-30 10:16:54 -08:00
Andrew Litteken a8a43b6338 Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner."
Reverting commit due to address sanitizer errors.

> Extracting the similar regions is the first step in the IROutliner.
> 
> Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
> sort them by how many instructions will be removed.  Each
> IRSimilarityCandidate is used to define an OutlinableRegion.  Each
> region is ordered by their occurrence in the Module and the regions that
> are not compatible with previously outlined regions are discarded.
> 
> Each region is then extracted with the CodeExtractor into its own
> function.
> 
> We test that correctly extract in:
> test/Transforms/IROutliner/extraction.ll
> test/Transforms/IROutliner/address-taken.ll
> test/Transforms/IROutliner/outlining-same-globals.ll
> test/Transforms/IROutliner/outlining-same-constants.ll
> test/Transforms/IROutliner/outlining-different-structure.ll
> 
> Reviewers: paquette, jroelofs, yroux
> 
> Differential Revision: https://reviews.llvm.org/D86975

This reverts commit bf899e8913.
2020-11-27 19:55:57 -06:00
Andrew Litteken bf899e8913 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-11-27 19:08:29 -06:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Roman Lebedev a8d74517dc
[PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but i'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars have already canonicalized them.
So i don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we then could adjust accordingly.

While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Teresa Johnson 6e4c1cf293 [ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends
Previously this option could be used to skip devirtualizations of the
given functions in regular LTO and in the ThinLTO indexing step. This
change allows them to be skipped in the backend as well, which is useful
when debugging WPD in a distributed ThinLTO backend.

Differential Revision: https://reviews.llvm.org/D91812
2020-11-24 09:35:07 -08:00
Arthur Eubanks 932e4f8815 [FunctionAttrs][NPM] Fix handling of convergent
The legacy pass didn't properly detect indirect calls.

We can still remove the convergent attribute when there are indirect
calls. The LangRef says:

> When it appears on a call/invoke, the convergent attribute indicates
that we should treat the call as though we’re calling a convergent
function. This is particularly useful on indirect calls; without this we
may treat such calls as though the target is non-convergent.

So don't skip handling of convergent when there are unknown calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89826
2020-11-23 21:09:41 -08:00
Arthur Eubanks 3c811ce4f3 [NPM] Share pass building options with legacy PM
We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741
2020-11-23 13:04:05 -08:00
Arthur Eubanks b77436047a [PGO] Make -disable-preinline work with NPM
Fixes cspgo_profile_summary.ll under NPM.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D91826
2020-11-19 22:58:55 -08:00
Joseph Huber da8bec47ab [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert grokos

Differential Revision: https://reviews.llvm.org/D87946
2020-11-19 12:01:53 -05:00
Joseph Huber 97e55cfef5 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-11-18 15:28:39 -05:00
Nick Desaulniers f4c6080ab8 Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch"
This reverts commit b7926ce6d7.

Going with a simpler approach.
2020-11-17 17:27:14 -08:00
Florian Hahn 8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated  using
__attribute__((annotate("_name"))) on functions in Clang.

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Roman Lebedev 6861d938e5
Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.

I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.

This reverts commit 7bdad08429,
and some of it's follow-ups, that don't stand on their own.
2020-11-14 13:12:38 +03:00
Guozhi Wei a20220d25b [AlwaysInliner] Call mergeAttributesForInlining after inlining
Like inlineCallIfPossible and InlinerPass, after inlining mergeAttributesForInlining
should be called to merge callee's attributes to caller. But it is not called in
AlwaysInliner, causes caller's attributes inconsistent with inlined code.

Attached test case demonstrates that attribute "min-legal-vector-width"="512" is
not merged into caller without this patch, and it causes failure in SelectionDAG
when lowering the inlined AVX512 intrinsic.

Differential Revision: https://reviews.llvm.org/D91446
2020-11-13 12:01:35 -08:00
serge-sans-paille 95537f4508 llvmbuildectomy - compatibility with ocaml bindings
Use exact component name in add_ocaml_library.
Make expand_topologically compatible with new architecture.
Fix quoting in is_llvm_target_library.
Fix LLVMipo component name.
Write release note.
2020-11-13 14:35:52 +01:00
Florian Hahn 8bb6347939
Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Arthur Eubanks b9406121a0 [NFC] Removed unused variable
Obsolete as of https://reviews.llvm.org/D91046.
2020-11-12 22:24:57 -08:00
Arthur Eubanks d9cbceb041 [CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.

This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass().  The inliner called
updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
newly introduced edges (so updateCGAndAnalysisManagerForPass() would
only have to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass() since
updateCGAndAnalysisManagerForPass() is now handling the new call edges
and function passes cannot add new edges.

We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.

This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91046
2020-11-11 13:43:49 -08:00
Sjoerd Meijer 2ef47910d5 [LoopFlatten] Run it earlier, just before IndVarSimplify
This is a prep step for widening induction variables in LoopFlatten if this is
posssible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
passes were already close after each other.

Differential Revision: https://reviews.llvm.org/D90402
2020-11-10 20:22:41 +00:00
Sanne Wouda dd03881bd5 Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-10 12:04:32 +00:00
Michael Kruse e5dba2d7e5 [OMPIRBuilder] Start 'Create' methods with lower case. NFC.
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has methods names starting with lower case letters, as all other OpenMPIRBuilder already methods do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109
2020-11-09 19:35:11 -06:00
Sanne Wouda 2ec26d3a23 Revert "Add loop distribution to the LTO pipeline"
This reverts commit 6e80318eec.
2020-11-03 19:29:27 +00:00
Sanne Wouda 6e80318eec Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-03 18:54:24 +00:00
Ettore Tiotto 4274cbba1c [PartialInliner]: Handle code regions in a switch stmt cases
This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidate for partial inlining.

Reviewed By: fhann

Differential Revision: https://reviews.llvm.org/D89911
2020-11-02 14:32:45 -05:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Johannes Doerfert d39f574dcc [Attributor][FIX] Properly promote arguments pointers to arrays
When we promote pointer arguments we did compute a wrong offset and use
a wrong type for the array case.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-29 00:45:32 -05:00
Benjamin Kramer 207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b758 and
a87d7b3d44. Test fails under msan.
2020-10-28 13:58:14 +01:00
Johannes Doerfert d13daa4018 [Attributor] Finalize the CGUpdater after each SCC
This matches the new PM model.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 50d34958df [Attributor][NFC] Introduce a debug counter for `AA::manifest`
This will simplify debugging and tracking down problems.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 1d57b7f503 [Attributor][NFC] Print the right value in debug output 2020-10-27 22:07:55 -05:00
Johannes Doerfert 1c2531c9e1 [Attributor][FIX] Delete all unreachable static functions
Before we used to only mark unreachable static functions as dead if all
uses were known dead. Now we optimistically assume uses to be dead until
proven otherwise.
2020-10-27 22:07:55 -05:00
Johannes Doerfert bfe05b1aff [Attributor][FIX] Do not attach range metadata to the wrong Instruction
If we are looking at a call site argument it might be a load or call
which is in a different context than the call site argument. We cannot
simply use the call site argument range for the call or load.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-27 22:07:55 -05:00
Johannes Doerfert 724fcce109 [Attributor][NFC] Clang-format 2020-10-27 22:07:55 -05:00
Johannes Doerfert d504f7b91a [Attributor][NFC] Hoist call out of a lambda
The call is not free, unsure if  this is needed but it does not make it
worse either.
2020-10-27 22:07:54 -05:00
Johannes Doerfert 30e5a1f0be [Attributor][FIX] Properly check uses in the call not uses of the call
In the AANoAlias logic we determine if a pointer may have been captured
before a call. We need to look at other uses in the call not uses of the
call.

The new code is not perfect as it does not allow trivial cases where the
call has multiple arguments but it is at least not unsound and a TODO
was added.
2020-10-27 22:07:54 -05:00
Johannes Doerfert cb813ab66a [Attributor][NFC] Improve time trace output 2020-10-27 22:07:54 -05:00
Joseph Huber a87d7b3d44 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original delcaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information in passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoervert

Differential Revision: https://reviews.llvm.org/D89802
2020-10-27 16:09:19 -04:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Arthur Eubanks 42f76e193b Reland [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:40:46 -07:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Arthur Eubanks 4af5ba1726 Revert "[AlwaysInliner] Pass callee AAResults to InlineFunction()"
This reverts commit 504fbec7a6.

Test failure.
2020-10-26 20:23:38 -07:00
Arthur Eubanks 504fbec7a6 [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:10:09 -07:00
Hongtao Yu a16cbdd676 [AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of callee's callee to be shared. This breaks the assert introduced ealier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimize performed prior to the sample profile loader duplicates first callsite to `B` and the program may look like

```
A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}
```

For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into

```
A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}
```

Here is what happens next:

	1. Failing to inline the callsite `C()` results in `C1`'s samples returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite which shares `C1` with the first callsite.
	2. Failing to inline the middle callsite results in `B1` returned to `B`'s base profile, which in turn will cause `C1` merged into `B`'s base profile. Note that the nest `C` profile in `B`'s base has a non-zero head sample count now. The value actually equals to `C1`'s entry count.
	3. Failing to inline last callsite results in `B2` returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
        4. Compiling `B` using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However C2 is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work where counts returning is based on context-sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and duplicate one having different inline behavior is a magic. It has to do with imperfect counts in profile and extra complicated inlining that makes the hotness for them different.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D90056
2020-10-23 17:42:21 -07:00
Arthur Eubanks ba22c403b2 [Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592
2020-10-23 15:37:18 -07:00
Nick Desaulniers b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Caroline Concatto 2415636475 [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms
Use isKnownXY comparators when one of the operands can be with
scalable vectors or getFixedSize() for all the other cases.

This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.

Differential Revision: https://reviews.llvm.org/D89703
2020-10-23 09:15:17 +01:00
Arthur Eubanks 0291e2c933 [Inliner] Run always-inliner in inliner-wrapper
An alwaysinline function may not get inlined in inliner-wrapper due to
the inlining order.

Previously for the following, the inliner would first inline @a() into @b(),

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @a()
  br label %for.cond
}
```

making @b() recursive and unable to be inlined into @a(), ending at

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @b()
  br label %for.cond
}
```

Running always-inliner first makes sure that we respect alwaysinline in more cases.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D86988
2020-10-22 19:16:25 -07:00
Ettore Tiotto e6521ce064 [NFC][PartialInliner]: Clean up code
Make member function const where possible, use LLVM_DEBUG to print debug traces
rather than a custom option, pass by reference to avoid null checking, ...

Reviewed By: fhann

Differential Revision: https://reviews.llvm.org/D89895
2020-10-22 14:40:15 -04:00
Arthur Eubanks 8d9466a385 [BlockExtract][NewPM] Port -extract-blocks to NPM
Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89015
2020-10-21 12:51:11 -07:00
Florian Hahn 88241ffb56 [Passes] Move ADCE before DSE & LICM.
The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset and the size there actually decreases by -0.1%, with
not significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly less instructions. It
should also be unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322
2020-10-21 10:30:56 +01:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Vedant Kumar 273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
2020-10-14 20:16:33 +09:00
sstefan1 ce16be253c [Attributor][NFC] Make `createShallowWrapper()` available outside of Attributor
D85703 will need to create shallow wrappers in order to track the spmd icv. We need to make it available.

Differential Revision: https://reviews.llvm.org/D89342
2020-10-14 10:08:59 +02:00
Arthur Eubanks 518ec05a10 [LoopExtract][NewPM] Port -loop-extract to NPM
-loop-extract-single is just -loop-extract on one loop.

-loop-extract depended on -break-crit-edges and -loop-simplify in the
legacy PM, but the NPM doesn't allow specifying pass dependencies like
that, so manually add those passes to the RUN lines where necessary.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89016
2020-10-13 22:55:42 -07:00
Giorgis Georgakoudis 3a6bfcf2f9 [OpenMPOpt] Merge parallel regions
There are cases that generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way.  This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.

This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83635
2020-10-09 09:59:04 -07:00
Johannes Doerfert 7993d61177 [Attributor] Use smarter way to determine alignment of GEPs
Use same logic existing in other places to deal with base case GEPs.

Add the original Attributor talk example.
2020-10-06 19:31:08 -05:00
Johannes Doerfert c4cfe7a435 [Attributor] Ignore read accesses to constant memory
The old function attribute deduction pass ignores reads of constant
memory and we need to copy this behavior to replace the pass completely.
First step are constant globals. TBAA can also describe constant
accesses and there are other possibilities. We might want to consider
asking the alias analyses that are available but for now this is simpler
and cheaper.
2020-10-06 19:31:07 -05:00
Johannes Doerfert 3f540c05df [Attributor] Give up early on AANoReturn::initialize
If the function is not assumed `noreturn` we should not wait for an
update to mark the call site as "may-return".

This has two kinds of consequences:
  - We have less iterations in many tests.
  - We have less deductions based on "known information" (since we ask
    earlier, point 1, and therefore assumed information is not "known"
    yet).
The latter is an artifact that we might want to tackle properly at some
point but which is not easily fixable right now.
2020-10-06 19:31:07 -05:00
Johannes Doerfert 4a7a988442 [Attributor][FIX] Move assertion to make it not trivially fail
The idea of this assertion was to check the simplified value before we
assign it, not after, which caused this to trivially fail all the time.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 04f6951397 [Attributor][FIX] Dead return values are not `noundef`
When we assume a return value is dead we might still visit return
instructions via `Attributor::checkForAllReturnedValuesAndReturnInsts(..)`.
When we do so the "returned value" is potentially simplified to `undef`
as it is the assumed "returned value". This is a problem if there was a
preexisting `noundef` attribute that will only be removed as we manifest
the `undef` return value. We should not use this combination to derive
`unreachable` though. Two test cases fixed.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 957094e31b [Attributor][NFC] Ignore benign uses in AAMemoryBehaviorFloating
In AAMemoryBehaviorFloating we used to track benign uses in a SetVector.
With this change we look through benign uses eagerly to reduce the
number of elements (=Uses) we look at during an update.

The test does actually not fail prior to this commit but I already wrote
it so I kept it.
2020-10-06 09:32:18 -05:00
Vedant Kumar 9afb1c566e Revert "Outline non returning functions unless a longjmp"
This reverts commit 20797989ea.

This patch (https://reviews.llvm.org/D69257) cannot complete a stage2
build due to the change:

```
CI->getCalledFunction()->getName().contains("longjmp")
```

There are several concrete issues here:

  - The callee may not be a function, so `getCalledFunction` can assert.
  - The called value may not have a name, so `getName` can assert.
  - There's no distinction made between "my_longjmp_test_helper" and the
    actual longjmp libcall.

At a higher level, there's a serious layering problem here. The
splitting pass makes policy decisions in a general way (e.g. based on
attributes or profile data). Special-casing certain names breaks the
layering. It subverts the work of library maintainers (who may now need
to opt-out of unexpected optimization behavior for any affected
functions) and can lead to inconsistent optimization behavior (as not
all llvm passes special-case ".*longjmp.*" in the same way).

The patch may need significant revision to address these issues.

But the immediate issue is that this crashes while compiling llvm's unit
tests in a stage2 build (due to the `getName` problem).
2020-10-05 14:10:25 -07:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this chage for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Arthur Eubanks 7468afe9ca [DAE] MarkLive in MarkValue(MaybeLive) if any use is live
While looping through all args or all return values, we may mark a use
of a later iteration as live. Previously when we got to that later value
it would ignore that and continue adding to Uses instead of marking it
live. For example, when looping through arg#0 and arg#1,
MarkValue(arg#0, Live) may cause some use of arg#1 to be live, but
MarkValue(arg#1, MaybeLive) will not notice that and continue adding
into Uses.

Now MarkValue(RA, MaybeLive) will MarkLive(RA) if any use is live.

Fixes PR47444.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88529
2020-10-02 10:55:08 -07:00
Arthur Eubanks eb55735073 Reland [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:46:57 -07:00
Arthur Eubanks 9b8c0b8b46 Revert "[AlwaysInliner] Update BFI when inlining"
This reverts commit b1bf24667f.
2020-10-02 10:34:51 -07:00
Arthur Eubanks b1bf24667f [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:26:34 -07:00
Joseph Huber 82453e759c [OpenMP] Add Missing Runtime Call for Globalization Remarks
Summary:
Add a missing runtime call to perform data globalization checks.

Reviewers: jdoerfert

Subscribers: guansong hiraditya llvm-commits sstefan1 yaxunl

Tags: #LLVM #OpenMP

Differential Revision: https://reviews.llvm.org/D88621
2020-10-01 21:19:53 -04:00
Sjoerd Meijer d53b4bee0c [LoopFlatten] Add a loop-flattening pass
This is a simple pass that flattens nested loops.  The intention is to optimise
loop nests like this, which together access an array linearly:

  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);

into one loop:

  for (int i = 0; i < (N*M); ++i)
    f(A[i]);

It can also flatten loops where the induction variables are not used in the
loop. This can help with codesize and runtime, especially on simple cpus
without advanced branch prediction.

This is only worth flattening if the induction variables are only used in an
expression like i*M+j. If they had any other uses, we would have to insert a
div/mod to reconstruct the original values, so this wouldn't be profitable.

This partially fixes PR40581 as this pass triggers on one of the two cases. I
will follow up on this to learn LoopFlatten a few more (small) tricks. Please
note that LoopFlatten is not yet enabled by default.

Patch by Oliver Stannard, with minor tweaks from Dave Green and myself.

Differential Revision: https://reviews.llvm.org/D42365
2020-10-01 13:54:45 +01:00
Arthur Eubanks 460dda071e [WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass
The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.

This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88588
2020-09-30 17:27:37 -07:00
Daniel Kiss c5a4900e1a [AArch64] Add BTI to CFI jumptables.
With branch protection the jump to the jump table entries requires a landing pad.

Reviewed By: eugenis, tamas.petz

Differential Revision: https://reviews.llvm.org/D81251
2020-09-29 13:50:23 +02:00
sstefan1 cb9cfa0d2f [OpenMPOpt][Fix] Only initialize ICV initial values once.
Reviewers: jdoerfert, ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D88441
2020-09-29 12:22:58 +02:00
Nikita Popov 9b959b59df [LVI] Require context instruction in external API (NFCI)
Require CxtI in getConstant() and getConstantRange() APIs.
Accordingly drop the BB parameter, as it is implied by
CxtI->getParent().

This makes sure we don't forget to pass the context instruction,
and makes the API contract clearer (also clean up the comments to
that effect -- the value holds at the context instruction, not
the end of the block).
2020-09-27 18:07:24 +02:00
Arthur Eubanks 83e3ea2cfc [LowerTypeTests][NewPM] Add constructor that uses command line flags
This matches the legacy PM pass by having one constructor use command
line flags, and the other use parameters to the pass.

This fixes all tests under Transforms/LowerTypeTests using NPM.

Reviewed By: ychen, pcc

Differential Revision: https://reviews.llvm.org/D87845
2020-09-25 17:39:59 -07:00
Joseph Huber a22814194e [OpenMP] OpenMPOpt Support for Globalization Remarks
Summary:
This patch add support for printing analysis messages relating to data
globalization on the GPU. This occurs when data is shared between the
threads in a GPU context and must be pushed to global or shared memory.

Reviewers: jdoerfert

Subscribers: guansong hiraditya llvm-commits ormris sstefan1 yaxunl

Tags: #OpenMP #LLVM

Differential Revision: https://reviews.llvm.org/D88243
2020-09-24 18:23:12 -04:00
Hamilton Tobon Mosquera bd31abc1d0 [OpenMPOpt] Refactored "issue" and "wait" declarations for data map runtime call.
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This given the compiler warning of returning the handle as copy.

Differential Revision: https://reviews.llvm.org/D88029
2020-09-22 10:50:17 -05:00
Arthur Eubanks 3bf703fb6d [AlwaysInliner] Emit optimization remarks
To match the normal inliner in preparation for https://reviews.llvm.org/D86988.

Also change a FIXME to an assert.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88067
2020-09-21 22:09:28 -07:00
Fangrui Song 871d03a675 [FunctionAttrs] Inline setDoesNotRecurse() and delete it. NFC
It always returns true, which may lead to confusion. Inline it because it is
trivial and only called twice.
2020-09-19 22:24:52 -07:00
Fangrui Song 0526713aa8 [FunctionAttrs] Remove redundant check. NFC 2020-09-19 20:46:18 -07:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Dangeti Tharun kumar 01e2b394ee [Partial Inliner] Compute intrinsic cost through TTI
https://bugs.llvm.org/show_bug.cgi?id=45932

assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") getting triggered in computeBBInlineCost.

Intrinsics like "assume" are considered regular function calls while computing costs.
This patch enables computeBBInlineCost to queries TTI for intrinsic call cost.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87132
2020-09-16 15:12:31 +01:00
Arthur Eubanks ba12e77ec1 [NewPM] Port strip* passes to NPM
strip-nondebug and strip-debug-declare have no existing associated tests

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87639
2020-09-15 18:25:12 -07:00
Florian Hahn 3d42d54955 [ConstraintElimination] Add constraint elimination pass.
This patch is a first draft of a new pass that adds a more flexible way
to eliminate compares based on more complex constraints collected from
dominating conditions.

In particular, it aims at simplifying conditions of the forms below
using a forward propagation approach, rather than instcomine-style
ad-hoc backwards walking of def-use chains.

    if (x < y)
      if (y < z)
        if (x < z) <- simplify

or

    if (x + 2 < y)
        if (x + 1 < y) <- simplify assuming no wraps

The general approach is to collect conditions and blocks, sort them by
dominance and then iterate over the sorted list. Conditions are turned
into a linear inequality and add it to a system containing the linear
inequalities that hold on entry to the block. For blocks, we check each
compare against the system and see if it is implied by the constraints
in the system.

We also keep a stack of processed conditions and remove conditions from
the stack and the constraint system once they go out-of-scope (= do not
dominate the current block any longer).

Currently there still are the least the following areas for improvements

* Currently large unsigned constants cannot be added to the system
  (coefficients must be represented as integers)
* The way constraints are managed currently is not very optimized.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D84547
2020-09-15 19:31:11 +01:00
Arthur Eubanks f3d8344854 [PruneEH][NFC] Use CallGraphUpdater in PruneEH
In preparation for porting the pass to NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87632
2020-09-14 14:43:19 -07:00
Ettore Tiotto 6b13cfe739 [ArgumentPromotion]: Copy function metadata after promoting arguments
The argument promotion pass currently fails to copy function annotations
over to the modified function after promoting arguments.
This patch copies the original function annotation to the new function.

Reviewed By: fhann

Differential Revision: https://reviews.llvm.org/D86630
2020-09-10 13:08:57 -04:00
Juneyoung Lee 1b9884df8d Enable InsertFreeze flag of JumpThreading when used in LTO
This patch enables inserting freeze when JumpThreading converts a select to
a conditional branch when it is run in LTO.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85534
2020-09-10 19:05:49 +09:00
Johannes Doerfert d445b6dfec [Attributor] Cleanup `::initialize` of various AAs
This commit cleans up the ::initialize method of various AAs in the
following ways:
  - If an associated function is required, give up on declarations.
    This was discovered as a real problem when lots of llvm.dbg.XXX
    call sites were assumed `noreturn` until proven otherwise. That
    does not make any sense and caused huge regressions and missed
    deductions.
  - Require more associated declarations for function interface AAs.
  - Use the IRAttribute::initialize to determine if function interface
    AAs can be used in IPO, don't replicate the checks (especially
    isFunctionIPOAmendable) all over the place. Arguably the function
    declaration check should be moved to some central place to.
2020-09-09 01:38:25 -05:00
Johannes Doerfert 849146ba93 [Attributor] Associate the callback callee with a call site argument (if any)
If we have a callback, call site arguments were already associated with
the callback callee. Now we also associate the function with the
callback callee, thus we know ensure that the following holds true (if
all return nonnull):
   `getAssociatedArgument()->getParent() == getAssociatedFunction()`

To test this an early exit from
  `AAMemoryBehaviorCallSiteArgument::initialize``
is included as well. Without the change to getAssociatedFunction() this
kind of early exit for declarations would cause callback call site
arguments to miss out.
2020-09-09 00:52:17 -05:00
Johannes Doerfert cefd2a2c70 [Attributor] Cleanup `IRPosition::getArgNo` usages
As we handle callback calls we need to disambiguate the call site
argument number from the callee argument number. While always equal in
non-callback calls, a callback comes with a partial parameter-argument
mapping so there is no implicit correspondence. Here we split
`IRPosition::getArgNo()` into two public functions, `getCallSiteArgNo()`
and `getCalleeArgNo()`. Usages are adjusted to pick the right one for
their purpose. This fixed some problems that would have been exposed as
we more aggressively optimize callbacks.
2020-09-09 00:52:17 -05:00
Johannes Doerfert c0ab901bdd [Attributor] Selectively look at the callee even when there are operand bundles
While operand bundles carry unpredictable semantics, we know some of
them and can therefore "ignore" them. In this case we allow to look at
the declaration of `llvm.assume` when asked for the attributes at a call
site. The assume operand bundles we have do not invalidate the
declaration attributes.

We cannot test this in isolation because the llvm.assume attributes are
determined by the parser. However, a follow up patch will provide test
coverage.
2020-09-09 00:52:17 -05:00
Johannes Doerfert d5d75f61e5 [Attributor] Provide a command line option that limits recursion depth
In `MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.cpp` we initialized
attributes until stack frame ~35k caused space to run out. The initial
size 1024 is pretty much random.
2020-09-09 00:47:02 -05:00
Johannes Doerfert 711bf7dcf9 [Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions
The CloneFunctionInto has implicit requirements with regards to the
linkage and visibility of the function. We now update these after we did
the CloneFunctionInto on the copy with the same linkage and visibility
as the original.
2020-09-07 23:38:09 -05:00
Johannes Doerfert e6208849c8 [Attributor][NFC] Change variable spelling 2020-09-07 23:38:09 -05:00
Johannes Doerfert 8637acac5a [Attributor][NFC] Clang tidy: no else after continue 2020-09-07 23:38:08 -05:00
Johannes Doerfert ff70c25d76 [Attributor][NFC] Expand `auto` types (clang-fix-it) 2020-09-07 23:38:08 -05:00
Johannes Doerfert 79651265b2 [Attributor][FIX] Properly return changed if the IR was modified
Deleting or replacing anything is certainly a modification. This caused
a later assertion in IPSCCP when compiling 400.perlbench with the new PM.
I'm not sure how to test this.
2020-09-07 23:38:08 -05:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Wei Wang 4eef14f978 [OpenMPOpt] Assume indirect call always changes ICV
When checking call sites, give special handling to indirect call, as the
callee may be unknown and can lead to nullptr dereference later. Assume
conservatively that the ICV always changes in such case.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D87104
2020-09-04 09:05:32 -07:00
Florian Hahn 6de51189b0 [PassManager] Move load/store motion pass after DSE in LTO pipeline.
As far as I am aware, the placement of MergedLoadStoreMotion in the
pipeline is not heavily tuned currently. It seems to not matter much if
we do it after DSE in the LTO pipeline (no binary changes for -O3 -flto
on MultiSource/SPEC2000/SPEC2006). Moving it after DSE however has a
major benefit: MemorySSA is constructed by LICM and is consumed by DSE,
so if MergedLoadStoreMotion happens after DSE, we do not need to
preserve MemorySSA in it.

If there are any concerns with this move, I can also update
MergedLoadStoreMotion to preserve MemorySSA.

This patch together with D86651 (preserve MemSSA in MemCpyOpt) and
D86534 (preserve MemSSA in GVN) are the remaining patches to bring down
compile-time for DSE + MemorySSA to the levels outlined in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Once they land, we should be able to start with flipping the switch on
enabling DSE + MmeorySSA.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86967
2020-09-03 13:47:50 +01:00
David Stenberg 6d36b22b21 [GlobalOpt] Fix an incorrect Modified status
When marking a global variable constant, and simplifying users using
CleanupConstantGlobalUsers(), the pass could incorrectly return false if
there were still some uses left, and no further optimizations was done.

This was caught using the check introduced by D80916.

This fixes PR46749.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D85837
2020-09-02 15:00:45 +02:00
Shinji Okumura 5d13479574 [Attributor] Make use of AANoUndef in AAUndefinedBehavior
This patch makes it possible for AAUB to use information from AANoUndef.
This is the next patch of D86983

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86984
2020-09-02 16:08:03 +09:00
Shinji Okumura 7558e9e5a2 [Attributor] Fix AANoUndef initialization
When the associated value is undef, we immediately forced to indicate a pessimistic fixpoint so far.
This patch changes the initialization to check the attribute given in IR at first and to indicate an optimistic fixpoint when it is given.
This change will enable us to catch , for example, the following case in AAUB.
```
call void @foo(i32 noundef undef)
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86983
2020-09-02 15:40:43 +09:00
Hamilton Tobon Mosquera 1d3d9b9cd8 [OpenMPOpt][NFC] Moving constants as struct static attributes 2020-08-31 19:05:00 -05:00
Hamilton Tobon Mosquera 8931add617 [OpenMPOpt][HideMemTransfersLatency] Get values stored in offload arrays
getValuesInOffloadArrays goes through the offload arrays in __tgt_target_data_begin_mapper getting the values stored in them before the call is issued.

call void @__tgt_target_data_begin_mapper(arg0, arg1,
    i8** %offload_baseptrs, i8** %offload_ptrs, i64* %offload_sizes,
...)

Diferential Revision: https://reviews.llvm.org/D86300
2020-08-31 15:33:05 -05:00
sstefan1 5dfd7cc46c Reland [OpenMPOpt] ICV tracking for calls
The problem with module slice has been addressed in D86319

Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-30 11:27:48 +02:00
sstefan1 8d8ce85b23 [Attributor] Introduce module slice.
Summary:
The module slice describes which functions we can analyze and transform
while working on an SCC as part of the Attributor-CGSCC pass. So far we
simply restricted it to the SCC.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D86319
2020-08-30 10:30:44 +02:00
Shinji Okumura a7ca9e09bd [Attributor] Fix callsite check in AAUndefinedBehavior
This is the next patch of D86842
When we check `noundef` attribute violation at callsites, we do not have to require `nonnull` in the following two cases.
1. An argument is known to be simplified to undef
2. An argument is known to be dead

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86845
2020-08-30 13:17:02 +09:00
Shinji Okumura 7082381735 [Attributor][NFC] Fix dependency type in AAUndefinedBehaviorImpl::updateImpl
This patch fixes wrong dependency type in AAUB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86842
2020-08-30 12:34:50 +09:00
Shinji Okumura 7a15dfd056 [Attributor] Fix AANoUndef identification
Even though `noundef` IR attribute might be attached to non-void type values, AANoUndef is mistakenly identified for pointer type values only.
This patch fixes that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86737
2020-08-30 05:39:25 +09:00
Shinji Okumura 1364d856f4 [Attributor][NFC] Do not manifest noundef for positions to be changed to undef
This patch fixes AANoUndef manifestation.
We should not manifest noundef for positions that will be changed to undef.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86835
2020-08-30 03:23:41 +09:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Benjamin Kramer dce72dc870 [FunctionAttrs] Bulk remove attributes. NFC. 2020-08-28 12:56:19 +02:00
Shinji Okumura 50ebd1afa9 [Attributor] Do not manifest noundef for dead positions
Even if noundef is deduced for a position, we should not manifest it when the position is dead.
This is because the associated values with dead positions are replaced with undef values by AAIsDead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86565
2020-08-28 05:58:18 +09:00
Shinji Okumura c5e6872ec6 [Attributor] Guarantee getAAFor not to update AA in the manifestation stage
If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated.
This patch makes use of the phase flag in Attributor, and handle `getAAFor` behavior according to the flag.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86635
2020-08-28 04:07:42 +09:00
Shinji Okumura 7a68f0f1e0 [Attributor] Add a phase flag to Attributor
Add a new flag that indicates which stage in the process we are in.
This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86678
2020-08-28 01:16:38 +09:00
serge-sans-paille 4e29d25669 Fix OpenMP deduplicateRuntimeCalls return status
Differential Revision: https://reviews.llvm.org/D86705
2020-08-27 15:01:04 +02:00
serge-sans-paille 5621571fc7 Fix Attributor return status
Differential Revision: https://reviews.llvm.org/D86703
2020-08-27 15:01:04 +02:00
Shinji Okumura 6c25eca614 [Attributor] Add flag for undef value to the state of AAPotentialValues
Currently, an undef value is reduced to 0 when it is added to a set of potential values.
This patch introduces a flag for under values. By this, for example, we can merge two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85592
2020-08-27 16:30:29 +09:00
Wei Mi c67ccf5faf [SampleFDO] Enhance profile remapping support for searching inline instance
and indirect call promotion candidate.

Profile remapping is a feature to match a function in the module with its
profile in sample profile if the function name and the name in profile look
different but are equivalent using given remapping rules. This is a useful
feature to keep the performance stable by specifying some remapping rules
when sampleFDO targets are going through some large scale function signature
change.

However, currently profile remapping support is only valid for outline
function profile in SampleFDO. It cannot match a callee with an inline
instance profile if they have different but equivalent names. We found
that without the support for inline instance profile, remapping is less
effective for some large scale change.

To add that support, before any remapping lookup happens, all the names
in the profile will be inserted into remapper and the Key to the name
mapping will be recorded in a map called NameMap in the remapper. During
name lookup, a Key will be returned for the given name and it will be used
to extract an equivalent name in the profile from NameMap. So with the help
of the NameMap, we can translate any given name to an equivalent name in
the profile if it exists. Whenever we try to match a name in the module to
a name in the profile, we will try the match with the original name first,
and if it doesn't match, we will use the equivalent name got from remapper
to try the match for another time. In this way, the patch can enhance the
profile remapping support for searching inline instance and searching
indirect call promotion candidate.

In a planned large scale change of int64 type (long long) to int64_t (long),
we found the performance of a google internal benchmark degraded by 2% if
nothing was done. If existing profile remapping was enabled, the performance
degradation dropped to 1.2%. If the profile remapping with the current patch
was enabled, the performance degradation further dropped to 0.14% (Note the
experiment was done before searching indirect call promotion candidate was
added. We hope with the remapping support of searching indirect call promotion
candidate, the degradation can drop to 0% in the end. It will be evaluated
post commit).

Differential Revision: https://reviews.llvm.org/D86332
2020-08-26 11:07:35 -07:00
Shinji Okumura 3050713798 [Attributor] Provide an edge-based interface in AAIsDead
This patch produces an edge-based interface in AAIsDead.
By this, we can query a set of basic blocks that are directly reachable from a given basic block.
This is specifically useful for implementation of AAReachability.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85547
2020-08-26 16:57:52 +09:00
Shinji Okumura 05390440a2 [Attributor][NFC] Clang format 2020-08-25 19:32:58 +09:00
Fangrui Song 44ee9d070a Revert D85812 "[coroutine] should disable inline before calling coro split"
This reverts commit 2e43acfed8.

LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the
library which contains SampleProfile.cpp). It is inappropriate for
SampleProfile.cpp to depent on Coroutines.h (circular dependency).

The test inverted dependencies as well:
llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.
2020-08-24 11:41:05 -07:00
dongAxis 2e43acfed8 [coroutine] should disable inline before calling coro split
summary:
When callee coroutine function is inlined into caller coroutine
function before coro-split pass, llvm will emits "coroutine should
have exactly one defining @llvm.coro.begin". It seems that coro-early
pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute
"coroutine.presplit" (it means the function has not been splited) to
fix this issue

TestPlan: check-llvm

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D85812
2020-08-24 22:22:08 +08:00
Roman Lebedev 503deec218
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d8.
2020-08-22 00:33:22 +03:00
kuterd 65fcc0ee31 [Attributor] Function seed allow list
-  Adds a command line option to seed only selected functions.
  - Makes seed allow listing exclusive to assertions enabled builds.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D86129
2020-08-21 23:55:26 +03:00
Shinji Okumura e21a22a7a8 [Attributor] fix AANoUndef initialization
Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position.
 This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86361
2020-08-22 05:06:14 +09:00
Shinji Okumura 835cfa5def [Attributor] Handle CallBase case in AAValueConstantRange::initialize
Currently, although we handle `CallBase` case in updateImpl, we give up in initialize in the case.
That is problematic when we propagate a range from call site returned position to floating position.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86196
2020-08-20 20:15:19 +09:00
David Stenberg 8206257cb8 [GlobalOpt] Fix an incorrect Modified status
When removing a non-constant store to a global in
CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return
false.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86149
2020-08-20 11:52:09 +02:00
Evgeny Leviant d5b701b972 [ThinLTO] Import globals recursively
Differential revision: https://reviews.llvm.org/D73698
2020-08-20 12:13:43 +03:00
Johannes Doerfert 012819f301 [Attributor][FIX] Update the call graph properly when internalizing functions
The internal version is now part of the SCC, make sure to perform this
update.
2020-08-20 01:44:58 -05:00
Johannes Doerfert 3edea15f9a [Attributor] Simplify comparison against constant null pointer
Comparison against null is a common pattern that usually is followed by
error handling code and the likes. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D86145
2020-08-20 01:44:58 -05:00
Johannes Doerfert d01ad217ba [Attributor][FIX] Do not use cyclic arguments for `nonnull`
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
2020-08-20 01:44:58 -05:00
Johannes Doerfert a49dae0e38 [Attributor][AAIsDead][NFC] Skip uninteresting instructions early 2020-08-20 01:44:58 -05:00
Johannes Doerfert 08f33756e6 [Attributor][NFC] Extract functionality into own member 2020-08-20 01:44:58 -05:00
Johannes Doerfert 1de70a724e Revert "[OpenMPOpt] ICV tracking for calls"
This commits breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extend. See: https://reviews.llvm.org/D85544#2227611

This reverts commit b0b32e6490.
2020-08-20 00:00:35 -05:00
Kyungwoo Lee 7a028fe702 Force Remove Attribute
-force-attribute adds an attribute to function via command-line.
However, there was no counter-part to remove an attribute.  This patch
adds -force-remove-attribute that removes an attribute from function.

Differential Revision: https://reviews.llvm.org/D85586
2020-08-19 17:30:13 -04:00
Hamilton Tobon Mosquera bd2fa1819b [OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of __tgt_target_data_begin_mapper
canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null it the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point.

Differential Revision: https://reviews.llvm.org/D86155
2020-08-19 11:42:22 -05:00
sstefan1 b0b32e6490 [OpenMPOpt] ICV tracking for calls
Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-19 11:43:12 +02:00
Shinji Okumura 5e361e2aa4 [Attributor] Deduce noundef attribute
This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85184
2020-08-18 18:05:54 +09:00
Johannes Doerfert 8abd69aa9e [Attributor] Bail early if AAMemoryLocation cannot derive anything
Before this change we looked through all memory operations in a function
even if the first was an unknown call that could do anything. This did
cost a lot of time but there is little use to do so. We also avoid
creating AAs for things that we would have looked at in case no other AA
will; that is the reason for the test changes.

Running only the attributor-cgscc pass on a IR version of
`llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the
time we spend in `AAMemoryLocation::update` from 4% total to
0.9% (disclaimer: no accurate measurements).
2020-08-17 23:36:36 -05:00
Johannes Doerfert 1d99c3d707 [Attributor] We (should) keep the CG updated so we can mark it as preserved 2020-08-17 23:36:36 -05:00
Johannes Doerfert 858c75f7d1 [Attributor][NFC] Directly return proper type to avoid casts 2020-08-17 23:36:36 -05:00
Johannes Doerfert b27bdf955a [Attributor][FIX] Handle function pointers properly in AANonNull
Before we tired to create a dominator tree for a declaration when we
wanted to determine if the function pointer is `nonnull`. We now avoid
looking at global values if `Value::getPointerDereferenceableBytes` not
already determined `nonnull`.
2020-08-17 23:36:35 -05:00
Hamilton Tobon Mosquera 496f8e5b36 [OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts.
WIP that tries to hide the latency of runtime calls that involve host to
device memory transfers by splitting them into their "issue" and "wait"
versions. The "issue" is moved upwards as much as possible. The "wait" is
moved downards as much as possible. The "issue" issues the memory transfer
asynchronously, returning a handle. The "wait" waits in the returned
handle for the memory transfer to finish. We still lack of the movement.
2020-08-17 20:56:10 -05:00
Johannes Doerfert 19bd4ef157 [Attributor] Properly use the call site argument position 2020-08-17 18:21:09 -05:00
Johannes Doerfert 5dfc207c53 [Attributor][FIX] Do not request an AANonNull for non-pointer types 2020-08-17 18:21:08 -05:00
Wenlei He 577e58bcc7 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

This is a resubmit of https://reviews.llvm.org/D83743
2020-08-15 20:17:21 -07:00
Luofan Chen 266949b2bc [Attributor][NFC] Format code 2020-08-16 00:00:45 +08:00
Luofan Chen b7448a348b [Attributor][NFC] Use indexes instead of iterator
When adding elements when iterating, the iterator will become
valid, which could cause errors. This fixes the issue by using
indexes instead of iterator.
2020-08-15 23:09:46 +08:00
Luofan Chen 87a85f3d57 [Attributor] Use internalized version of non-exact functions
This patch internalize non-exact functions and replaces of their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.

We can do this because some non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize and replace use of them (the
result of the other version of this function should be the same.).
Note that not all functions can be internalized, e.g., function with
`linkonce` or `weak` linkage.

For now when specified in commandline, we internalize all functions
that meet the requirements without calculating the cost of such
internalzation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84167
2020-08-15 20:23:38 +08:00
Shinji Okumura 5f55a8193c [Attributor] Implement AAPotentialValues
This patch provides an implementation of `AAPotentialValues`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85632
2020-08-14 20:51:14 +09:00
Aditya Kumar f902a7eccf [HotColdSplit] Fix variable name spelling 2020-08-12 22:50:08 -07:00
Kyungwoo Lee d73be5af0a [NFC] Factor out hasForceAttributes
This is a preparation for https://reviews.llvm.org/D85586.

Differential Revision: https://reviews.llvm.org/D85793
2020-08-12 02:16:57 -04:00
Amy Huang 54b6cca0f2 [globalopt] Change so that emitting fragments doesn't use the type size of DIVariables
When turning on -debug-info-kind=constructor we ran into a "fragment covers
entire variable" error during thinlto. The fragment is currently always
emitted if there is no type size, but sometimes the variable has a
forward declared struct type which doesn't have a size.

This changes the code to get the type size from the GlobalVariable instead.

Differential Revision: https://reviews.llvm.org/D85572
2020-08-11 14:50:56 -07:00
Shinji Okumura 06eee8748f [Attributor][NFC] Connect AAPotentialValues with AAValueSimplify
This patch enables `AAValueSimplify` to use information from `AAPotentialValues`

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85668
2020-08-11 15:52:02 +09:00
Wei Mi 4cd8e9b169 [SampleFDO] Stop letting findCalleeFunctionSamples return unrelated profiles
for invoke instructions.

We see a warning of "No debug information found in function foo: Function
profile not used" in a case. The function foo is called by an invoke
instruction. It has no debug information because it has attribute((nodebug))
in the definition. It shouldn't have profile instance in the sample profile
but compiler thinks it does, that turns out to be a compiler bug in
findCalleeFunctionSamples. The bug is exposed when sample-profile-merge-inlinee
is enabled recently.

Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for
invoke instruction. For empty CalleeName, findFunctionSamplesAt will treat
the call as an indirect call and will return any inline instance profile at
the same location as the instruction. That leads to a wrong profile being
returned to function foo.

The patch set CalleeName when the instruction is an invoke.

Differential Revision: https://reviews.llvm.org/D85664
2020-08-10 12:41:09 -07:00
Aditya Kumar 53ac144848 [HotColdSplit] Add options for splitting cold functions in separate section
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk

Differential Revision: https://reviews.llvm.org/D85331
2020-08-09 08:48:12 -07:00
Shinji Okumura c575ba28de [Attributor] AAPotentialValues Interface
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83283
2020-08-07 17:35:12 +09:00
Shinji Okumura f13f2e16f0 [Attributor] Check violation of returned position nonnull and noundef attribute in AAUndefinedBehavior
This patch is a follow up of D84733.
If a function has noundef attribute in returned position, instructions that return undef or poison value cause UB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85178
2020-08-07 12:02:42 +09:00
Shinji Okumura ffe0066b62 [Attributor][NFC] Clang format 2020-08-04 09:04:12 +09:00
Florian Hahn 1e392fc445 [ArgPromotion] Replace all md uses of promoted values with undef.
Currently, ArgPromotion may leave metadata uses of promoted values,
which will end up in the wrong function, creating invalid IR.

PR33641 fixed this for dead arguments, but it can be also be triggered
arguments with users that are promoted (see the updated test case).

We also have to drop uses to them after promoting them. We need to do
this after dealing with the non-metadata uses, so I also moved the empty
use case to the loop that deals with updating the arguments of the new
function.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D85127
2020-08-03 19:31:53 +01:00
Shinji Okumura 434cf2ded3 [Attributor] Check nonnull attribute violation in AAUndefinedBehavior
This patch makes it possible to handle nonnull attribute violation at callsites in AAUndefinedBehavior.
If null pointer is passed to callee at a callsite and the corresponding argument of callee has nonnull attribute, the behavior of the callee is undefined.
In this patch, violations of argument nonnull attributes is only handled.
But violations of returned nonnull attributes can be handled and I will implement that in a follow-up patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84733
2020-08-03 17:12:50 +09:00
Florian Hahn 599955eb56 Recommit "[IPConstProp] Remove and move tests to SCCP."
This reverts commit 59d6e814ce.

The cause for the revert (3 clang tests running opt -ipconstprop) was
fixed by removing those lines.
2020-08-02 22:23:54 +01:00
Shinji Okumura 376b64926b Revert "[Attributor] AAPotentialValues Interface"
The commit cause build failure.
2020-08-02 22:49:52 +09:00
Shinji Okumura d3f01b6681 [Attributor] AAPotentialValues Interface
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83283
2020-08-02 19:12:17 +09:00
AK 20797989ea Outline non returning functions unless a longjmp
__assert_fail, abort, exit etc. are cold.
TODO: outline throw

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,tejohnson,fhahn

Differential Revision: https://reviews.llvm.org/D69257
2020-08-01 22:16:14 -07:00
Teresa Johnson 1479cdfe4f [ThinLTO] Compile time improvement to propagateAttributes
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.

I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.

Differential Revision: https://reviews.llvm.org/D84985
2020-07-31 10:54:02 -07:00
Hongtao Yu d23c1d6a8d [AutoFDO] Avoid merging inlinee samples multiple times
A function call can be replicated by optimizations like loop unroll and jump threading and the replicates end up sharing the sample nested callee profile. Therefore when it comes to merging samples for uninlined callees in the sample profile inliner, a callee profile can be merged multiple times which will cause an assert to fire.

This change avoids merging same callee profile for duplicate callsites by filtering out callee profiles with a non-zero head sample count.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D84997
2020-07-31 09:30:05 -07:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Wei Mi 836991d367 Fix a crash when the sample profile uses md5 and -sample-profile-merge-inlinee
is enabled.

When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be
created during profile merge without GUIDToFuncNameMap being initialized.
That will occasionally cause compiler crash. The patch fixes it.

Differential Revision: https://reviews.llvm.org/D84994
2020-07-30 21:21:06 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
kuterd 49def10e02 [Attributor] Add time trace support.
This patch addes time trace functionality to have a better understanding
of the analysis times.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84980
2020-07-31 03:08:50 +03:00
Simon Pilgrim 6316b0023e Attributor.h - remove unnecessary includes. NFCI.
Fix implicit cpp include dependencies.
2020-07-30 15:26:41 +01:00
Florian Hahn 59d6e814ce Revert "[IPConstProp] Remove and move tests to SCCP."
This reverts commit e77624a3be.

Looks like some clang tests manually invoke -ipconstprop via opt.....
2020-07-30 13:06:54 +01:00
Florian Hahn e77624a3be [IPConstProp] Remove and move tests to SCCP.
As far as I know, ipconstprop has not been used in years and ipsccp has
been used instead. This has the potential for confusion and sometimes
leads people to spend time finding & reporting bugs as well as
updating it to work with the latest API changes.

This patch moves the tests over to SCCP. There's one functional difference
I am aware of: ipconstprop propagates for each call-site individually, so
for functions that are called with different constant arguments it can sometimes
produce better results than ipsccp (at much higher compile-time cost).But
IPSCCP can be thought to do so as well for internal functions and as mentioned
earlier, the pass seems unused in practice (and there are no plans on working
towards enabling it anytime).

Also discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143773.html

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84447
2020-07-30 12:36:27 +01:00
Roman Lebedev 1d51dc38d8
[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108
2020-07-29 20:05:30 +03:00
Arthur Eubanks 2ca6c422d2 [FunctionAttrs] Rename functionattrs -> function-attrs
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84694
2020-07-28 09:09:13 -07:00
Luofan Chen 5ee07dc53f [Attributor] Track AA dependency using dependency graph
This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78861
2020-07-28 18:02:49 +08:00
Roman Lebedev 351d234d86
[OpenMPOpt] Most SCC's are uninteresting, don't waste time on them (up to 16x faster)
Summary:
This seems obvious in hindsight, but the result is surprising.
I've measured compile-time of `-openmpopt` pass standalone
on RawSpeed unity build, and while there is some OpenMP stuff,
most is not OpenMP. But nonetheless the pass does a lot of costly
preparations before ever trying to look for OpenMP stuff in SCC.

Numbers (n=25): 0.094624s  ->  0.005976s, an -93.68% improvement, or ~16x

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, hiraditya, guansong, llvm-commits, sstefan1

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84689
2020-07-27 23:36:34 +03:00
Kazu Hirata 902cbcd59e Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Shinji Okumura 697c6d8907 [Attributor] Cache query results for isPotentiallyReachable in AAReachability
Summary:
This is the next patch of [[ https://reviews.llvm.org/D76210 | D76210 ]].
This patch made a map in `InformationCache` for caching results.

Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, kuter, bbn, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83246
2020-07-23 20:49:28 +09:00
Sjoerd Meijer 5567c62afa [Matrix] Add LowerMatrixIntrinsics to the NPM
Pass LowerMatrixIntrinsics wasn't running yet running under the new pass
manager, and this adds LowerMatrixIntrinsics to the pipeline (to the
same place as where it is running in the old PM).

Differential Revision: https://reviews.llvm.org/D84180
2020-07-22 09:47:53 +01:00
Florian Hahn dc1087d408 [Matrix] Add minimal lowering pass that only requires TTI.
This patch adds a new variant of the matrix lowering pass that only does
a minimal lowering and only depends on TTI. The main purpose of this pass
is to have a pass with minimal dependencies to run as part of the backend
pipeline.

At the moment, the only difference to the regular lowering pass is that it
does not support remarks. But in subsequent patches add support for tiling
to the lowering pass which will require more analysis, which we do not want
to run in the backend, as the lowering should happen in the middle-end in
practice and running it in the backend is mostly for convenience when
running llc.

Reviewers: anemet, Gerolf, efriedma, hfinkel

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76867
2020-07-20 11:16:11 +01:00
Wenlei He d41d952be9 Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks"
This reverts commit 2d6ecfa168.
2020-07-19 08:49:04 -07:00
Wenlei He 2d6ecfa168 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
Summary:
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

Subscribers: mgorny, aprantl, hiraditya, llvm-commits

Tags: #llvm

Resubmit for https://reviews.llvm.org/D84086
2020-07-19 08:21:05 -07:00
Eric Christopher ae08dbc673 Temporarily Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks"
as it is failing the inline-replay.ll test as well as sanitizers/Werror
from returning a stack local variable.

This reverts commit 029946b112.
2020-07-17 14:58:01 -07:00