Commit Graph

17930 Commits

Author SHA1 Message Date
Alexey Bataev 568c874117 [SLP]Improve and simplify extendSchedulingRegion.
We do not need to keep scanning if the upper or lower end of the
basic block has already been reached without finding the instruction:
in that case the instruction must lie in the other, not-yet-scanned
part of the block.
This should improve compile time for very big basic blocks.

Differential Revision: https://reviews.llvm.org/D99266
2021-03-25 05:31:58 -07:00
Sameer Sahasrabuddhe b92c8c22b9 [NewPM] Disable non-trivial loop-unswitch on targets with divergence
Unswitching a loop on a non-trivial divergent branch is expensive
since it serializes the execution of both versions of the
loop. But identifying a divergent branch requires divergence analysis,
which is a function-level analysis.

The legacy pass manager handles this dependency by isolating such a
loop transform and rerunning the required function analyses. This
functionality is currently missing in the new pass manager, and there
is no safe way for the SimpleLoopUnswitch pass to depend on
DivergenceAnalysis. So we conservatively assume that all non-trivial
branches are divergent if the target has divergence.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D98958
2021-03-25 11:27:10 +00:00
Sanjay Patel adf42dff42 [ValueTracking] peek through min/max to find isKnownToBeAPowerOfTwo
This is similar to the select logic just ahead of the new code.
Min/max choose exactly one value from the inputs, so if both of
those are a power-of-2, then the result must be a power-of-2.
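
As an illustration, a hand-written IR sketch (not a test from this patch; the
urem fold shown is the existing known-power-of-2 mask fold):
```
define i32 @src(i32 %x, i32 %a, i32 %b) {
  %p1 = shl i32 1, %a                              ; power of 2
  %p2 = shl i32 1, %b                              ; power of 2
  %m  = call i32 @llvm.umax.i32(i32 %p1, i32 %p2)  ; now also known power of 2
  %r  = urem i32 %x, %m
  ret i32 %r
}

; ...can now become:
define i32 @tgt(i32 %x, i32 %a, i32 %b) {
  %p1 = shl i32 1, %a
  %p2 = shl i32 1, %b
  %m  = call i32 @llvm.umax.i32(i32 %p1, i32 %p2)
  %mask = add i32 %m, -1
  %r  = and i32 %x, %mask
  ret i32 %r
}

declare i32 @llvm.umax.i32(i32, i32)
```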

This might help with D98152, but we likely still need other
pieces of the puzzle to avoid regressions.

The change in PatternMatch.h is needed to build with clang.
It's possible there is a better way to deal with the 'const'
incompatibilities.

Differential Revision: https://reviews.llvm.org/D99276
2021-03-24 17:54:38 -04:00
Congzhe Cao 829c1b6443 [LoopInterchange] fix tightlyNested() in LoopInterchange legality
This is yet another attempt to fix tightlyNested().

Add checks in tightlyNested() for the inner loop exit block, so that
tightlyNested() returns false if 1) there is control-flow divergence
between the inner loop exit block and the outer loop latch, or 2) the
inner loop exit block contains unsafe instructions.

The reasoning behind this is that after interchange, the original inner
loop exit block, which was part of the outer loop, is placed inside the
new inner loop and will be executed a different number of times than
before the interchange. Thus it has to be handled appropriately.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D98263
2021-03-24 15:49:25 -04:00
Gulfem Savrun Yeniceri 5fbe1fdf17 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5fd001a5ff
because it broke clang-with-thin-lto-ubuntu bot.
2021-03-24 18:59:33 +00:00
Craig Topper 512bae81cc [RISCV] Add basic cost modelling for fixed vector gather/scatter.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99142
2021-03-24 11:14:14 -07:00
Craig Topper f24f09d256 [RISCV] Add TTI support for cpop with Zbb
This tells loop idiom recognition that it can make popcount loops countable
using the ctpop intrinsic. I didn't bother checking for illegal types:
type legalization knows how to split a ctpop into multiple ctpops added together.
Assuming we only receive reasonable integer bit widths, a few cpop instructions
added together are probably better than the loop.
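
For reference, a hand-written sketch (not from this patch) of the kind of loop
LoopIdiomRecognize can now turn into a single @llvm.ctpop.i32 call once the
target reports popcount as cheap:
```
define i32 @popcount(i32 %v) {
entry:
  %is0 = icmp eq i32 %v, 0
  br i1 %is0, label %exit, label %loop

loop:
  %x = phi i32 [ %v, %entry ], [ %x.next, %loop ]
  %c = phi i32 [ 0, %entry ], [ %c.next, %loop ]
  %x.m1 = add i32 %x, -1
  %x.next = and i32 %x, %x.m1      ; clear the lowest set bit
  %c.next = add i32 %c, 1
  %done = icmp eq i32 %x.next, 0
  br i1 %done, label %exit, label %loop

exit:
  %res = phi i32 [ 0, %entry ], [ %c.next, %loop ]
  ret i32 %res
}
```
With Zbb, the resulting ctpop then lowers to a single cpop instruction.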

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99203
2021-03-24 10:58:42 -07:00
Thomas Preud'homme c5d53efeff [test] Fix mix of variable use/def and regex match
LLVM test Transforms/GlobalSplit/basic.ll mixes variable definition and
variable use with regex matching of end of line. Mixing end of line
matching with variable definition will work but not record the end of
line in the string variable. Mixing end of line with variable use will
ignore end of line and cause an error once D98691 is landed.

This commit moves the end of line matching out of the string subtitution
blocks.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D98854
2021-03-24 17:58:16 +00:00
Gulfem Savrun Yeniceri 5fd001a5ff [Passes] Add relative lookup table converter pass
Lookup tables generate non-PIC-friendly code, which requires dynamic relocations, as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
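
Roughly, the rewrite has the following shape (hand-written sketch, not actual
output of the pass; the offsets are stored relative to the table itself and
read back via the llvm.load.relative intrinsic):
```
@a = internal constant i8 1
@b = internal constant i8 2

; Before: a table of pointers; every entry needs a dynamic relocation under PIC.
@table = private unnamed_addr constant [2 x i8*] [i8* @a, i8* @b]

; After: 32-bit offsets relative to the table, no dynamic relocations.
@table.rel = private unnamed_addr constant [2 x i32] [
  i32 trunc (i64 sub (i64 ptrtoint (i8* @a to i64), i64 ptrtoint ([2 x i32]* @table.rel to i64)) to i32),
  i32 trunc (i64 sub (i64 ptrtoint (i8* @b to i64), i64 ptrtoint ([2 x i32]* @table.rel to i64)) to i32)]
; Lookups then go through @llvm.load.relative.i32(i8* <table>, i32 <byte offset>).
```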

Differential Revision: https://reviews.llvm.org/D94355
2021-03-24 17:31:18 +00:00
Thomas Preud'homme 3b52c04e82 Make FindAvailableLoadedValue TBAA aware
FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run
the alias analysis when searching for an equivalent value. However,
FindAvailablePtrLoadStore() calls the alias analysis framework with a
memory location for the load constructed from an address and a size,
which thus lacks TBAA metadata info. This commit modifies
FindAvailablePtrLoadStore() to accept an optional memory location as a
parameter, allowing FindAvailableLoadedValue() to create it based on the
load instruction, which would then have TBAA metadata info attached.
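
An illustrative shape (hand-written, not a test from this patch) where the
load's !tbaa now lets AA prove the intervening store does not clobber it:
```
define i32 @sketch(i32* %ip, float* %fp) {
  store i32 42, i32* %ip, align 4, !tbaa !3
  store float 1.000000e+00, float* %fp, align 4, !tbaa !4
  %v = load i32, i32* %ip, align 4, !tbaa !3   ; can be forwarded from the i32 store
  ret i32 %v
}

!0 = !{!"tbaa root"}
!1 = !{!"int", !0, i64 0}
!2 = !{!"float", !0, i64 0}
!3 = !{!1, !1, i64 0}
!4 = !{!2, !2, i64 0}
```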

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99206
2021-03-24 17:20:26 +00:00
Thomas Preud'homme 60e12a2279 [NFC][Loads] Add a testcase for TBAA aware FindAvailableLoadedValue
(D99206)
2021-03-24 17:03:19 +00:00
David Green 14b2ec934e [ARM] Enable UpperBound unrolling for all loops
This UpperBound unrolling was already enabled so long as a series of
conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always
enables it as it can help fully unroll loops that would not otherwise
pass those tests.

Differential Revision: https://reviews.llvm.org/D99174
2021-03-24 16:39:21 +00:00
Sanjay Patel a8708708cf [InstSimplify] add tests for min/max intrinsic analysis; NFC 2021-03-24 12:21:59 -04:00
Sanjay Patel 92417ebbd1 [InstCombine] add tests for sub of umin; NFC
Potential missing fold noted in D98152
2021-03-24 10:54:03 -04:00
Joseph Huber 8140d0ec4a [OpenMP] Change OMPIRBuilder to append function attributes
Summary:
Currently the OMPIRBuilder overwrites the function's existing attributes
when it assigns the ones defined in OMPKinds.def. This changes the
behaviour to append them to the function's existing attributes instead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98740
2021-03-24 09:08:29 -04:00
Roman Lebedev 760f4c2069
[NFC][PhaseOrdering] Add a testcase for additional LICM before LoopRotate (D99249/D99204) 2021-03-24 13:24:09 +03:00
Ta-Wei Tu 4d9d736875 [NFC] Improve debug message and test description in 4c1f74a 2021-03-24 18:21:13 +08:00
Ta-Wei Tu 4c1f74a76c [LoopFlatten] Fix invalid assertion (PR49571)
The `InductionPHI` is not necessarily the increment instruction, as
demonstrated in pr49571.ll.
This patch removes the assertion and instead bails out from the
`LoopFlatten` pass if that happens.

This fixes https://bugs.llvm.org/show_bug.cgi?id=49571

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99252
2021-03-24 18:08:27 +08:00
Alexey Bataev 99203f2004 [Analysis]Add getPointersDiff function to improve compile time.
Added a getPointersDiff function to LoopAccessAnalysis and used it in the
SLP vectorizer instead of the direct calculation of the distance between
pointers and/or the isConsecutiveAccess function, to improve compile time
and the detection of consecutive store chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 14:25:36 -07:00
Alexey Bataev f1b47ad278 Revert "[Analysis]Add getPointersDiff function to improve compile time."
This reverts commit 065a14a12d to
investigate and fix crash in SLP vectorizer.
2021-03-23 13:17:54 -07:00
Alexey Bataev 065a14a12d [Analysis]Add getPointersDiff function to improve compile time.
Added a getPointersDiff function to LoopAccessAnalysis and used it in the
SLP vectorizer instead of the direct calculation of the distance between
pointers and/or the isConsecutiveAccess function, to improve compile time
and the detection of consecutive store chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 12:58:42 -07:00
Craig Topper 4c38c35c8d [ValueTracking] Teach canCreateUndefOrPoison that ctpop does not create undef or poison.
This select-of-ctpop-with-0 pattern can get left behind after
loop idiom recognition converts a loop to ctpop. LLVM 10 was able
to optimize this, but LLVM 11 and later are not. The difference
seems to be that some select transforms are now limited based
on canCreateUndefOrPoison.

Teaching canCreateUndefOrPoison about ctpop restores the
LLVM 10 codegen.
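
A hand-written sketch of the pattern (not a test from this patch): since
ctpop(0) is 0 and ctpop does not create undef/poison, the select folds away.
```
define i32 @src(i32 %x) {
  %pc = call i32 @llvm.ctpop.i32(i32 %x)
  %is0 = icmp eq i32 %x, 0
  %r = select i1 %is0, i32 0, i32 %pc
  ret i32 %r
}

; ...simplifies to:
define i32 @tgt(i32 %x) {
  %pc = call i32 @llvm.ctpop.i32(i32 %x)
  ret i32 %pc
}

declare i32 @llvm.ctpop.i32(i32)
```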

Differential Revision: https://reviews.llvm.org/D99207
2021-03-23 12:42:18 -07:00
Sanjay Patel 9d45daf465 [PhaseOrdering] add AVX attribute to make test less fragile; NFC
This doesn't change anything currently, but as discussed in
D98981 and D98152, some tests may fail to vectorize because
the cost model becomes more accurate as we switch over to
using min/max intrinsics.
2021-03-23 11:34:33 -04:00
Florian Hahn 7fb6d9f958
[LV] Add 'fast' flag to test to make sure it will be vectorized.
This makes the test more robust with respect to when LV checks if the
floating point instructions in a loop can be vectorized.
2021-03-23 15:32:23 +00:00
Roman Lebedev b5822026dd
[SimplifyCFG] 'Fold branch to common dest': don't overestimate the cost
`FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`)
for bonus instruction duplication. And currently it calculates the cost
as-if it will actually duplicate into each predecessor.

But even ignoring the budget, it won't always duplicate into each predecessor;
there are some correctness and profitability checks.
So when calculating the cost, we should first check into which blocks
we will *actually* duplicate, and only then use that block count
for budgeting.
2021-03-23 18:30:26 +03:00
Roman Lebedev a866f72eb2
[NFC][SimplifyCFG] 'Fold branch to common dest': add test for cost overestimation
We should not count the cost of duplication into predecessors into which
we won't ultimately duplicate.
2021-03-23 18:30:26 +03:00
Roman Lebedev 514bc01ca3
[SimplifyCFG] FoldBranchToCommonDest(): properly handle same-block external uses (PR49510/PR49689)
We clone bonus instructions to the end of the predecessor block,
and then use `SSAUpdater::RewriteUseAfterInsertions()`.
But that only handles the cases where the uses to be rewritten
are either in a different block from the def, or come after the def.

But in some loop cases, the external use may be at the beginning of the
predecessor block, before the newly cloned bonus instruction.
`SSAUpdater::RewriteUseAfterInsertions()` does not deal with that.
Notably, the external use can't be both in the same block
and *after* the newly cloned instruction, because of the fold preconditions.

To properly handle these cases, when the use is in the same block,
we should instead use `SSAUpdater::RewriteUse()`.
Note that they do the same thing for PHI users.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49510
Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689
2021-03-23 17:37:28 +03:00
Sanjay Patel 1bf8f9e228 [SimplifyCFG] use profile metadata to refine merging branch conditions
2nd try (original: 27ae17a6b0) with fix/test for crash. We must make
sure that TTI is available before trying to use it because it is not
required (might be another bug).

Original commit message:

This is one step towards solving:
https://llvm.org/PR49336

In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.
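
A hand-written sketch of the shape being considered (the weights are made up):
```
define void @f(i32 %x, i32 %y) {
entry:
  %c1 = icmp eq i32 %x, 0
  br i1 %c1, label %check2, label %exit, !prof !0

check2:
  %c2 = icmp eq i32 %y, 0          ; the expensive/unpredictable condition
  br i1 %c2, label %work, label %exit

work:
  call void @g()
  br label %exit

exit:
  ret void
}

declare void @g()

!0 = !{!"branch_weights", i32 1, i32 1000}
```
Since !prof says %c1 is almost never true, execution almost always bypasses
%c2; merging the two branches into a single logic op would evaluate %c2 on
every path, so the fold is skipped.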

Differential Revision: https://reviews.llvm.org/D98898
2021-03-23 10:19:37 -04:00
Sanjay Patel 3c8473ba53 [SLP] allow matching integer min/max intrinsics as reduction ops
As noted in D98152, we need to patch SLP to avoid regressions when
we start canonicalizing to integer min/max intrinsics.
Most of the real work to make this possible was in:
7202f47508
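
For reference, a hand-written example (not from this patch) of the kind of
reduction that can now be matched, here a 4-wide smin over consecutive loads:
```
define i32 @smin4(i32* %p) {
  %p1 = getelementptr inbounds i32, i32* %p, i64 1
  %p2 = getelementptr inbounds i32, i32* %p, i64 2
  %p3 = getelementptr inbounds i32, i32* %p, i64 3
  %a = load i32, i32* %p, align 4
  %b = load i32, i32* %p1, align 4
  %c = load i32, i32* %p2, align 4
  %d = load i32, i32* %p3, align 4
  %m1 = call i32 @llvm.smin.i32(i32 %a, i32 %b)
  %m2 = call i32 @llvm.smin.i32(i32 %c, i32 %d)
  %m3 = call i32 @llvm.smin.i32(i32 %m1, i32 %m2)
  ret i32 %m3
}

declare i32 @llvm.smin.i32(i32, i32)
```
On a suitable target, SLP can turn this into a vector load plus a vector
smin reduction.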

Differential Revision: https://reviews.llvm.org/D98981
2021-03-23 08:56:44 -04:00
Nashe Mncube 5d929794a8 [llvm-opt] Bug fix within combining FP vectors
A bug was found within InstCombineCasts where a function was
implemented to work only with FixedVectors. This caused a
crash when a ScalableVector was passed to it.
This commit introduces a regression test which recreates the
failure, and a bug fix.

Differential Revision: https://reviews.llvm.org/D98351
2021-03-23 12:13:41 +00:00
Florian Hahn e43e8e9138
[AnnotationRemarks] Use subprogram location for summary remarks.
The summary remarks are generated on a per-function basis. Using the
first instruction's location is sub-optimal for 2 reasons:
  1. Sometimes the first instruction is missing !dbg.
  2. The location of the first instruction may be misleading.

Instead, just use the location of the function directly.
2021-03-23 12:05:41 +00:00
David Green 003fab9e8d [ARM] Additional Upper bound unrolling test. NFC 2021-03-23 12:00:40 +00:00
Florian Hahn 4ed0a5506a
[AnnotationRemarks] Add test for annotation remarks with dbg locations.
The test illustrates that we do not pick the debug location from the
function directly. This will be fixed in a follow-up patch.
2021-03-23 11:52:27 +00:00
Stefan Gränitz 581adb4f1a Temporarily revert "[lli] Make -jit-kind=orc the default JIT engine"
This reverts commit eaee4f2696.
2021-03-23 12:01:30 +01:00
Florian Hahn f759d512c8
[VPlan] Include name when printing after 93a9d2de8f.
The name is included when printing in DOT mode. Also print it in non-DOT
mode after 93a9d2de8f.

This will become more important to distinguish different plans once
VPlans are gradually refined.
2021-03-23 09:50:14 +00:00
Stefan Gränitz eaee4f2696 [lli] Make -jit-kind=orc the default JIT engine
MCJIT served well as the default JIT engine in lli for a long time, but the code is getting old and maintenance efforts don't seem to be in sight. In the meantime Orc has become mature enough to fill that gap. The newly added greedy mode is very similar to the execution model of MCJIT. It should work as a drop-in replacement for common JIT tasks.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D98931
2021-03-23 10:22:34 +01:00
Juneyoung Lee 960a767368 Reland "[InstCombine] Add simplification of two logical and/ors"
This relands 07c3b97e18 (D96945) which was reverted by
commit f49354838e.
The two-stage compilation tests pass successfully on my machine.
2021-03-23 16:24:50 +09:00
Serguei Katkov 9fec382601 [RS4GC] Fix hang on infinite loop
The meetBDVState utility may set the base pointer for a conflict state.
At the moment the base of a conflict state does not have any meaning, but
it is used when comparing BDV states. This comparison is used as an indicator
of progress made on an iteration, and the RS4GC pass loops until it reaches a
fixed point.
As a result, for the added test, the state of some phi nodes is updated on each
iteration with a different base value for the conflict state, and this is counted
as progress even though no further progress is possible for a conflict state.
In reality the base value is merely transferred from one state to another, and
the pass keeps detecting progress on these states.

The test is very fragile; the traversal order of states and of phi node operands
plays an important role.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99058
2021-03-23 12:54:51 +07:00
Gulfem Savrun Yeniceri e3a6d70c68 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 78a65cd945 which
caused buildbot failures.
2021-03-23 00:43:16 +00:00
Juneyoung Lee 5c2e50b5d2 Reland "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe"
This relands commit 99108c791d (D95026) which was
reverted by 8d5a981a13 because the underlying
problem (https://llvm.org/pr49495) is fixed.
2021-03-23 09:19:53 +09:00
Gulfem Savrun Yeniceri 78a65cd945 [Passes] Add relative lookup table converter pass
Lookup tables generate non-PIC-friendly code, which requires dynamic relocations, as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-22 22:09:02 +00:00
Roman Lebedev bde995c9c2
[NFC][SROA] Add some more tests for speculation around PHI's 2021-03-23 00:51:18 +03:00
Roman Lebedev 046bb8ea7c
[NFC][InstCombine] Autogenerate some checklines being affected by upcoming change 2021-03-23 00:51:18 +03:00
Sanjay Patel 95f7f7c21b Revert "[SimplifyCFG] use profile metadata to refine merging branch conditions"
This reverts commit 27ae17a6b0.
There are bot failures that end with:
 #4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8)
 #6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c)

...but not sure how to trigger that yet.
2021-03-22 17:48:06 -04:00
Nikita Popov 7e18cd887c [InstCombine] Whitelist non-refining folds in SimplifyWithOpReplaced
This is an alternative to D98391/D98585, playing things more
conservatively. If AllowRefinement == false, then we don't use
InstSimplify methods at all, and instead explicitly implement a
small number of non-refining folds. Most cases are handled by
constant folding, and I only had to add three folds to cover
our unit tests / test-suite. While this may lose some optimization
power, I think it is safer to approach from this direction, given
how many issues this code has already caused.

Differential Revision: https://reviews.llvm.org/D99027
2021-03-22 22:12:56 +01:00
Nikita Popov ca28e32359 [IR] Mark assume/annotation as InaccessibleMemOnly
These intrinsics don't need to be marked as writing arbitrary memory;
it's sufficient to write inaccessible memory (aka "side effect")
to preserve control dependencies. This means less special-casing
in BasicAA. This is intended as an alternative to D98925.

Differential Revision: https://reviews.llvm.org/D99022
2021-03-22 22:01:03 +01:00
Sanjay Patel 27ae17a6b0 [SimplifyCFG] use profile metadata to refine merging branch conditions
This is one step towards solving:
https://llvm.org/PR49336

In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.

Differential Revision: https://reviews.llvm.org/D98898
2021-03-22 16:49:21 -04:00
Sanjay Patel c21016715f [SimplifyCFG] adjust test branchweights; NFC
This will check the boundary conditions of the
revised change proposed in D98898.
2021-03-22 15:55:34 -04:00
Philip Reames 45940dbc0c Tweak a test so it actually gets autogened 2021-03-22 11:32:32 -07:00
Florian Hahn 42ec7a6f08
[VPlan] Add CHECK-LABEL to test/Transforms/LoopVectorize/vplan-printing.ll.
This patch adds CHECK-LABEL lines to
llvm/test/Transforms/LoopVectorize/vplan-printing.ll in order to make
failures slightly easier to diagnose.
2021-03-22 18:29:38 +00:00
Philip Reames f24175fcb9 Autogen some tests for ease of update 2021-03-22 11:06:29 -07:00
Philip Reames 854de7c4d0 [tests] Refresh a bunch of autogen test to adjust for format changes 2021-03-22 10:41:39 -07:00
Bjorn Pettersson 688cdddafb [SLP] Honor min/max regsize and min/max VF in vectorizeStores
Make sure we use PowerOf2Floor instead of PowerOf2Ceil when
calculating the maximum number of elements that fit inside a vector
register (otherwise we could end up creating vectors larger
than the maximum vector register size).

Also make sure we honor the min/max VF (as given by TTI or
command-line parameters) when doing vectorizeStores.
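
As a worked example with made-up numbers: if a vector register can hold 5
elements of the store type, PowerOf2Ceil(5) = 8 would build an 8-element
vector wider than the register, while PowerOf2Floor(5) = 4 stays within it.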

Reviewed By: anton-afanasyev

Differential Revision: https://reviews.llvm.org/D97691
2021-03-22 17:29:35 +01:00
Bjorn Pettersson 2f8f01dcb3 [SLP] Add test case showing shortcoming in honoring max reg size 2021-03-22 17:29:35 +01:00
Florian Hahn 46b055287b
[ConstraintElimination] Add gep tests without inbounds.
Add a set of interesting test cases for GEPs without inbounds for
upcoming patches.
2021-03-22 12:23:30 +00:00
Florian Hahn cb3b5f0770
[ConstraintElimination] Add multi-dimension GEP tests.
Add a set of interesting test cases with multi-dimensional GEPs for
upcoming patches.
2021-03-22 10:37:12 +00:00
Max Kazantsev 8fab9f824f [IndVars] Sharpen context in eliminateIVComparison
When eliminating a comparison, we can use the common dominator of
all its users as the context. This gives better results when the ICMP is not
computed right before the branch that uses it.

Differential Revision: https://reviews.llvm.org/D98924
Reviewed By: lebedev.ri
2021-03-22 11:55:57 +07:00
Nikita Popov 9f864d2025 Reapply [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Adjust the type check to consider vectors as well. The previous
version of the patch dropped the type check entirely, but it
turns out that getAggregateElement() does require the constant
to be an aggregate in some edge cases: For Poison/Undef the
getNumElements() API is called, without checking in advance that
we're dealing with an aggregate. Possibly the implementation should
avoid doing that, but for now I'm adding an assert so the next
person doesn't fall into this trap.
2021-03-21 17:48:21 +01:00
Nikita Popov 59dbf4d516 [InstSimplify] Add load of undef aggregate test (NFC)
To make sure this doesn't crash the following commit.
2021-03-21 17:42:26 +01:00
Nikita Popov b32f5d5045 [InstSimplify] Regenerate test checks (NFC) 2021-03-21 17:41:21 +01:00
Nikita Popov ece1403aca [InstSimplify] Add additional select operand replacement tests (NFC)
This tests for binops with identity elements.
2021-03-21 15:30:30 +01:00
Jeroen Dobbelaere 77080a1eb6 Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.
Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed.
(See D91250, D48541)

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91661
2021-03-20 11:37:09 +01:00
Sanjay Patel 2fc47afed2 [SLP] remove unnecessary characters in test; NFC
Glitch that crept in with 62f9c3358b
2021-03-19 15:09:53 -04:00
Sanjay Patel 62f9c3358b [SLP] add tests for min/max reductions that use intrinsics; NFC 2021-03-19 15:06:16 -04:00
David Green a2e0312cda [ARM] Tone down the MVE scalarization overhead
The scalarization overhead was set deliberately high for MVE whilst the
codegen was new, to help protect us against the negative ramifications
of mixing scalar and vector instructions. This patch decreases that overhead,
especially for floating point, where the cost of extracting/inserting
lane elements can be low. For integer the cost is still fairly high due
to the cross-register-bank copy, but it is no longer n^2 in the length of
the vector.

In general, this will decrease the cost of scalarizing floats and long
integer vectors. i64 remains expensive, with a high cost both before and
after this patch. For floats this allows us to start doing things like
vectorizing fdiv instructions, even if they end up scalarized.

Differential Revision: https://reviews.llvm.org/D98245
2021-03-19 18:30:11 +00:00
Andrei Elovikov 93a9d2de8f [VPlan] Add plain text (not DOT's digraph) dumps
I foresee two uses for this:
1) It's easier to use these in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
   inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
   LIT tests would become too obscure. I can imagine that we'd want to CHECK
   against VPlan dumps after multiple transformations instead. That would be
   easier with plain text dumps than with DOT format.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96628
2021-03-19 10:50:12 -07:00
Jeroen Dobbelaere 04790d9cfb Support intrinsic overloading on unnamed types
This patch adds support for intrinsic overloading on unnamed types.

This fixes PR38117 and PR48340 and will also be needed for the Full Restrict Patches (D68484).

The main problem is that the intrinsic overloading name mangling is using 's_s' for unnamed types.
This can result in identical intrinsic mangled names for different function prototypes.

This patch changes this by adding a '.XXXXX' to the intrinsic mangled name when at least one of the types is based on an unnamed type, ensuring that we get a unique name.

Implementation details:
- The mapping is created on demand and kept in Module.
- It also checks for existing clashes and recycles potentially existing prototypes and declarations.
- Because of extra data in Module, Intrinsic::getName needs an extra Module* argument and, for speed, an optional FunctionType* argument.
- I still kept the original two-argument 'Intrinsic::getName' around, which keeps the original behavior (providing the base name).
-- The main reason is that I did not want to change the LLVMIntrinsicGetName version, as I don't know how acceptable such a change would be.
-- The current situation already has this limitation, so it should not get worse with this patch.
- Intrinsic::getDeclaration and the verifier are now using the new version.

Other notes:
- As far as I can see, this should not suffer from stability issues. The count is only added for prototypes that depend on at least one anonymous struct.
- The initial count starts from 0 for each intrinsic mangled name.
- In case of name clashes, existing prototypes are remembered and reused when that makes sense.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91250
2021-03-19 14:34:25 +01:00
Clement Courbet 926cca9679 [InstCombine] Add unit test with @llvm.annotation.
In preparation for https://reviews.llvm.org/D98925
2021-03-19 08:49:40 +01:00
Max Kazantsev a1d6c652e3 [Test] Precommit one more test 2021-03-19 14:26:03 +07:00
Max Kazantsev 4ee4f9bf4a [Test] Precommit test 2021-03-19 14:17:35 +07:00
Max Kazantsev 16370e02a7 [IndVars] Provide eliminateIVComparison with context
We can prove more predicates when we have a context while eliminating an ICmp.
As a first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use the common dominator of all its users.
Some refactoring is needed before that.

Observed ~0.5% negative compile time impact.

Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
2021-03-19 12:28:22 +07:00
Max Kazantsev fff1363ba0 [SCEV] Add false->any implication
By the definition of the implication operator, `false -> true` and `false -> false`
both hold. It means that `false` implies any predicate, no matter whether it is true
or false. We don't need to go any further trying to prove the statement we need;
we can just always say that `false` implies it in this case.

In practice it means that we are trying to prove something guarded by a `false`
condition, which means that this code is unreachable, and we can safely prove any
fact or perform any transform in this code.

Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
2021-03-19 11:29:48 +07:00
Sanjay Patel 92068d6c31 [SimplifyCFG] add tests for branch cond merging with prof metadata; NFC
See PR49336.
2021-03-18 15:39:50 -04:00
Mehdi Amini 3614df3537 Revert "[VPlan] Add plain text (not DOT's digraph) dumps"
This reverts commit 6b053c9867.
The build is broken:

ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>>               LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
2021-03-18 19:20:39 +00:00
Andrei Elovikov 6b053c9867 [VPlan] Add plain text (not DOT's digraph) dumps
I foresee two uses for this:
1) It's easier to use these in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
   inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
   LIT tests would become too obscure. I can imagine that we'd want to CHECK
   against VPlan dumps after multiple transformations instead. That would be
   easier with plain text dumps than with DOT format.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96628
2021-03-18 11:33:39 -07:00
Wei Mi 14756b70ee [SampleFDO] Don't mix up the existing indirect call value profile with the new
value profile annotated after inlining.

In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we
use the magic number -1 in the value profile to avoid repeated indirect call
promotion to the same target for an indirect call. The function updateIDTMetaData
is used to mark a target as having been promoted in the value profile with the
magic number. updateIDTMetaData is also used to update the value profile
when an indirect call is inlined and a new inline instance profile should be
applied. For the second case, updateIDTMetaData currently mixes up the
existing value profile of the indirect call with the new profile, leading
to the problematic scenario where a target count is larger than the total count
in the value profile.

The patch fixes the problem. When updateIDTMetaData is used to update the
value profile after inlining, all the values in the existing value profile
will be dropped except the values with the magic number counts.

Differential Revision: https://reviews.llvm.org/D98835
2021-03-18 09:54:34 -07:00
Alexey Bataev b3ced9852c [SLP]Fix crash on extending scheduling region.
If the SLP vectorizer tries to extend the scheduling region and runs out of
budget too early, but still extends the region to the new ending
instructions (i.e., it was able to extend the region for the first
instruction in the bundle, but not for the second), the compiler needs to
recalculate dependencies in full, just as if the extension had been
successful. Without this, the schedule data chunks may end up with the
wrong number of (unscheduled) dependencies, and we may end up with an
incorrect function in which the vectorized instruction does not dominate
the extractelement instruction.

Differential Revision: https://reviews.llvm.org/D98531
2021-03-18 06:11:08 -07:00
Sanjay Patel c8893f3b78 [LoopVectorize] relax FMF constraint for FP induction
This makes the induction part of the loop vectorizer match the reduction part.
We do not need all of the fast-math-flags. For example, there are some that
clearly are not in play like arcp or afn.

If we want to make FMF constraints consistent across the IR optimizer, we
might want to add nsz too, but that's up for debate (users can't expect
associative FP math and preservation of sign-of-zero at the same time?).

The calling code was fixed to avoid miscompiles with:
1bee549737

Differential Revision: https://reviews.llvm.org/D98708
2021-03-18 08:11:22 -04:00
Philip Reames 7c7f4676cd [LICM] Fix a crash when sinking instructions w/token operands
It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the ad hoc fixup implementation in LICM did not.

This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate, by the way; it also applies to usage of the preallocated family of intrinsics.

Differential Revision: https://reviews.llvm.org/D98728
2021-03-17 11:18:46 -07:00
LemonBoy 4f024938e4 [LoopVectorize] Refine hasIrregularType predicate
The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector.
The previous check returned invalid results in some cases where there's some padding between the array elements: e.g. a 4-element array of u7 values is considered compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32.

The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding) bits.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D97465
2021-03-17 17:03:47 +01:00
David Green 3c25c40d51 [LV] Account for the cost of predication of scalarized load/store
This adds the cost of an i1 extract and a branch to the cost in
getMemInstScalarizationCost when the instruction is predicated. These
predicated loads/stores would generate blocks of something like:

    %c1 = extractelement <4 x i1> %C, i32 1
    br i1 %c1, label %if, label %else
  if:
    %sa = extractelement <4 x i32> %a, i32 1
    %sb = getelementptr inbounds float, float* %pg, i32 %sa
    %sv = extractelement <4 x float> %x, i32 1
    store float %sv, float* %sb, align 4
  else:

So this increases the cost by the extract and branch. This is probably
still too low in many cases due to the cost of all that branching, but
there is already an existing hack increasing the cost using
useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is
a load or there is more than one store. This patch improves the cost
for when there is only a single store, and hopefully at some point in
the future the hack can be removed.

Differential Revision: https://reviews.llvm.org/D98243
2021-03-17 10:57:50 +00:00
Bu Le 9abe500473 [SLP] Fix the trunc instruction insertion problem
The current SLP pass has a piece of code that inserts a trunc instruction
after the vectorized instruction. If the vectorized instruction
is a phi node and not the last phi node in the BB, the trunc instruction
will be inserted between two phi nodes, which triggers a verifier failure
in debug builds or unpredictable errors in later passes.
This patch changes the algorithm to 'if the last vectorized instruction
is a phi, insert the trunc after the last phi node in the current BB' to fix this problem.
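
A hand-written sketch (not a test from this patch) of the corrected placement;
previously the trunc could have been emitted right after %p1, i.e. between the
two PHI nodes, which the verifier rejects:
```
define i16 @sketch(i1 %c, i32 %a, i32 %b) {
entry:
  br i1 %c, label %then, label %join

then:
  br label %join

join:
  %p1 = phi i32 [ %a, %entry ], [ %b, %then ]
  %p2 = phi i32 [ 0, %entry ], [ 1, %then ]
  ; with the fix, the trunc is inserted after the last phi in the block
  %t = trunc i32 %p1 to i16
  ret i16 %t
}
```
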
2021-03-17 13:51:08 +03:00
Bu Le dd90c36d60 [SLP][Test] Precommit test for D98423 2021-03-17 12:11:50 +03:00
Max Kazantsev a6074b092c [BasicAA] Drop dependency on Loop Info. PR43276
BasicAA stores a reference to LoopInfo inside. This imposes an implicit
requirement of keeping it up to date whenever we modify the IR (in particular,
whenever we modify terminators of blocks that belong to loops). Failing
to do so leads to an incorrect state of the LoopInfo.

Because general AA does not require loop info updates and provides no API to
update it properly, the users of AA reasonably assume that there is no need to
update the loop info. This can be a source of bugs, as the example in PR43276 shows.

This patch drops dependence of BasicAA on LoopInfo to avoid this problem.

This may potentially pessimize the result of queries to BasicAA.

Differential Revision: https://reviews.llvm.org/D98627
Reviewed By: nikic
2021-03-17 11:43:44 +07:00
Zequan Wu cbd7eabea8 Revert "[ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()"
That commit caused the Chromium build to crash: https://bugs.chromium.org/p/chromium/issues/detail?id=1188885

This reverts commit edf7004851.
2021-03-16 14:36:21 -07:00
Mohammad Hadi Jooybar 302b80abf0 [InstCombine] Avoid Bitcast-GEP fusion for pointers directly from allocation functions
Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages.

Here is the original motivation for this patch:

```
#include<stdio.h>
#include<malloc.h>

typedef struct __Node{

  double f;
  struct __Node *next;

} Node;

void foo () {
  Node *a = (Node*) malloc (sizeof(Node));
  a->next = NULL;
  a->f = 11.5f;

  Node *ptr = a;
  double sum = 0.0f;
  while (ptr) {
    sum += ptr->f;
    ptr = ptr->next;
  }
  printf("%f\n", sum);
}
```
By the explicit assignment `a->next = NULL`, we can infer that the length of the linked list is `1`. In this case we can eliminate the while loop traversal entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation.

The final IR before this patch:

```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2
  %f = bitcast i8* %call to double*
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.lr.ph

while.body.lr.ph:                                 ; preds = %entry
  %1 = bitcast i8* %call to %struct.__Node*
  br label %while.body

while.body:                                       ; preds = %while.body.lr.ph, %while.body
  %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ]
  %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %2 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %2
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %3, null
  br i1 %tobool, label %while.end, label %while.body

while.end:                                        ; preds = %while.body, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```

Final IR after this patch:
```
; Function Attrs: nofree nounwind
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
while.end:
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01)
  ret void
}
```

IR before GVN before this patch:
```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2
  %f = bitcast i8* %call to double*
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.lr.ph

while.body.lr.ph:                                 ; preds = %entry
  %1 = bitcast i8* %call to %struct.__Node*
  br label %while.body

while.body:                                       ; preds = %while.body.lr.ph, %while.body
  %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ]
  %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %2 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %2
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %3, null
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit:                               ; preds = %while.body
  %add.lcssa = phi double [ %add, %while.body ]
  br label %while.end

while.end:                                        ; preds = %while.end.loopexit, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```
IR before GVN after this patch:
```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %0 = bitcast i8* %call to %struct.__Node*
  %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1
  store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2
  %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.preheader

while.body.preheader:                             ; preds = %entry
  br label %while.body

while.body:                                       ; preds = %while.body.preheader, %while.body
  %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ]
  %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %1 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %1
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %2, null
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit:                               ; preds = %while.body
  %add.lcssa = phi double [ %add, %while.body ]
  br label %while.end

while.end:                                        ; preds = %while.end.loopexit, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```

The phi translation fails before this patch and prevents GVN from removing the loop. The reason for this failure is in InstCombine. When the instruction combining pass decides to convert:
```
 %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16)
  %0 = bitcast i8* %call to %struct.__Node*
  %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1
  store %struct.__Node* null, %struct.__Node** %next
```
to
```
%call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16)
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0

```

GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. Address translation looks for structural similarity between GEPs, and such GEPs usually do not match since they have a different structure.

This change will cause a couple of failures in the LLVM tests. However, in all cases the expected result of the test needs to be changed. I will update those tests as soon as I get the green light on this patch.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96881
2021-03-16 17:05:44 -04:00
Philip Reames ef884e155d [rs4gc] don't force a conflict for a canonical broadcast
A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added.

The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier.
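
The shape in question is roughly the following hand-written sketch (not a test
from this patch); the second shuffle operand is undef, which the base-pointer
lattice used to treat as a conflicting base:
```
define <4 x i8 addrspace(1)*> @broadcast(<4 x i8 addrspace(1)*> %v) gc "statepoint-example" {
  %b = shufflevector <4 x i8 addrspace(1)*> %v, <4 x i8 addrspace(1)*> undef, <4 x i32> zeroinitializer
  ret <4 x i8 addrspace(1)*> %b
}
```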

Differential Revision: https://reviews.llvm.org/D98315
2021-03-16 12:59:06 -07:00
Philip Reames 5cabf472cb [rs4gc] don't duplicate existing values which are provably base pointers
RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer.

This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly; that's probably the riskiest part of the change.

Differential Revision: D98122
2021-03-16 12:51:21 -07:00
Philip Reames 6972e39d47 [gvn] CSE gc.relocates based on meaning, not spelling (try 2)
This was (partially) reverted in cfe8f8e0 because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems.  This change has been reworked to not need that change (via some explicit checks in client code).  This is being done to address the original optimization issue and simplify the testing of the readonly changes.  I'm working on that piece under 49607.

Original commit message follows:

The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.)

We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints, as it lets us eliminate all duplicate gc.relocates in a single pass.

Differential Revision: https://reviews.llvm.org/D97974
2021-03-16 10:59:31 -07:00
Adrian Prantl c3a18bb1e8 Support !heapallocsite attachments in StripDebugInfo().
They point into the DIType type system, so they need to be stripped as well.

rdar://75341300

Differential Revision: https://reviews.llvm.org/D98668
2021-03-16 10:05:13 -07:00
Adrian Prantl b04c87e053 Support !heapallocsite attachments in stripNonLineTableDebugInfo().
They point into the DIType type system, so they need to be stripped as well.

rdar://75341300

Differential Revision: https://reviews.llvm.org/D98667
2021-03-16 10:05:12 -07:00
Simonas Kazlauskas 6513995be3 [InstSimplify] Restrict a GEP transform to avoid provenance changes
This is a follow-up to D98588, and fixes the inline `FIXME` about a GEP-related simplification not
preserving the provenance.

https://alive2.llvm.org/ce/z/qbQoAY
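
Presumably the shape in question is the `p + (q - p) ==> q` fold; a
hand-written sketch (not taken from the patch):
```
define i8* @src(i8* %p, i8* %q) {
  %pi = ptrtoint i8* %p to i64
  %qi = ptrtoint i8* %q to i64
  %d  = sub i64 %qi, %pi
  %r  = getelementptr i8, i8* %p, i64 %d
  ret i8* %r
}
; Folding %r to %q outright would replace a pointer based on %p with one based
; on %q, changing provenance; the transform is now restricted accordingly.
```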

Additional tests were added in {rGf125f28afdb59eba29d2491dac0dfc0a7bf1b60b}

Depends on D98672

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98611
2021-03-16 18:53:05 +02:00
Sanjay Patel d2eae990a1 [LoopVectorize] add FP induction test with minimal FMF; NFC 2021-03-16 12:05:34 -04:00
Max Kazantsev b044f76bc8 [Test] Add test with loops guarded by trivial conditions 2021-03-16 19:46:36 +07:00
Max Kazantsev 534a1f4b05 [Test] Update auto-generated checks 2021-03-16 19:39:45 +07:00
Caroline Concatto 3c03635d53 [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse
This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
  for (int i = n-1; i >= 0; --i)
    a[i] = b[i] + 1.0;
```
with fixed-width or scalable vectors.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support on the IRBuilder interface for creating a
reverse vector of scalable type. The IRBuilder function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vectors
and keeps the original behavior for fixed vectors, using a shuffle reverse.
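
Rough sketch of the two forms (types and names assumed, not from the patch):
```
; scalable VF: reverse via the experimental intrinsic
define <vscale x 4 x float> @reverse_scalable(<vscale x 4 x float> %v) {
  %r = call <vscale x 4 x float> @llvm.experimental.vector.reverse.nxv4f32(<vscale x 4 x float> %v)
  ret <vscale x 4 x float> %r
}

; fixed VF: keep using a reversing shuffle, as before
define <4 x float> @reverse_fixed(<4 x float> %v) {
  %r = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x float> %r
}

declare <vscale x 4 x float> @llvm.experimental.vector.reverse.nxv4f32(<vscale x 4 x float>)
```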

Differential Revision: https://reviews.llvm.org/D95363
2021-03-16 07:51:59 +00:00
Johannes Doerfert f40a2c3bef [NVPTX] CUDA does provide malloc/free since compute capability 2.X
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D98606
2021-03-15 22:45:56 -05:00
Sanjay Patel b1b07dd071 [SLP] update stale test comments; NFC
These bugs were fixed with 0a8e7ca402
2021-03-15 16:02:46 -04:00
Wenlei He a5d30421a6 [CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to the function's entry count metadata. This enables the thin link to treat such cross-module callees as hot in the summary index, and later helps the postlink compilation import them for profile-guided cross-module inlining.

For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since the profile is flattened, a few things need to happen for it to work:

 - When loading an input profile in extended binary format, we need to load all child context profiles whose parent is in the current module, so the context trie for the current module includes potential cross-module inlinees.
 - In order to make the above happen, we need to know whether the input profile is a CSSPGO profile before we start reading function profiles, hence a flag is added to the profile summary section.
 - When searching for cross-module inline candidates, we need to walk the context trie instead of the nested inlinee profiles (the callsite samples of an AutoFDO profile).
 - Now that we have more accurate counts with CSSPGO, we switched to using the entry count instead of the total count to decide whether an external callee is potentially beneficial to inline. This makes it consistent with how we determine whether a call target is a potential inline candidate.

Differential Revision: https://reviews.llvm.org/D98590
2021-03-15 12:22:15 -07:00
Juneyoung Lee edf634ebc2 [AssumeBundles] Add nonnull/align to op bundle if noundef exists
This patch adds nonnull and align to assume's operand bundles
only if noundef exists.
Since the nonnull and align function attributes have poison semantics, they should be
paired with noundef or noundef-implying attributes for a violation to be immediate UB.
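
For illustration, a hand-written sketch (the bundle names and the align value
are assumed here, not taken from the patch's tests):
```
define void @caller(i32* %p) {
  ; with a noundef guarantee on %p, nonnull and align may also be recorded
  call void @llvm.assume(i1 true) [ "noundef"(i32* %p), "nonnull"(i32* %p), "align"(i32* %p, i64 8) ]
  ret void
}

declare void @llvm.assume(i1)
```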

Reviewed By: jdoerfert, Tyker

Differential Revision: https://reviews.llvm.org/D98228
2021-03-16 10:23:42 +09:00