Commit Graph

31124 Commits

Author SHA1 Message Date
Kazu Hirata 41ae78ea3a Use has_value instead of hasValue (NFC) 2022-07-19 20:15:44 -07:00
Johannes Doerfert f84712f0b8 [Attributor] Teach checkForAllUses to follow returns into callers
If we can determine all call sites we can follow a use in a return
instruction into the caller. AAPointerInfo utilizes this feature.
2022-07-19 18:17:40 -05:00
Johannes Doerfert 4f2ccdd0b1 [Attributor][NFC] Improve debug messages 2022-07-19 18:17:40 -05:00
Nick Desaulniers 1cf6b93df1 Revert "[Local] Allow creating callbr with duplicate successors"
This reverts commit 08860f525a.

Crashes during PPC64LE linux kernel builds as reported by @nathanchance.
https://reviews.llvm.org/D129997#3663632
2022-07-19 15:03:27 -07:00
Johannes Doerfert bf789b1957 [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80.
2022-07-19 16:24:42 -05:00
Arthur Eubanks 13aa2c1c3b [DSE] Revisit pointers that may no longer escape after removing another store
In dependent-capture, previously we'd see that %tmp4 is captured due to
the first store. We'd cache this info in CapturedBeforeReturn and
InvisibleToCallerAfterRet. Then the first store is then removed, causing
the cached values to be wrong.

We also need to revisit everything because normally we work backwards
when removing stores at the end of the function, but in this case
removing an earlier store causes a later store to be removable.

No compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=56796ae1a8db4c85dada28676f8303a5a3609c63&to=21b7e5248ffc423cd36c9d4a020085e363451465&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D123686
2022-07-19 09:30:34 -07:00
Sanjay Patel 3d6c10dcf3 [SimplifyLibCalls] avoid converting pow() to powi() with no FMF
powi() is not a standard math library function; it is specified
with non-strict semantics in the LangRef. We currently require
'afn' to do this transform when it needs a sqrt(), so I just
extended that requirement to the whole-number exponent too.

This bug was introduced with:
b17754bcaa
...where we deferred expansion of pow() to later passes.
2022-07-19 12:26:53 -04:00
Arnold Schwaighofer bc4870f09e [coro async] Add missing llvm.coro.id.async intrinsic to declaresCoroCleanupIntrinsics
rdar://97214593

Differential Revision: https://reviews.llvm.org/D130038
2022-07-19 07:25:04 -07:00
Andrew Turner b850762b62 Add the FreeBSD AArch64 memory layout
Use the FreeBSD AArch64 memory layout values when building for it.
These are based on the x86_64 values, scaled to take into account the
larger address space on AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125883
2022-07-19 09:58:07 -04:00
Andrew Turner e13bd2644e Add the FreeBSD AArch64 shadow offset to llvm
AArch64 has a larger address space than 64 but x86. Use the larger
shadow offset on FreeBSD AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125873
2022-07-19 09:58:07 -04:00
William Schmidt bccc9aa81c Don't vectorize PHIs in catchswitch blocks
We currently assert in vectorizeTree(TreeEntry*) when processing a PHI
bundle in a block containing a catchswitch.  We attempt to set the
IRBuilder insertion point following the catchswitch, which is invalid.
This is done so that ShuffleBuilder.finalize() knows where to insert
a shuffle if one is needed.

To avoid this occurring, watch out for catchswitch blocks during
buildTree_rec() processing, and avoid adding PHIs in such blocks to
the vectorizable tree.  It is unlikely that constraining vectorization
over an exception path will cause a noticeable performance loss, so
this seems preferable to trying to anticipate when a shuffle will and
will not be required.
2022-07-19 06:10:17 -07:00
Nikita Popov 08860f525a [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This
patch removes a limitation which prevents optimizations from actually
producing such callbrs.

Differential Revision: https://reviews.llvm.org/D129997
2022-07-19 14:28:22 +02:00
Florian Hahn a75760a269
[LV] Remove unnecessary cast in widenCallInstruction. (NFC) 2022-07-19 11:23:24 +01:00
Max Kazantsev 82309831c3 [LoopSimplifyCFG] Prevent use-def dominance breach by handling dead exits. PR56243
One of the transforms in LoopSimplifyCFG demands that the LCSSA form is
truly maintained for all values, tokens included, otherwise it may end up creating
a use that is not dominated by def (and Phi creation for tokens is impossible).
Detect this situation and prevent transform for it early.

Differential Revision: https://reviews.llvm.org/D129984
Reviewed By: efriedma
2022-07-19 15:54:12 +07:00
Ellis Hoag 3580daacf3 [InstrProf] Allow CSIRPGO function entry coverage
The flag `-fcs-profile-generate` for enabling CSIRPGO moves the pass
`pgo-instrumentation` after inlining. Function entry coverage works fine
with this change, so remove the assert. I had originally left this
assert in because I had not tested this at the time.

Reviewed By: davidxl, MaskRay

Differential Revision: https://reviews.llvm.org/D129407
2022-07-18 15:10:11 -07:00
Florian Hahn 30e53b8c03
[LV] Sink module variable and use State to set it in widenCall. (NFC)
Limits the lifetime of the variable and makes it independent of
CallInst.
2022-07-18 19:41:48 +01:00
Arnold Schwaighofer 28ebd13d63 [coro async] Fix code to run coro.async.end cleanup like the legacy pass did
The code executed for the Switch ABI does not change.

rdar://97074714

Differential Revision: https://reviews.llvm.org/D129865
2022-07-18 10:41:29 -07:00
Nicolai Hähnle 3443788087 Revert "Inliner: don't mark call sites as 'nounwind' if that would be redundant"
This reverts commit 9905c37981.

Looks like there are Clang changes that are affected in trivial ways. Will look into it.
2022-07-18 17:43:35 +02:00
Nicolai Hähnle 9905c37981 Inliner: don't mark call sites as 'nounwind' if that would be redundant
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.

If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.

v2:
- just check CI->doesNotThrow

Differential Revision: https://reviews.llvm.org/D129860
2022-07-18 17:28:52 +02:00
Sanjay Patel 26fbb79c33 [InstCombine] reduce code for signbit folds; NFC 2022-07-18 11:04:58 -04:00
Nikita Popov 21e2f133a8 [LoopSimplifyCFG] Revert accidental change
This change was included in an unrelated change
b57d61384c
and was of course not intended for commit...
2022-07-18 15:30:13 +02:00
Nikita Popov b57d61384c [ConstantRangeTest] Move nowrap binop tests to generic infrastructure (NFC)
Move testing for add/sub with nowrap flags to TestBinaryOpExhaustive,
rather than separate homegrown exhaustive testing functions.
2022-07-18 15:14:17 +02:00
Kristina Bessonova 44736c1d49 [CloneFunction][DebugInfo] Avoid cloning DILexicalBlocks of inlined subprograms
If DISubpogram was not cloned (e.g. we are cloning a function that has other
functions inlined into it, and subprograms of the inlined functions are
not supposed to be cloned), it doesn't make sense to clone its DILexicalBlocks
as well. Otherwise we'll get duplicated DILexicalBlocks that may confuse
debug info emission in AsmPrinter.

I believe it also makes no sense cloning any DILocalVariables or maybe
other local entities, if their parent subprogram was not cloned, cause
they will be dangling and will not participate in futher emission.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D127102
2022-07-18 13:14:52 +02:00
Nikita Popov 8201e3ef5c [BasicBlockUtils] Don't drop callbr with unique successor
As callbr is now allowed to have duplicate destinations, we can
have a callbr with a unique successor. Make sure it doesn't get
dropped, as we still need to preserve the side-effect.
2022-07-18 12:26:29 +02:00
Nikita Popov 4fba35f973 [InstCombine] Clarify invoke/callbr handling in constexpr call fold (NFCI)
We only need to check the block for the normal/default destination,
not for other destinations. Using the value in those would be
illegal anyway.

The callbr case cannot actually happen here, because callbr is
currently limited to inline asm. Retaining it to match the spirit
of the original code.
2022-07-18 12:02:46 +02:00
Florian Hahn 105032f549
[LV] Use PHI recipe instead of PredRecipe for subsequent uses.
At the moment, the VPPRedInstPHIRecipe is not used in subsequent uses of
the predicate recipe. This incorrectly models the def-use chains, as all
later uses should use the phi recipe. Fix that by delaying recording of
the recipe.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D129436
2022-07-18 09:35:34 +01:00
Nikita Popov 11079e8820 [IR] Don't treat callbr as indirect terminator
Callbr is no longer an indirect terminator in the sense that is
relevant here (that it's successors cannot be updated). The primary
effect of this change is that callbr no longer prevents formation
of loop simplify form.

I decided to drop the isIndirectTerminator() method entirely and
replace it with isa<IndirectBrInst>() checks. I assume this method
was added to abstract over indirectbr and callbr, but it never
really caught on, and there is nothing left to abstract anymore
at this point.

Differential Revision: https://reviews.llvm.org/D129849
2022-07-18 09:32:08 +02:00
Fangrui Song 0e3447bf8a [LegacyPM] Remove WholeProgramDevirt
Unused after LTO removal from legacy optimization passline.
2022-07-17 23:14:53 -07:00
Fangrui Song 1f90cc589e [LegacyPM] Remove FunctionImportLegacyPass
Unused after ThinLTO was removed from legacy optimization pipeline.
2022-07-17 23:06:46 -07:00
Kazu Hirata 7094ab4ee7 [llvm] Modernize bool literals (NFC)
Identified with modernize-use-bool-literals.
2022-07-17 18:08:51 -07:00
Kazu Hirata 3112987d5c Remove unused forward declarations (NFC) 2022-07-17 15:37:48 -07:00
Kazu Hirata 8b3ed1fa98 Remove redundant return statements (NFC)
Identified with readability-redundant-control-flow.
2022-07-17 15:37:46 -07:00
Fangrui Song bbaa015e82 [LegacyPM] Remove LowerTypeTestsPass
Unused after LTO removal from optimization passline.
2022-07-17 15:06:38 -07:00
Fangrui Song a6942256ca [LegacyPM] Remove NameAnonGlobalLegacyPass
Unused after LTO removal from optimization passline.
2022-07-17 14:38:29 -07:00
Fangrui Song d74b88c69d [LegacyPM] Remove CanonicalizeAliasesLegacyPass
Unused after LTO removal from optimization passline.
2022-07-17 14:30:22 -07:00
Fangrui Song 70519a1fba [LegacyPM] Remove LTO passes from optimization pipeline
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-17 14:24:36 -07:00
Fangrui Song f502115561 [LegacyPM] Remove PGO options from PassManagerBuilder
They have been dead since legacy PGO/SamplePGO passes were removed.
2022-07-17 14:03:23 -07:00
Fangrui Song dd5e3f0e27 [LegacyPM] Remove SampleProfileLoaderLegacyPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline (e.g. PGO), remove SamplePGO.
2022-07-17 12:09:46 -07:00
Florian Hahn cc0ee17951
[LV] Move VPPredInstPHIRecipe::execute to VPlanRecipes.cpp (NFC) 2022-07-17 11:34:23 +01:00
zhongyunde 3a6b766b1b [IndVars] Directly use unsigned integer induction for FPToUI/FPToSI of float induction
Depend on D129358

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129756
2022-07-17 10:48:35 +08:00
Florian Hahn 6813b41d57
[LV] Avoid creating new run-time VF expression for each runtime checks.
At the moment, the cost of runtime checks for scalable vectors is
overestimated due to creating separate vscale * VF expressions for each
check. Instead re-use the first expression.
2022-07-16 17:24:07 +01:00
David Green 4b7913c357 [VectorCombine] Only consider shuffle uses with the same type.
The backend getShuffleCosts do not currently handle shuffles that change
size very well. Limit the shuffles we collect to the same type to make
sure they do not cause issues as reported in D128732.
2022-07-16 13:23:39 +01:00
Fangrui Song f9d6f37201 [LegacyPM] Remove ControlHeightReductionLegacyPass
This pass tries to reduce the number of conditional branches in the hot path
based on profile. It's mostly a no-op after legacy PGO passes are moved.
2022-07-16 01:35:56 -07:00
Fangrui Song 3a42c499c2 [LegacyPM] Remove createInstrProfilingLegacyPass
Follow the steps of removing non-core instrumentation passes like PGO.
2022-07-16 01:26:40 -07:00
Fangrui Song 685775bbab [LegacyPM] Remove CGProfileLegacyPass
It's mostly a no-op after I removed legacy PGO passes in D123834.
2022-07-16 00:39:56 -07:00
Fangrui Song df8f5be596 [LegacyPM] Remove ModuleSanitizerCoverageLegacyPass
Follow the steps of various other legacy instrumentation passes removed for
15.0.0.
2022-07-15 19:01:20 -07:00
Rong Xu 5e0443292b [PGO] Report number of counts being dropped when a hash-mismatch happens
This patch reports number of counts being dropped when a hash-mismatch
happens. This information will be helpful to the users -- if the dropped
counts are large, the user should redo the instrumentation build and
recollect the profile.

Differential Revision: https://reviews.llvm.org/D129001
2022-07-15 14:53:59 -07:00
Rong Xu 19ac75364f [PGO] Improve hash-mismatch warning message
This patch improves FDO hash-mismatch handling:
(1) filter out warnings to weak functions.
Weak functions definition will be overridden by a strong definition by linker.
The hash mismatch in profile use compilation is expected.
Make the profile hash mismatch warning under the existing option (default true).

(2) add an option to trace the hash of functions with the specific string.
Note that an empty string parameter will trace all functions.

Differential Revision: https://reviews.llvm.org/D129002
2022-07-15 13:44:55 -07:00
Philip Reames 6ab686eb86 [LSR] Allow already invariant operand for ICmpZero matching [try 2]
Changes since initial commit:

* Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in d767b392.
* isLoopInvariant returns true for instructions outside the loop, but not necessarily *above* the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in 2aed3cdb.

Original commit message follows...

The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 13:29:43 -07:00
Warren Ristow c650793049 [Reassociate] Enable FP reassociation via 'reassoc' and 'nsz'
Compiling with '-ffast-math' tuns on all the FastMathFlags (FMF), as
expected, and that enables FP reassociation. Only the two FMF flags
'reassoc' and 'nsz' are technically required to perform reassociation,
but disabling other unrelated FMF bits is needlessly suppressing the
optimization.

This patch fixes that needless suppression, and makes appropriate
adjustments to test-cases, fixing some outstanding TODOs in the process.

Fixes: #56483

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129523
2022-07-15 11:44:35 -07:00
Philip Reames 6fe766beba Revert "[LSR] Allow already invariant operand for ICmpZero matching"
This reverts commit 9153515a7b.  Builtbot crash was reported in the commit thread, reverting while investigating.
2022-07-15 10:47:57 -07:00
Florian Hahn aa00fb02c9
[LV] Use umax(VF * UF, MinProfTC) for scalable vectors.
For scalable vectors, it is not sufficient to only check
MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because
this property may not holder for larger values of vscale. In those
cases, compute umax(VF * UF, MinProfTC) instead.

This should fix
https://lab.llvm.org/buildbot/#/builders/197/builds/2262
2022-07-15 10:23:14 -07:00
Philip Reames 9153515a7b [LSR] Allow already invariant operand for ICmpZero matching
The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 09:51:00 -07:00
Nikita Popov 8a519b3c21 [InstCombine] Ensure constant folding in binop of select fold
When folding a binop into a select, we need to ensure that one
of the select arms actually does constant fold, otherwise we'll
create two binop instructions and perform the reverse transform.

Ensure this by performing an explicit constant folding attempt,
and failing the transform if neither side simplifies.

A simple alternative here would have been to limit the fold to
ImmConstants, but given the current representation of scalable
vector splats, this wouldn't be ideal.
2022-07-15 11:03:10 +02:00
Mel Chen bd404fbcc8 [LV][NFC] Fix the condition for printing debug messages
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D128523
2022-07-15 01:47:33 -07:00
Nikita Popov f75ccadcdd [LSR] Create SCEVExpander earlier, use member isSafeToExpand() (NFC)
This is a followup to D129630, which switches LSR to the member
isSafeToExpand() variant, and removes the freestanding function.

This is done by creating the SCEVExpander early (already during the
analysis phase). Because the SCEVExpander is now available for the
whole lifetime of LSRInstance, I've also made it into a member
variable, rather than passing it around in even more places.

Differential Revision: https://reviews.llvm.org/D129769
2022-07-15 09:41:23 +02:00
Craig Topper 0e718443c7 [SimplifyIndVar] Use enum class for ExtendKind. NFC
I happened to notice a two places where the enum was being pass
directly to the bool IsSigned argument of createExtendInst. This
was functionally ok since SignExtended in the enum has value
of 1, but the code shouldn't rely on that.

Using an enum class prevents the enum from being convertible to bool,
but does make writing the enum values more verbose. Since we now
have to write ExtendKind:: in front of them, I've shortened the
names of ZeroExtended and SignExtended.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129733
2022-07-14 10:03:58 -07:00
Philip Reames 3bc09c7da5 [SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case
Motivation here is to unblock LSRs ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all.

SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values.

vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes.

(*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of ever udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference.

Differential Revision: https://reviews.llvm.org/D129710
2022-07-14 08:56:58 -07:00
Brendon Cahoon 58fec78231 Revert "[UnifyLoopExits] Reduce number of guard blocks"
This reverts commit e13248ab0e.

Need to revert because the transformation cannot occur for basic
blocks that contain convergent instructions.
2022-07-14 10:33:52 -05:00
Warren Ristow 230c8c56f2 [Reassociate] Cleanup minor missed optimizations
In analyzing issue #56483, it was noticed that running `opt` with
`-reassociate` was missing some minor optimizations. For example,
there were cases where the running `opt` on IR with floating-point
instructions that have the `fast` flags applied, sometimes resulted in
less efficient code than the input IR (things like dead instructions
left behind, and missed reassociations). These were sometimes noted
in the test-files with TODOs, to investigate further. This commit
fixes some of these problems, removing some TODOs in the process.

FTR, I refer to these as "minor" missed optimizations, because when
running a full clang/llvm compilation, these inefficiencies are not
happening, as other passes clean that residue up. Regardless, having
cleaner IR produced by `opt`, makes assessing the quality of fixes done
in `opt` easier.
2022-07-14 08:21:04 -07:00
Brendon Cahoon c945d88d2b Revert "[StructurizeCFG] Improve basic block ordering"
This reverts commit f1b05a0a2b.

Need to revert to due to issues identified with testing. The
transformation is incorrect for blocks that contain convergent
instructions.
2022-07-14 09:40:51 -05:00
Nikita Popov 9e6e631b38 [LoopPredication] Use isSafeToExpandAt() member function (NFC)
As a followup to D129630, this switches a usage of the freestanding
function in LoopPredication to use the member variant instead. This
was the last use of the freestanding function, so drop it entirely.
2022-07-14 14:49:07 +02:00
Nikita Popov dcf4b733ef [SCEVExpander] Make CanonicalMode handing in isSafeToExpand() more robust (PR50506)
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.

Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.

Fixes https://github.com/llvm/llvm-project/issues/50506.

Differential Revision: https://reviews.llvm.org/D129630
2022-07-14 14:41:51 +02:00
zhongyunde fc6092fd4d [IndVars] Eliminate redundant type cast between unsigned integer and float
Extend for unsigned integer according the comment of D129191.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129358
2022-07-14 19:41:07 +08:00
Nikita Popov 7a43b382ce [IndVars] Make sure header phi simplification preserves LCSSA form
When simplifying instructions, make sure that the replacement
preserves LCSSA form. This fixes the issue reported at:
https://reviews.llvm.org/D129293#3650851
2022-07-14 11:46:48 +02:00
Nikita Popov ebc54e0cd4 [SCCP] Make check for unknown/undef in unary op handling more explicit (NFCI)
Make the implementation more similar to other functions, by
explicitly skipping an unknown/undef first, and always falling
back to overdefined at the end. I don't think it makes a difference
now, but could make one once the constant evaluation can fail. In
that case we would directly mark the result as overdefined now,
rather than keeping it unknown (and later making it overdefined
because we think it's undef-based).
2022-07-14 10:56:11 +02:00
Nikita Popov 6db3edc858 [SCCP] Don't check for UndefValue before calling markConstant()
The value lattice explicitly represents undef, and markConstant()
internally checks for UndefValue and will create an undef rather
than constant lattice element in that case.

This is mostly a code simplification, it has little practical impact
because we usually get undef results from undef operands, and those
don't get processed.

Only leave the check behind for the CmpInst case, because it
currently goes through this incorrect code in the getCompare()
implementation: f98697642c/llvm/include/llvm/Analysis/ValueLattice.h (L456-L457)

Differential Revision: https://reviews.llvm.org/D128330
2022-07-14 10:05:56 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Florian Hahn ee37ae91b6
[VPlan] Move VPBB verification to separate function (NFC). 2022-07-13 18:53:40 -07:00
Florian Hahn 6f7347b888
[LV] Use PredRecipe directly instead of getOrAddVPValue (NFC).
There is no need to look up the VPValue for Instr, PredRecipe can be
used directly.
2022-07-13 17:01:42 -07:00
Alexander Shaposhnikov c916840539 [SimplifyCFG] Improve SwitchToLookupTable optimization
Try to use the original value as an index (in the lookup table)
in more cases (to avoid one subtraction and shorten the dependency chain)
(https://github.com/llvm/llvm-project/issues/56189).

Test plan:
1/ ninja check-all
2/ bootstrapped LLVM + Clang pass tests

Differential revision: https://reviews.llvm.org/D128897
2022-07-13 23:21:45 +00:00
Leonard Chan 21f72c05c4 [hwasan] Add __hwasan_add_frame_record to the hwasan interface
Hwasan includes instructions in the prologue that mix the PC and SP and store
it into the stack ring buffer stored at __hwasan_tls. This is a thread_local
global exposed from the hwasan runtime. However, if TLS-mechanisms or the
hwasan runtime haven't been setup yet, it will be invalid to access __hwasan_tls.
This is the case for Fuchsia where we instrument libc, so some functions that
are instrumented but can run before hwasan initialization will incorrectly
access this global. Additionally, libc cannot have any TLS variables, so we
cannot weakly define __hwasan_tls until the runtime is loaded.

A way we can work around this is by moving the instructions into a hwasan
function that does the store into the ring buffer and creating a weak definition
of that function locally in libc. This way __hwasan_tls will not actually be
referenced. This is not our long-term solution, but this will allow us to roll
out hwasan in the meantime.

This patch includes:

- A new llvm flag for choosing to emit a libcall rather than instructions in the
  prologue (off by default)
- The libcall for storing into the ringbuffer (__hwasan_add_frame_record)

Differential Revision: https://reviews.llvm.org/D128387
2022-07-13 15:15:15 -07:00
Leonard Chan d843d5c8e6 Revert "[hwasan] Add __hwasan_record_frame_record to the hwasan interface"
This reverts commit 4956620387.

This broke a sanitizer builder: https://lab.llvm.org/buildbot/#/builders/77/builds/19597
2022-07-13 15:06:07 -07:00
Florian Hahn 225e3ec622
[LV] Move VPBranchOnMaskRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-13 14:39:59 -07:00
leonardchan 4956620387 [hwasan] Add __hwasan_record_frame_record to the hwasan interface
Hwasan includes instructions in the prologue that mix the PC and SP and store
it into the stack ring buffer stored at __hwasan_tls. This is a thread_local
global exposed from the hwasan runtime. However, if TLS-mechanisms or the
hwasan runtime haven't been setup yet, it will be invalid to access __hwasan_tls.
This is the case for Fuchsia where we instrument libc, so some functions that
are instrumented but can run before hwasan initialization will incorrectly
access this global. Additionally, libc cannot have any TLS variables, so we
cannot weakly define __hwasan_tls until the runtime is loaded.

A way we can work around this is by moving the instructions into a hwasan
function that does the store into the ring buffer and creating a weak definition
of that function locally in libc. This way __hwasan_tls will not actually be
referenced. This is not our long-term solution, but this will allow us to roll
out hwasan in the meantime.

This patch includes:

- A new llvm flag for choosing to emit a libcall rather than instructions in the
  prologue (off by default)
- The libcall for storing into the ringbuffer (__hwasan_record_frame_record)

Differential Revision: https://reviews.llvm.org/D128387
2022-07-14 05:07:11 +08:00
Martin Sebor ab7ee3c991 [InstCombine] Enable strtol folding with nonnull endptr
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129593
2022-07-13 09:26:34 -06:00
Nikita Popov 07146a9e64 [SCCP] Fix typo in previous commit
Ooops, I tested a build from the wrong checkout.
2022-07-13 16:22:40 +02:00
Nikita Popov e298dfbc1b [SCCP] Avoid ConstantExpr::get() call
Use ConstantFoldUnaryOpOperand() API instead. This is in
preparation for removing fneg constant expressions.
2022-07-13 16:20:34 +02:00
Max Kazantsev 62f4572e45 [IndVars][NFC] Make IVOperand parameter an instruction 2022-07-13 19:07:16 +07:00
Max Kazantsev 30e33b4b81 [SCEV][NFC] Make getStrengthenedNoWrapFlagsFromBinOp return optional 2022-07-13 18:54:25 +07:00
David Sherwood 307ace7f20 [LoopVectorize] Ensure the VPReductionRecipe is placed after all it's inputs
When vectorising ordered reductions we call a function
LoopVectorizationPlanner::adjustRecipesForReductions to replace the
existing VPWidenRecipe for the fadd instruction with a new
VPReductionRecipe. We attempt to insert the new recipe in the same
place, but this is wrong because createBlockInMask may have
generated new recipes that VPReductionRecipe now depends upon. I
have changed the insertion code to append the recipe to the
VPBasicBlock instead.

Added a new RUN with tail-folding enabled to the existing test:

  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D129550
2022-07-13 09:29:25 +01:00
Nikita Popov af49bed933 [IndVars] Simplify instructions after replacing header phi with preheader value
After replacing a loop phi with the preheader value, it's usually
possible to simplify some of the using instructions, so do that as
part of replaceLoopPHINodesWithPreheaderValues().

Doing this as part of IndVars is valuable, because it may make GEPs
in the loop have constant offsets and allow the following SROA run
to succeed (as demonstrated in the PhaseOrdering test).

Differential Revision: https://reviews.llvm.org/D129293
2022-07-13 10:27:04 +02:00
Nikita Popov a5ee62a141 [IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits
Currently we only call replaceLoopPHINodesWithPreheaderValues() if
optimizeLoopExits() replaces the exit with an unconditional exit.
However, it is very common that this already happens as part of
eliminateIVComparison(), in which case we're leaving behind the
dead header phi.

Tweak the early bailout for already-constant exits to also call
replaceLoopPHINodesWithPreheaderValues().

Differential Revision: https://reviews.llvm.org/D129214
2022-07-13 09:43:21 +02:00
Augie Fackler 9029bda041 [Attributor] Don't crash if getAnalysisResultForFunction() returns null LoopInfo
I have no idea what's going on here. This code was moved
around/introduced in change cb26b01d57 and starts crashing with a NULL
dereference once I apply https://reviews.llvm.org/D123090. I assume that
I've unwittingly taught the attributor enough that it's able to do more
clever things than in the past, and it's able to trip on this case. I
make no claims about the correctness of this patch, but it passes tests
and seems to fix all the crashes I've been seeing.

Differential Revision: https://reviews.llvm.org/D129589
2022-07-12 16:44:06 -04:00
Yuanfang Chen fcb7d76d65 [coroutine] add nomerge function attribute to `llvm.coro.save`
It is illegal to merge two `llvm.coro.save` calls unless their
`llvm.coro.suspend` users are also merged. Marks it "nomerge" for
the moment.

This reverts D129025.

Alternative to D129025, which affects other token type users like WinEH.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D129530
2022-07-12 10:39:38 -07:00
Nick Desaulniers 2240d72f15 [X86] initial -mfunction-return=thunk-extern support
Adds support for:
* `-mfunction-return=<value>` command line flag, and
* `__attribute__((function_return("<value>")))` function attribute

Where the supported <value>s are:
* keep (disable)
* thunk-extern (enable)

thunk-extern enables clang to change ret instructions into jmps to an
external symbol named __x86_return_thunk, implemented as a new
MachineFunctionPass named "x86-return-thunks", keyed off the new IR
attribute fn_ret_thunk_extern.

The symbol __x86_return_thunk is expected to be provided by the runtime
the compiled code is linked against and is not defined by the compiler.
Enabling this option alone doesn't provide mitigations without
corresponding definitions of __x86_return_thunk!

This new MachineFunctionPass is very similar to "x86-lvi-ret".

The <value>s "thunk" and "thunk-inline" are currently unsupported. It's
not clear yet that they are necessary: whether the thunk pattern they
would emit is beneficial or used anywhere.

Should the <value>s "thunk" and "thunk-inline" become necessary,
x86-return-thunks could probably be merged into x86-retpoline-thunks
which has pre-existing machinery for emitting thunks (which could be
used to implement the <value> "thunk").

Has been found to build+boot with corresponding Linux
kernel patches. This helps the Linux kernel mitigate RETBLEED.
* CVE-2022-23816
* CVE-2022-28693
* CVE-2022-29901

See also:
* "RETBLEED: Arbitrary Speculative Code Execution with Return
Instructions."
* AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion
* TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0
  2022-07-12
* Return Stack Buffer Underflow / Return Stack Buffer Underflow /
  CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702

SystemZ may eventually want to support "thunk-extern" and "thunk"; both
options are used by the Linux kernel's CONFIG_EXPOLINE.

This functionality has been available in GCC since the 8.1 release, and
was backported to the 7.3 release.

Many thanks for folks that provided discrete review off list due to the
embargoed nature of this hardware vulnerability. Many Bothans died to
bring us this information.

Link: https://www.youtube.com/watch?v=IF6HbCKQHK8
Link: https://github.com/llvm/llvm-project/issues/54404
Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html
Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html

Reviewed By: aaron.ballman, craig.topper

Differential Revision: https://reviews.llvm.org/D129572
2022-07-12 09:17:54 -07:00
David Sherwood 6b694d600a [LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF
When calculating the cost of Instruction::Br in getInstructionCost
we query PredicatedBBsAfterVectorization to see if there is a
scalar predicated block. However, this meant that the decisions
being made for a given fixed-width VF were affecting the cost for a
scalable VF. As a result we were returning InstructionCost::Invalid
pointlessly for a scalable VF that should have a low cost. I
encountered this for some loops when enabling tail-folding for
scalable VFs.

Test added here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll

Differential Revision: https://reviews.llvm.org/D128272
2022-07-12 14:53:20 +01:00
Nikita Popov 3d475dfeb9 [Mem2Reg] Consistently preserve nonnull assume for uninit load
When performing a !nonnull load from uninitialized memory, we
should preserve the nonnull assume just like in all other cases.
We already do this correctly in the generic mem2reg code, but
don't handle this case when using the optimized single-block
implementation.

Make sure that the optimized implementation exhibits the same
behavior as the generic implementation.
2022-07-12 12:53:08 +02:00
Kazu Hirata ec9a0e36d9 [IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC)
The last uses were removed on Apr 15, 2022 in commit
2e6ac54cf4.

Differential Revision: https://reviews.llvm.org/D129460
2022-07-11 20:15:24 -07:00
Florian Hahn 5d135041c5
[LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-11 16:01:07 -07:00
Justin Cady 3d438ceed1 [InstrProf] Mark __llvm_profile_runtime hidden to match libclang_rt.profile definition
Mark the symbol hidden to match INSTR_PROF_PROFILE_RUNTIME_VAR in compiler-rt.

Fixes second issue discussed at https://discourse.llvm.org/t/63090

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D128842
2022-07-11 11:29:20 -07:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
David Sherwood 02d6950d84 [LoopVectorize][NFC] Add optional Name parameter to VPInstruction
This patch is a simple piece of refactoring that now permits users
to create VPInstructions and specify the name of the value being
generated. This is useful for creating more readable/meaningful
names in IR.

Differential Revision: https://reviews.llvm.org/D128982
2022-07-11 09:23:24 +01:00
Florian Hahn 6a4bc452f8
[LV] Move VPWidenGEPRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-10 17:10:17 -07:00
Florian Hahn 13ae213469
[LV] Move VPWidenRecipe::execute to VPlanRecipes.cpp (NFC). 2022-07-09 18:46:57 -07:00
Paul Osmialowski b17754bcaa [SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer value
Since the backend's codegen is capable to expand powi into fmul's, it
is not needed anymore to do so in the ::optimizePow() function of
SimplifyLibCalls.cpp. What is sufficient is to always turn pow(x, n)
into powi(x, n) for the cases where n is a constant integer value.

Dropping the current expansion code allowed relaxation of the folding
conditions and now this can also happen at optimization levels below
Ofast.

The added CodeGen/AArch64/powi.ll test case ensures that powi is
actually expanded into fmul's, confirming that this refactor did not
cause any performance degradation.

Following an idea proposed by David Sherwood <david.sherwood@arm.com>.

Differential Revision: https://reviews.llvm.org/D128591
2022-07-09 12:00:22 -04:00
Florian Hahn 0c27b38849
[VPlan] Move VPWidenSelectRecipe::execute to VPlanRecipes.cpp (NFC).
Depends on D127968.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127970
2022-07-08 09:35:23 -07:00
Nikita Popov d287051404 [InstCombine] Avoid ConstantExpr::get() in vector binop fold (NFCI)
Use the ConstantFoldBinaryOpOperands() API instead. This case
would bail out on a non-folded result anyway.
2022-07-08 17:20:14 +02:00
Nikita Popov 29c6bf45c3 [InstCombine] Avoid ConstantExpr::get() call
Avoid calling ConstantExpr::get() for associative/commutative
binops, call ConstantFoldBinaryOpOperands() instead. We only
want to perform the reassociation of the constants actually fold.
2022-07-08 17:13:06 +02:00
Nikita Popov fc18a88231 [InstCombine] Avoid creating float binop ConstantExprs
Replace ConstantExpr:getFAdd etc with call to
ConstantFoldBinaryOpOperands(). I'm using the constant folding API
rather than IRBuilder here to ensure that this does actually
constant fold. These transforms don't use m_ImmConstant(), so this
would not otherwise be guaranteed (and apparently, they can't use
m_ImmConstant because they want to handle scalable vector splats).

There is an opportunity here to further migrate these to the
ConstantFoldFPInstOperands() API, which would respect the denormal
mode. I've held off on doing so here, because some of this code
explicitly checks for denormal results, and I don't want to touch
it in a mostly NFC change.
2022-07-08 16:36:04 +02:00