Commit Graph

26463 Commits

Author SHA1 Message Date
Florian Hahn f250892373
[VPlan] Make VPRecipeBase inherit from VPDef.
This patch makes VPRecipeBase a direct subclass of VPDef, moving the
SubclassID to VPDef.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90564
2020-12-21 13:34:00 +00:00
Florian Hahn cd608dc8d3
[VPlan] Use VPDef for VPInterleaveRecipe.
This patch turns updates VPInterleaveRecipe to manage the values it defines
using VPDef. The VPValue is used  during VPlan construction and
codegeneration instead of the plain IR reference where possible.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90562
2020-12-21 10:56:53 +00:00
David Sherwood 3bf7d47a97 [NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp
An earlier patch introduced asserts that the InstructionCost is
valid because at that time the ReuseShuffleCost variable was an
unsigned. However, now that the variable is an InstructionCost
instance the asserts can be removed.

See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

See this patch for the introduction of the type:
https://reviews.llvm.org/D91174
2020-12-21 09:12:28 +00:00
Kazu Hirata 5d24935f22 [PGO] Remove dead member variable InstrumentFuncEntry (NFC)
This patch removes InstrumentFuncEntry as it is dead.

The constructor of FuncPGOInstrumentation passes InstrumentFuncEntry
to MST, but it doesn't make a local copy as a member variable.
2020-12-20 09:57:05 -08:00
Andrew Litteken 7c6f28a438 [IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.

We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: jroelofs, paquette

Differential Revision: https://reviews.llvm.org/D86978
2020-12-19 17:34:34 -06:00
Andrew Litteken b8a2b6af37 Revert "[IROutliner] Deduplicating functions that only require inputs."
Missing reviewers and differential revision in commit message.

This reverts commit 5cdc4f57e5.
2020-12-19 17:33:49 -06:00
Andrew Litteken 5cdc4f57e5 [IROutliner] Deduplicating functions that only require inputs.
Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.

We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
2020-12-19 17:26:29 -06:00
Roman Lebedev c043f5055e
[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1
... for conditional branch case
2020-12-20 00:18:36 +03:00
Roman Lebedev 262ff9c23e
[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree 2020-12-20 00:18:36 +03:00
Roman Lebedev 6a1617d67c
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2
... for the custom case returning void.
2020-12-20 00:18:36 +03:00
Roman Lebedev b94520c9ee
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1
... for the general case of returning a value.
2020-12-20 00:18:35 +03:00
Roman Lebedev 4d87a6ad13
[NFCI][SimplifyCFG] SimplifyCondBranchToTwoReturns(): pull out BI->getParent() into a variable 2020-12-20 00:18:35 +03:00
Roman Lebedev 83659c7076
[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-20 00:18:34 +03:00
Roman Lebedev b7d00e29b7
[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev c209b88dd4
[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev 76e74d9395
[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree 2020-12-20 00:18:33 +03:00
Roman Lebedev 4be8707e64
[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree
Still boring, simply drop all edges to successors of DomBlock,
and add an edge to to BB instead.
2020-12-20 00:18:33 +03:00
Roman Lebedev b43b77ff9b
[NFCI][SimlifyCFG] simplifyOnce(): also perform DomTree validation
And that exposes that a number of tests don't *actually* manage to
maintain DomTree validity, which is inline with my observations.

Once again, SimlifyCFG pass currently does not require/preserve DomTree
by default, so this is effectively NFC.
2020-12-20 00:18:32 +03:00
Andrew Litteken c52bcf3a9b [IRSim][IROutliner] Limit to extracting regions that only require
inputs.

Extracted regions can have both inputs and outputs.  In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics).  We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch limits the
extracted regions to those that only require inputs, and do not have any
 other special cases.

We test that we do not outline the wrong constants in:
test/Transforms/IROutliner/outliner-different-constants.ll
test/Transforms/IROutliner/outliner-different-globals.ll
test/Transforms/IROutliner/outliner-constant-vs-registers.ll

We test that correctly outline in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, plofti

Differential Revision: https://reviews.llvm.org/D86977
2020-12-19 13:33:54 -06:00
Kazu Hirata 56edfcada9 [Target, Transforms] Use contains (NFC) 2020-12-19 10:43:19 -08:00
Aditya Kumar 1ab4db0f84 [HotColdSplit] Reflect full cost of parameters in split penalty
Make the penalty for splitting a region more accurately reflect the cost
of materializing all of the inputs/outputs to/from the region.

This almost entirely eliminates code growth within functions which
undergo splitting in key internal frameworks, and reduces the size of
those frameworks between 2.6% to 3%.

rdar://49167240

Patch by: Vedant Kumar(@vsk)
Reviewers: hiraditya,rjf,t.p.northover
Reviewed By: hiraditya,rjf

Differential Revision: https://reviews.llvm.org/D59715
2020-12-18 17:06:17 -08:00
Akira Hatanaka ffd982f7db [ObjC][ARC] Fix a bug where the inline-asm retain/claim RV marker wasn't
inserted when the original call had a 'returned' argument

The code is testing whether the instruction BBI points to is the call
that is paired up with the retainRV/claimRV call, but it doesn't work
when the call has a 'returned' argument since GetArgRCIdentityRoot looks
through 'returned' arguments.

rdar://72485383
2020-12-18 16:59:06 -08:00
Sanjay Patel 37d0dda739 [SLP] fix typo; NFC 2020-12-18 16:55:52 -05:00
Nikita Popov 1f1145006b [DSE] Use correct memory location for read clobber check
MSSA DSE starts at a killing store, finds an earlier store and
then checks that the earlier store is not read along any paths
(without being killed first). However, it uses the memory location
of the killing store for that, not the earlier store that we're
attempting to eliminate.

This has a number of problems:

* Mismatches between what BasicAA considers aliasing and what DSE
  considers an overwrite (even though both are correct in isolation)
  can result in miscompiles. This is PR48279, which D92045 tries to
  fix in a different way. The problem is that we're using a location
  from a store that is potentially not executed and thus may be UB,
  in which case analysis results can be arbitrary.
* Metadata on the killing store may be used to determine aliasing,
  but there is no guarantee that the metadata is valid, as the specific
  killing store may not be executed. Using the metadata on the earlier
  store is valid (it is the store we're removing, so on any execution
  where its removal may be observed, it must be executed).
* The location is imprecise. For full overwrites the killing store
  will always have a location that is larger or equal than the earlier
  access location, so it's beneficial to use the earlier access
  location. This is not the case for partial overwrites, in which
  case either location might be smaller. There is some room for
  improvement here.

Using the earlier access location means that we can no longer cache
which accesses are read for a given killing store, as we may be
querying different locations. However, it turns out that simply
dropping the cache has no notable impact on compile-time.

Differential Revision: https://reviews.llvm.org/D93523
2020-12-18 20:26:53 +01:00
Kazu Hirata 5ac37725df [GVNHoist] Remove successorDominate (NFC)
The function was introduced on Aug 25, 2016 in commit
5f0d0e60d1.

Its last use was removed on Sep 13, 2017 in commit
dfa8741c96.
2020-12-18 10:29:52 -08:00
Roman Lebedev 897c985e1e
[InstCombine] Canonicalize SPF to abs intrinsic
This patch enables canonicalization of SPF_ABS and SPF_ABS
to the abs intrinsic.

This is a recommit, the original try was
05d4c4ebc2,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.

Differential Revision: https://reviews.llvm.org/D87188
2020-12-18 21:18:14 +03:00
Whitney Tsang 2a814cd9e1 Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-18 17:37:17 +00:00
Arnamoy Bhattacharyya 06d5b1c9ad [SROA] Remove Dead Instructions while creating speculative instructions
The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list.  However it does not remove instructions from the dead list while running eraseFromParent() on those instructions.

This causes (rare) null pointer dereferences.  For example, in the speculatePHINodeLoads() instruction, in the following code snippet:

```
   while (!PN.use_empty()) {
     LoadInst *LI = cast<LoadInst>(PN.user_back());
     LI->replaceAllUsesWith(NewPN);
     LI->eraseFromParent();
   }
```

If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called.  However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line:

```
LoadInst *Load = PredBuilder.CreateAlignedLoad(
         LoadTy, InVal, Alignment,
         (PN.getName() + ".sroa.speculate.load." + Pred->getName()));
```

This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize.  In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D92431
2020-12-18 11:47:02 -05:00
Sanjay Patel 47aaa99c0e [VectorCombine] allow peeking through GEPs when creating a vector load
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).

We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.

Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh

The vector translation is independent of endian (verify by
changing to leading 'E' in the datalayout string).

Differential Revision: https://reviews.llvm.org/D93229
2020-12-18 09:25:03 -05:00
Yevgeny Rouban f0e3d1d6ca [IndVars] Fix adding trunc instructions to unwind blocks
Truncate instruction must not be inserted before landing pads.
The insertion point is fixed.
2020-12-18 12:52:23 +07:00
Kazu Hirata b621116716 [Transforms] Use llvm::erase_if (NFC) 2020-12-17 19:53:10 -08:00
Rong Xu 31c0b8700b Fix clang-ppc64le-rhel buildbot build error
ix buildbot build error due to
commit 3733463d: [IR][PGO] Add hot func attribute and use hot/cold
attribute in func section
2020-12-17 19:14:43 -08:00
Rong Xu 3733463dbb [IR][PGO] Add hot func attribute and use hot/cold attribute in func section
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.

This patch adds support of hot function attribute to LLVM IR.  This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).

This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
    is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
    isFunctionColdInCallGraph is true, this function will be marked as
    cold.

The changes are:
(1) user annotated function attribute will used in setting function
    section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.

The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.

Differential Revision: https://reviews.llvm.org/D92493
2020-12-17 18:41:12 -08:00
Andrew Litteken cea807602a [IRSim][IROutliner] Adding InstVisitor to disallow certain operations.
This adds a custom InstVisitor to return false on instructions that
should not be allowed to be outlined.  These match the illegal
instructions in the IRInstructionMapper with exception of the addition
of the llvm.assume intrinsic.

Tests all the tests marked: illegal-*-.ll with a test for each kind of
instruction that has been marked as illegal.

Reviewers: jroelofs, paquette

Differential Revisions: https://reviews.llvm.org/D86976
2020-12-17 19:33:57 -06:00
Roman Lebedev 2d07414ee5
[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree
Pretty boring, removeUnwindEdge() already known how to update DomTree,
so if we are to call it, we must first flush our own pending updates;
otherwise, we just stop predecessors from branching to us,
and for certain predecessors, stop their predecessors from
branching to them also.
2020-12-18 00:37:22 +03:00
Roman Lebedev 2ee724863e
[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:22 +03:00
Roman Lebedev 164e0847a5
[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:21 +03:00
Bangtian Liu 511cfe9441 Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit d20e0c3444.
2020-12-17 21:00:37 +00:00
Johannes Doerfert 994bb6eb7d [OpenMP][NFC] Provide a new remark and documentation
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D93439
2020-12-17 14:38:26 -06:00
Andrew Litteken dae34463e3 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Recommit of bf899e8913 fixing memory
leaks.

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-12-17 11:27:26 -06:00
Nabeel Omer df2b9a3e02 [DebugInfo] Avoid re-ordering assignments in LCSSA
The LCSSA pass makes use of a function insertDebugValuesForPHIs() to
propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty
behaviour occurs when the parent PHI of a newly inserted PHI is not the
most recent assignment to a source variable. insertDebugValuesForPHIs ends
up propagating a value that isn't the most recent assignemnt.

This change removes the call to insertDebugValuesForPHIs() from LCSSA,
preventing incorrect dbg.value intrinsics from being propagated.
Propagating variable locations between blocks will occur later, during
LiveDebugValues.

Differential Revision: https://reviews.llvm.org/D92576
2020-12-17 16:17:32 +00:00
Bangtian Liu d20e0c3444 Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-17 16:00:15 +00:00
Florian Hahn 01089c876b
[InstCombine] Preserve !annotation on newly created instructions.
If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.

The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

This patch is based on an earlier patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, lebedev.ri

Differential Revision: https://reviews.llvm.org/D91444
2020-12-17 15:20:23 +00:00
Florian Hahn 75c04bfc61
[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest.
When folding a branch to a common destination, preserve !annotation on
the created instruction, if the terminator of the BB that is going to be
removed has !annotation. This should ensure that !annotation is attached
to the instructions that 'replace' the original terminator.

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D93410
2020-12-17 14:06:58 +00:00
Jun Ma 0138399903 [InstCombine] Remove scalable vector restriction in InstCombineCasts
Differential Revision: https://reviews.llvm.org/D93389
2020-12-17 22:02:33 +08:00
Florian Hahn 29077ae860
[IRBuilder] Generalize debug loc handling for arbitrary metadata.
This patch extends IRBuilder to allow adding/preserving arbitrary
metadata on created instructions.

Instead of using references to specific metadata nodes (like DebugLoc),
IRbuilder now keeps a vector of (metadata kind, MDNode *) pairs, which
are added to each created instruction.

The patch itself is a NFC and only moves the existing debug location
handling over to the new system. In a follow-up patch it will be used to
preserve !annotation metadata besides !dbg.

The current approach requires iterating over MetadataToCopy to avoid
adding duplicates, but given that the number of metadata kinds to
copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2
(!dbg and !annotation)) that should not matter.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D93400
2020-12-17 13:27:43 +00:00
Cullen Rhodes 1fd3a04775 [LV] Disable epilogue vectorization for scalable VFs
Epilogue vectorization doesn't support scalable vectorization factors
yet, disable it for now.

Reviewed By: sdesmalen, bmahjour

Differential Revision: https://reviews.llvm.org/D93063
2020-12-17 12:14:03 +00:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Barry Revzin 92310454bf Make LLVM build in C++20 mode
Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).

There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:

// 1) Missing const
namespace missing_const {
    struct A {
    #ifndef FIXED
        bool operator==(A const&);
    #else
        bool operator==(A const&) const;
    #endif
    };

    bool a = A{} == A{}; // error
}

// 2) Type mismatch on CRTP
namespace crtp_mismatch {
    template <typename Derived>
    struct Base {
    #ifndef FIXED
        bool operator==(Derived const&) const;
    #else
        // in one case changed to taking Base const&
        friend bool operator==(Derived const&, Derived const&);
    #endif
    };

    struct D : Base<D> { };

    bool b = D{} == D{}; // error
}

// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
    template <bool Const>
    struct iterator {
        using const_iterator = iterator<true>;

        iterator();

        template <bool B, std::enable_if_t<(Const && !B), int> = 0>
        iterator(iterator<B> const&);

    #ifndef FIXED
        bool operator==(const_iterator const&) const;
    #else
        friend bool operator==(iterator const&, iterator const&);
    #endif
    };

    bool c = iterator<false>{} == iterator<false>{} // error
          || iterator<false>{} == iterator<true>{}
          || iterator<true>{} == iterator<false>{}
          || iterator<true>{} == iterator<true>{};
}

// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
    enum Color { Red };

    struct C {
        C();
        C(Color);
        operator Color() const;
        bool operator==(Color) const;
        friend bool operator==(C, C);
    };

    bool c = C{} == C{}; // error
    bool d = C{} == Red;
}

Differential revision: https://reviews.llvm.org/D78938
2020-12-17 10:44:10 +00:00
Florian Hahn eba09a2db9
[InstCombine] Preserve !annotation for newly created instructions.
When replacing an instruction with !annotation with a newly created
replacement, add the !annotation metadata to the replacement.

This mostly covers cases where the new instructions are created using
the ::Create helpers. Instructions created by IRBuilder will be handled
by D91444.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D93399
2020-12-17 09:06:51 +00:00
Kazu Hirata 4ad5b634f6 [GCN] Remove unused function handleNewInstruction (NFC)
The function was added without a user on Dec 22, 2016 in commit
7e274e02ae.  It seems to be unused since
then.
2020-12-16 21:57:48 -08:00
Hongtao Yu ac068e014b [CSSPGO] Consume pseudo-probe-based AutoFDO profile
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.

Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).

**Adjustment to profile format **

A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.

Differential Revision: https://reviews.llvm.org/D92347
2020-12-16 15:57:18 -08:00
alex-t 35ec3ff76d Disable Jump Threading for the targets with divergent control flow
Details: Jump Threading does not make sense for the targets with divergent CF
         since they do not use branch prediction for speculative execution.
         Also in the high level IR there is no enough information to conclude that the branch is divergent or uniform.
         This may cause errors in further CF lowering.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93302
2020-12-17 02:40:54 +03:00
Roman Lebedev d22a47e9ff
[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree
A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predecessors
of a block and carefully delay deletion of the block until afterwards
the DomTree updates are appled, or add a successor to the block.

There wasn't a great test coverage for this, so i added extra, to be sure.
2020-12-17 01:03:50 +03:00
Roman Lebedev 5cce4aff18
[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev 49dac4aca0
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev 4fc169f664
[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-17 01:03:49 +03:00
Rong Xu 0abd744597 [PGO] Use the sum of profile counts to fix the function entry count
Raw profile count values for each BB are not kept after profile
annotation. We record function entry count and branch weights
and use them to compute the count when needed.  This mechanism
works well in a perfect world, but often breaks in real programs,
because of number prevision, inconsistent profile, or bugs in
BFI). This patch uses sum of profile count values to fix
function entry count to make the BFI count close to real profile
counts.

Differential Revision: https://reviews.llvm.org/D61540
2020-12-16 13:37:43 -08:00
Nikita Popov e728024808 [DSE] Pass MemoryLocation by const ref (NFC) 2020-12-16 21:47:46 +01:00
Sanjay Patel 38ebc1a13d [VectorCombine] optimize alignment for load transform
Here's another minimal step suggested by D93229 / D93397 .
(I'm trying to be extra careful in these changes because
load transforms are easy to get wrong.)

We can optimistically choose the greater alignment of a
load and its pointer operand. As the test diffs show, this
can improve what would have been unaligned vector loads
into aligned loads.

When we enhance with gep offsets, we will need to adjust
the alignment calculation to include that offset.

Differential Revision: https://reviews.llvm.org/D93406
2020-12-16 15:25:45 -05:00
Sanjay Patel aaaf0ec72b [VectorCombine] loosen alignment constraint for load transform
As discussed in D93229, we only need a minimal alignment constraint
when querying whether a hypothetical vector load is safe. We still
pass/use the potentially stronger alignment attribute when checking
costs and creating the new load.

There's already a test that changes with the minimum code change,
so splitting this off as a preliminary commit independent of any
gep/offset enhancements.

Differential Revision: https://reviews.llvm.org/D93397
2020-12-16 12:25:18 -05:00
Whitney Tsang fa3693ad0b [LoopNest] Handle loop-nest passes in LoopPassManager
Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this
patch (and other following patches) is to create facilities that allow
implementing loop nest passes that run on top-level loop nests for the
New Pass Manager.

This patch extends the functionality of LoopPassManager to handle
loop-nest passes by specializing the definition of LoopPassManager that
accepts both kinds of passes in addPass.

Only loop passes are executed if L is not a top-level one, and both
kinds of passes are executed if L is top-level. Currently, loop nest
passes should have the following run method:

PreservedAnalyses run(LoopNest &, LoopAnalysisManager &,
LoopStandardAnalysisResults &, LPMUpdater &);

Reviewed By: Whitney, ychen
Differential Revision: https://reviews.llvm.org/D87045
2020-12-16 17:07:14 +00:00
Caroline Concatto be9184bc55 [SLPVectorizer]Migrate getEntryCost to return InstructionCost
This patch also changes:
  the return type of getGatherCost and
  the signature of the debug function dumpTreeCosts
to use InstructionCost.

This patch is part of a series of patches to use InstructionCost instead of
unsigned/int for the cost model functions.

See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

See this patch for the introduction of the type:
https://reviews.llvm.org/D91174

Depends on D93049

Differential Revision: https://reviews.llvm.org/D93127
2020-12-16 14:18:40 +00:00
Caroline Concatto 07217e0a1b [CostModel]Migrate getTreeCost() to use InstructionCost
This patch changes the type of cost variables (for instance: Cost, ExtractCost,
SpillCost) to use InstructionCost.
This patch also changes the type of cost variables to InstructionCost in other
functions that use the result of getTreeCost()
This patch is part of a series of patches to use InstructionCost instead of
unsigned/int for the cost model functions.

See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D91174

Differential Revision: https://reviews.llvm.org/D93049
2020-12-16 13:08:37 +00:00
Bangtian Liu c10757200d Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit cf638d793c.
2020-12-16 11:52:30 +00:00
Philip Reames 1f6e15566f [LV] Weaken a unnecessarily strong assert [NFC]
Account for the fact that (in the future) the latch might be a switch not a branch.  The existing code is correct, minus the assert.
2020-12-15 19:07:53 -08:00
Philip Reames af7ef895d4 [LV] Extend dead instruction detection to multiple exiting blocks
Given we haven't yet enabled multiple exiting blocks, this is currently non functional, but it's an obvious extension which cleans up a later patch.

I don't think this is worth review (as it's pretty obvious), if anyone disagrees, feel feel to revert or comment and I will.
2020-12-15 18:46:32 -08:00
Bangtian Liu cf638d793c Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-15 23:32:29 +00:00
Johannes Doerfert dcaec81211 [OpenMP] Use assumptions during ICV tracking
The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us
to ignore calls that would otherwise prevent ICV tracking.

Once we track more ICVs we might need to distinguish the ones that could
be impacted even with `no_openmp_routines`.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D92050
2020-12-15 16:51:34 -06:00
Johannes Doerfert d08d490a4c [OpenMPOpt][NFC] Clang format 2020-12-15 16:51:34 -06:00
Roman Lebedev e113317958
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
  `FoldBranchToCommonDest()` transform in certain cases,
   where the successor is dominated by predecessor,
   because we then don't have PHI's, and can't recreate them,
   well, without handrolling 'is dominated by' check,
   which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
   decrease the number of `Dominator Tree Construction` by 5
   (from 28 now, i.e. -18%) in `-O3` old-pm pipeline
   (as per `llvm/test/Other/opt-O3-pipeline.ll`)
   This might or might not be beneficial for compile time.

So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.

Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.

But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.

Then, eventually, the default should be flipped,
and eventually^2 the flag removed.

And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-16 00:38:00 +03:00
Philip Reames a81db8b315 [LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC]
This should be purely non-functional.  When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing.  Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing.
2020-12-15 12:38:13 -08:00
Simon Pilgrim a3bd67f222 SeparateConstOffsetFromGEP::lowerToSingleIndexGEPs - don't use dyn_cast_or_null. NFCI.
ResultPtr is guaranteed to be non-null - and using dyn_cast_or_null causes unnecessary static analyzer warnings.

We can't say the same for FirstResult AFAICT, so keep dyn_cast_or_null for that.
2020-12-15 17:27:25 +00:00
Florian Hahn 7ea3932ab1
[AnnotationRemarks] Also generate annotation remarks when using -O0.
The AnnotationRemarks pass is already run at the end of the module
pipeline. This patch also adds it before bailing out for -O0, so remarks
are also generated with -O0.
2020-12-15 14:46:52 +00:00
Florian Hahn 7186a3965a
[VPlan] Use VPDef for VPWidenSelectRecipe.
This patch turns updates VPWidenSelectRecipe to manage the value
it defines using VPDef.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90560
2020-12-15 14:15:01 +00:00
Jun Ma 52a3267ffa [InstCombine] Remove scalable vector restriction in foldVectorBinop
Differential Revision: https://reviews.llvm.org/D93289
2020-12-15 21:14:59 +08:00
Jun Ma ffe84d90e9 [InstCombine][NFC] Change cast of FixedVectorType to dyn_cast. 2020-12-15 20:36:57 +08:00
Jun Ma e12f584578 [InstCombine] Remove scalable vector restriction in InstCombineCompares
Differential Revision: https://reviews.llvm.org/D93269
2020-12-15 20:36:57 +08:00
Jun Ma 2ac58e21a1 [InstCombine] Remove scalable vector restriction when fold SelectInst
Differential Revision: https://reviews.llvm.org/D93083
2020-12-15 20:36:57 +08:00
Florian Hahn 318f5798d8
[VPlan] Use VPDef for VPWidenGEPRecipe.
This patch turns updates VPWidenGEPRecipe to manage the value it defines
using VPDef. The VPValue is used  during VPlan construction and
codegeneration instead of the plain IR reference where possible.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90561
2020-12-15 09:30:14 +00:00
Florian Hahn ad1161f9b5
[VPlan] Use VPdef for VPWidenCall.
This patch turns updates VPWidenREcipe to manage the value it defines
using VPDef.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90559
2020-12-15 09:20:07 +00:00
Nico Weber a852ee199c Reland "[MachineDebugify] Insert synthetic DBG_VALUE instructions"
This reverts commit 841f9c937f.
The change landed many months ago; something else broke those tests.
2020-12-14 22:34:23 -05:00
Nico Weber 841f9c937f Revert "[MachineDebugify] Insert synthetic DBG_VALUE instructions"
This reverts commit 2a5675f11d.
The tests it adds fail: https://reviews.llvm.org/D78135#2453736
2020-12-14 22:14:48 -05:00
Reid Kleckner d2ed9d6b7e Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC"
We determined that the MSVC implementation of std::aligned* isn't suited
to our needs. It doesn't support 16 byte alignment or higher, and it
doesn't really guarantee 8 byte alignment. See
https://github.com/microsoft/STL/issues/1533

Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC"

Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back
AlignedCharArrayUnion.

This reverts commit 4d8bf870a8.

This reverts commit d10f9863a5.

This reverts commit 4b5dc150b9.
2020-12-14 17:04:06 -08:00
Rong Xu 54e03d03a7 [PGO] Verify BFI counts after loading profile data
This patch adds the functionality to compare BFI counts with real
profile
counts right after reading the profile. It will print remarks under
-Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo.

Differential Revision: https://reviews.llvm.org/D91813
2020-12-14 15:56:10 -08:00
Gulfem Savrun Yeniceri 7c0e3a77bc [clang][IR] Add support for leaf attribute
This patch adds support for leaf attribute as an optimization hint
in Clang/LLVM.

Differential Revision: https://reviews.llvm.org/D90275
2020-12-14 14:48:17 -08:00
Sanjay Patel d399f870b5 [VectorCombine] make load transform poison-safe
As noted in D93229, the transform from scalar load to vector load
potentially leaks poison from the extra vector elements that are
being loaded.

We could use freeze here (and x86 codegen at least appears to be
the same either way), but we already have a shuffle in this logic
to optionally change the vector size, so let's allow that
instruction to serve both purposes.

Differential Revision: https://reviews.llvm.org/D93238
2020-12-14 17:42:01 -05:00
Craig Topper 25067f179f [LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing.
This adds support for loops like

unsigned clz(unsigned x) {
    unsigned w = sizeof (x) * CHAR_BIT;
    while (x) {
        w--;
        x >>= 1;
    }

    return w;
}

and

unsigned clz(unsigned x) {
    unsigned w = sizeof (x) * CHAR_BIT - 1;
    while (x >>= 1) {
        w--;
    }

    return w;
}

To support these we look for add x, -1 as well as add x, 1 that
we already matched. If the value was -1 we need to subtract from
the initial counter value instead of adding to it.

Fixes PR48404.

Differential Revision: https://reviews.llvm.org/D92745
2020-12-14 14:25:05 -08:00
Philip Reames f5fe8493e5 [LAA] Relax restrictions on early exits in loop structure
his is a preparation patch for supporting multiple exits in the loop vectorizer, by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist).  Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression.

Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so.

Differential Revision: https://reviews.llvm.org/D92066
2020-12-14 12:44:01 -08:00
Roman Lebedev 59560e8589
[SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450)
Even though d38205144f was mostly a correct
fix for the external non-PHI users, it's not a *generally* correct fix,
because the 'placeholder' values in those trivial PHI's we create
shouldn't be *always* 'undef', but the PHI itself for the backedges,
else we end up with wrong value, as the `@pr48450_2` test shows.

But we can't just do that, because we can't check that the PHI
can be it's own incoming value when coming from certain predecessor,
because we don't have a dominator tree.

So until we can address this correctness problem properly,
ensure that we don't perform the transformation
if there are such problematic external uses.

Making dominator tree available there is going to be involved,
since `-simplifycfg` pass currently does not preserve/update domtree...
2020-12-14 20:14:31 +03:00
Roman Lebedev e8360a8e1e
[NFC][SimplifyCFG] FoldBranchToCommonDest(): pull out 'common successor' into a variable
Makes it easier to use it elsewhere
2020-12-14 20:14:31 +03:00
Stanislav Mekhanoshin 87d7757bbe [SLP] Control maximum vectorization factor from TTI
D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moved to tryToVectorizeList() it allows to restart vectorization
if initial chunk fails.

However, this function is more general and handles not only PHI
but everything which SLP handles. If vectorization factor would
be limited to maximum vector register size it would limit much
more vectorization than before leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added with the
default 0 to preserve current behavior and limit nothing. Then
targets can decide what is better for them.

The callback gets ElementSize just like a similar getMinimumVF()
function and the main opcode of the chain. The latter is to avoid
regressions at least on the AMDGPU. We can have loads and stores
up to 128 bit wide, and <2 x 16> bit vector math on some
subtargets, where the rest shall not be vectorized. I.e. we need
to differentiate based on the element size and operation itself.

Differential Revision: https://reviews.llvm.org/D92059
2020-12-14 08:49:40 -08:00
Markus Lavin 2a6782bb9f Reland [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo).  Before rewrite (but after introduction of new
induction variables) use SCEV to compute an equivalent set of values for
each @llvm.dbg.value in the loop body (among the loop header PHI-nodes).
After rewrite (and dead PHI elimination) update those @llvm.dbg.value
now referencing undef by picking a remaining value from its equivalence
set.  Allow match with offset by inserting compensation code in the
DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-12-14 16:15:18 +01:00
Florian Hahn e42e5263bd
[VPlan] Make VPWidenMemoryInstructionRecipe a VPDef.
This patch updates VPWidenMemoryInstructionRecipe to use VPDef
to manage the value it produces instead of inheriting from VPValue.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90563
2020-12-14 14:13:59 +00:00
Anton Afanasyev fac7c7ec3c [SLP] Fix vector element size for the store chains
Vector element size could be different for different store chains.
This patch prevents wrong computation of maximum number of elements
for that case.

Differential Revision: https://reviews.llvm.org/D93192
2020-12-14 15:51:43 +03:00
Kazu Hirata 5891ad4e22 [Transforms] Use llvm::erase_value (NFC) 2020-12-13 09:48:47 -08:00
Florian Hahn 533f85767c
[VPlan] Use interleaveComma in printOperands() (NFC). 2020-12-13 16:29:16 +00:00
Roman Lebedev d38205144f
[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450)
In particular, if the successor block, which is about to get a new
predecessor block, currently only has a single predecessor,
then the bonus instructions will be directly used within said successor,
which is fine, since the block with bonus instructions dominates that
successor. But once there's a new predecessor, the IR is no longer valid,
and we don't fix it, because we only update PHI nodes.

Which means, the live-out bonus instructions must be exclusively used
by the PHI nodes in successor blocks. So we have to form trivial PHI nodes.
which will then be successfully updated to recieve cloned bonus instns.

This all works fine, except for the fact that we don't have access to
the dominator tree, and we don't ignore unreachable code,
so we sometimes do end up having to deal with some weird IR.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48450
2020-12-13 00:06:57 +03:00
Nikita Popov afbb6d97b5 [CVP] Simplify and generalize switch handling
CVP currently handles switches by checking an equality predicate
on all edges from predecessor blocks. Of course, this can only
work if the value being switched over is defined in a different block.

Replace this implementation with a call to getPredicateAt(), which
also does the predecessor edge predicate check (if not defined in
the same block), but can also do quite a bit more: It can reason
about phi-nodes by checking edge predicates for incoming values,
it can reason about assumes, and it can reason about block values.

As such, this makes the implementation both simpler and more
powerful. The compile-time impact on CTMark is in the noise.
2020-12-12 21:12:27 +01:00
Kazu Hirata 215c1b1935 [Transforms] Use is_contained (NFC) 2020-12-12 09:37:49 -08:00
David Green ab97c9bdb7 [LV] Fix scalar cost for tail predicated loops
When it comes to the scalar cost of any predicated block, the loop
vectorizer by default regards this predication as a sign that it is
looking at an if-conversion and divides the scalar cost of the block by
2, assuming it would only be executed half the time. This however makes
no sense if the predication has been introduced to tail predicate the
loop.

Original patch by Anna Welker

Differential Revision: https://reviews.llvm.org/D86452
2020-12-12 14:21:40 +00:00
Fangrui Song b5ad32ef5c Migrate deprecated DebugLoc::get to DILocation::get
This migrates all LLVM (except Kaleidoscope and
CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get.

The CodeGen/StackProtector.cpp usage may have a nullptr Scope
and can trigger an assertion failure, so I don't migrate it.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D93087
2020-12-11 12:45:22 -08:00
Marco Elver c28b18af19 [KernelAddressSanitizer] Fix globals exclusion for indirect aliases
GlobalAlias::getAliasee() may not always point directly to a
GlobalVariable. In such cases, try to find the canonical GlobalVariable
that the alias refers to.

Link: https://github.com/ClangBuiltLinux/linux/issues/1208

Reviewed By: dvyukov, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D92846
2020-12-11 12:20:40 +01:00
David Sherwood 9b76160e53 [Support] Introduce a new InstructionCost class
This is the first in a series of patches that attempts to migrate
existing cost instructions to return a new InstructionCost class
in place of a simple integer. This new class is intended to be
as light-weight and simple as possible, with a full range of
arithmetic and comparison operators that largely mirror the same
sets of operations on basic types, such as integers. The main
advantage to using an InstructionCost is that it can encode a
particular cost state in addition to a value. The initial
implementation only has two states - Normal and Invalid - but these
could be expanded over time if necessary. An invalid state can
be used to represent an unknown cost or an instruction that is
prohibitively expensive.

This patch adds the new class and changes the getInstructionCost
interface to return the new class. Other cost functions, such as
getUserCost, etc., will be migrated in future patches as I believe
this to be less disruptive. One benefit of this new class is that
it provides a way to unify many of the magic costs in the codebase
where the cost is set to a deliberately high number to prevent
optimisations taking place, e.g. vectorization. It also provides
a route to represent the extremely high, and unknown, cost of
scalarization of scalable vectors, which is not currently supported.

Differential Revision: https://reviews.llvm.org/D91174
2020-12-11 08:12:54 +00:00
Hongtao Yu 705a4c149d [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 17:29:28 -08:00
Mitch Phillips 7ead5f5aa3 Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06.

Reason: Broke the ASan buildbots:
  http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10 15:53:39 -08:00
Zequan Wu b5216b2950 [PGO] Enable preinline and cleanup when optimize for size
Differential Revision: https://reviews.llvm.org/D91673
2020-12-10 12:29:17 -08:00
Sanjay Patel 4f051fe374 [InstCombine] avoid crash sinking to unreachable block
The test is reduced from the example in D82005.

Similar to 94f6d365e, the test here would assert in
the DomTree when we tried to convert a select to a
phi with an unreachable block operand.

We may want to add some kind of guard code in DomTree
itself to avoid this sort of problem.
2020-12-10 13:10:26 -05:00
Sanjay Patel 12b684ae02 [VectorCombine] improve readability; NFC
If we are going to allow adjusting the pointer for GEPs,
rearranging the code a bit will make it easier to follow.
2020-12-10 13:10:26 -05:00
Hongtao Yu b035513c06 [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 09:50:08 -08:00
Jun Ma 137674f882 [TruncInstCombine] Remove scalable vector restriction
Differential Revision: https://reviews.llvm.org/D92819
2020-12-10 18:00:19 +08:00
Jianzhou Zhao ea981165a4 [dfsan] Track field/index-level shadow values in variables
*************
* The problem
*************
See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current
DFSan always uses a 16bit shadow value for a variable with any type by
combining all shadow values of all bytes of the variable. So it cannot
distinguish two fields of a struct: each field's shadow value equals the
combined shadow value of all fields. This introduces an overtaint issue.

Consider a parsing function

   std::pair<char*, int> get_token(char* p);

where p points to a buffer to parse, the returned pair includes the next
token and the pointer to the position in the buffer after the token.

If the token is tainted, then both the returned pointer and int ar
tainted. If the parser keeps on using get_token for the rest parsing,
all the following outputs are tainted because of the tainted pointer.

The CL is the first change to address the issue.

**************************
* The proposed improvement
**************************
Eventually all fields and indices have their own shadow values in
variables and memory.

For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8},
[2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16],
{[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with
primary type still have shadow values i16.

***************************
* An potential implementation plan
***************************

The idea is to adopt the change incrementially.

1) This CL
Support field-level accuracy at variables/args/ret in TLS mode,
load/store/alloca still use combined shadow values.

After the alloca promotion and SSA construction phases (>=-O1), we
assume alloca and memory operations are reduced. So if struct
variables do not relate to memory, their tracking is accurate at
field level.

2) Support field-level accuracy at alloca
3) Support field-level accuracy at load/store

These two should make O0 and real memory access work.

4) Support vector if necessary.
5) Support Args mode if necessary.
6) Support passing more accurate shadow values via custom functions if
necessary.

***************
* About this CL.
***************
The CL did the following

1) extended TLS arg/ret to work with aggregate types. This is similar
to what MSan does.

2) implemented how to map between an original type/value/zero-const to
its shadow type/value/zero-const.

3) extended (insert|extract)value to use field/index-level progagation.

4) for other instructions, propagation rules are combining inputs by or.
The CL converts between aggragate and primary shadow values at the
cases.

5) Custom function interfaces also need such a conversion because
all existing custom functions use i16. It is unclear whether custome
functions need more accurate shadow propagation yet.

6) Added test cases for aggregate type related cases.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92261
2020-12-09 19:38:35 +00:00
Sanjay Patel b2ef264096 [VectorCombine] allow peeking through an extractelt when creating a vector load
This is an enhancement to load vectorization that is motivated by
a pattern in https://llvm.org/PR16739.
Unfortunately, it's still not enough to make a difference there.
We will have to handle multi-use cases in some better way to avoid
creating multiple overlapping loads.

Differential Revision: https://reviews.llvm.org/D92858
2020-12-09 10:36:14 -05:00
Roman Lebedev e6f2a79d7a
[InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390)
We could create uadd.sat under incorrect circumstances
if a select with -1 as the false value was canonicalized
by swapping the T/F values. Unlike the other transforms
in the same function, it is not invariant to equality.

Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL

Based on original patch by David Green!

Fixes https://bugs.llvm.org/show_bug.cgi?id=48390

Differential Revision: https://reviews.llvm.org/D92717
2020-12-09 18:19:09 +03:00
Anton Afanasyev e5bf2e8989 [SLP] Use the width of value truncated just before storing
For stores chain vectorization we choose the size of vector
elements to ensure we fit to minimum and maximum vector register
size for the number of elements given. This patch corrects vector
element size choosing the width of value truncated just before
storing instead of the width of value stored.

Fixes PR46983

Differential Revision: https://reviews.llvm.org/D92824
2020-12-09 16:38:45 +03:00
Sander de Smalen d568cff696 [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.
* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary)
  so that the cost-model can be implemented separately.

This can vectorize the following loop [1]:

   void loop(int N, double *a, double *b) {
     #pragma clang loop vectorize_width(4, scalable)
     for (int i = 0; i < N; i++) {
       a[i] = b[i] + 1.0;
     }
   }

[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91077
2020-12-09 11:25:21 +00:00
Sander de Smalen adc37145de [LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable.
This patch removes a number of asserts that VF is not scalable, even though
the code where this assert lives does nothing that prevents VF being scalable.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91060
2020-12-09 11:25:21 +00:00
Joe Ellis 80c33de2d3 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Philip Reames 5171b7b40e [indvars] Common a bit of code [NFC] 2020-12-08 15:25:48 -08:00
Anna Thomas 29356e3279 [ScalarizeMaskedMemIntrin] Add new PM support
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.

Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
2020-12-08 17:15:22 -05:00
Benjamin Kramer 5f18e2f31e Move createScalarizeMaskedMemIntrinPass to Scalar.h 2020-12-08 19:08:09 +01:00
Benjamin Kramer 10987e30be Remove unused include. NFC.
This is also a layering violation.
2020-12-08 19:03:56 +01:00
Anna Thomas 09f2f9605f [ScalarizeMaskedMemIntrinsic] Move from CodeGen into Transforms
ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass
is actually operating on IR level and does not use any code gen specific
passes.  It is useful to move it into transforms directory so that it
can be more widely used as a mid-level transform as well (apart from
usage in codegen pipeline).
In particular, we have a usecase downstream where we would like to use
this pass in our mid-level pipeline which operates on IR level.

The next change will be to add support for new PM.

Reviewers: craig.topper, apilipenko, skatkov
Reviewed-By: skatkov
Differential Revision: https://reviews.llvm.org/D92407
2020-12-08 12:25:58 -05:00
Xun Li 31e60b9133 [coroutine] should disable inline before calling coro split
This is a rework of D85812, which didn't land.
When callee coroutine function is inlined into caller coroutine function before coro-split pass, llvm will emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that coro-early pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute "coroutine.presplit" (it means the function has not been splited) to fix this issue
test plan: check-llvm, check-clang

In D85812, there was suggestions on moving the macros to Attributes.td to avoid circular header dependency issue.
I believe it's not worth doing just to be able to use one constant string in one place.
Today, there are already 3 possible attribute values for "coroutine.presplit": c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)
If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr, just to support this, which I think is an overkill.

Instead, I think the best way to do this is to add an API in Function class that checks whether this function is a coroutine, by checking the attribute by name directly.

Differential Revision: https://reviews.llvm.org/D92706
2020-12-08 08:53:08 -08:00
Teresa Johnson 77b509710c [ICP] Don't promote when target not defined in module
This guards against cases where the symbol was dead code eliminated in
the binary by ThinLTO, and we have a sample profile collected for one
binary but used to optimize another.

Most of the benefit from ICP comes from inlining the target, which we
can't do with only a declaration anyway. If this is in the pre-ThinLTO
link step (e.g. for instrumentation based PGO), we will attempt the
promotion again in the ThinLTO backend after importing anyway, and we
don't need the early promotion to facilitate that.

Differential Revision: https://reviews.llvm.org/D92804
2020-12-08 07:45:36 -08:00
Sjoerd Meijer 1e260f955d [LICM][docs] Document that LICM is also a canonicalization transform. NFC.
This documents that LICM is a canonicalization transform, which we discussed
recently in:

http://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html

but which was also discused earlier, e.g. in:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html
2020-12-08 11:56:35 +00:00
Evgeniy Brevnov 2d1b024d06 [DSE][NFC] Need to be carefull mixing signed and unsigned types
Currently in some places we use signed type to represent size of an access and put explicit casts from unsigned to signed.
For example: int64_t EarlierSize = int64_t(Loc.Size.getValue());

Even though it doesn't loos bits (immidiatly) it may overflow and we end up with negative size. Potentially that cause later code to work incorrectly. A simple expample is a check that size is not negative.

I think it would be safer and clearer if we use unsigned type for the size and handle it appropriately.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D92648
2020-12-08 16:53:37 +07:00
Valentin Churavy 700cf7dcc9 [VNCoercion] Disallow coercion between different ni addrspaces
I'm not sure if it would be legal by the IR reference to introduce
an addrspacecast here, since the IR reference is a bit vague on
the exact semantics, but at least for our usage of it (and I
suspect for many other's usage) it is not. For us, addrspacecasts
between non-integral address spaces carry frontend information that the
optimizer cannot deduce afterwards in a generic way (though we
have frontend specific passes in our pipline that do propagate
these). In any case, I'm sure nobody is using it this way at
the moment, since it would have introduced inttoptrs, which
are definitely illegal.

Fixes PR38375

Co-authored-by: Keno Fischer <keno@alumni.harvard.edu>

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D50010
2020-12-07 20:19:48 -05:00
Sanjay Patel 5fe1a49f96 [SLP] fix typo in debug string; NFC 2020-12-07 15:09:21 -05:00
Bardia Mahjour 4db9b78c81 [LV] Epilogue Vectorization with Optimal Control Flow - Default Enablement
This patch enables epilogue vectorization by default per reviewer requests.

Differential Revision: https://reviews.llvm.org/D89566
2020-12-07 14:29:36 -05:00
Florian Hahn 32825e8636
[ConstraintElimination] Tweak placement in pipeline.
This patch adds the ConstraintElimination pass to the LTO pipeline and
also runs it after SCCP in the function simplification pipeline.

This increases the number of cases we can elimination. Pending further
tuning.
2020-12-07 19:08:40 +00:00
Simon Pilgrim 50dd1dba6e [IPO] Fix operator precedence warning. NFCI.
Check the entire assertion condition before && with the message.
2020-12-07 18:23:54 +00:00
Alexey Bataev 438682de6a [SLP]Merge reorder and reuse shuffles.
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the ivectorization tree/number of final instructions.

Differential Revision: https://reviews.llvm.org/D92668
2020-12-07 07:50:00 -08:00
Jun Ma 216689ace7 [Coroutines] Add DW_OP_deref for transformed dbg.value intrinsic.
Differential Revision: https://reviews.llvm.org/D92462
2020-12-07 10:24:44 +08:00
Craig Topper 305fcc9122 [LoopIdiomRecognize] Merge a conditional operator with an earlier if and remove an extra temporary variable. NFC
The CountPrev variable was only used to forward a value from
the if statement to the conditional operator under the same
condition.

While there move some variable declarations to their first
assignment.
2020-12-06 15:23:18 -08:00
Fangrui Song 2832f3528c [Transforms] Delete unused declarations from NewGVN/CoroSplit/ValueMapper 2020-12-06 13:04:01 -08:00
Wenlei He 6b989a1710 [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining
This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profiled guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Pacthes for integration with Pseudo-probe coming up soon.

Currently the new infrastructure kick in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO.

**Interface**

There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile.

- Query base profile (`getBaseSamplesFor`)
Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.

- Query context profile (`getContextSamplesFor`)
Context profile is a function's CFG profile for a given calling context. We can query context profile by context string.

- Track inlined context profile (`markContextSamplesInlined`)
When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function.

- Track not-inlined context profile (`promoteMergeContextSamplesTree`)
When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserve the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination.

**Implementation**

Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state.

On top of the above context representation, a custom trie tree is implemented to track and manager context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTireNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merge nodes for subtree if this move encounters existing nodes.

**Integration**

`SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detected input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile query will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`.

Differential Revision: https://reviews.llvm.org/D90125
2020-12-06 11:49:18 -08:00
Kazu Hirata ddb002d7c7 [InstCombine] Remove replacePointer (NFC)
The declaration was introduced on Feb 10, 2017 in commit
ba01ed00fe without a corresponding
definition.
2020-12-06 10:24:08 -08:00
Sanjay Patel 94f6d365e4 [InstCombine] avoid crash on phi with unreachable incoming block (PR48369) 2020-12-06 09:31:47 -05:00
Fangrui Song 204d0d51b3 [MemProf] Make __memprof_shadow_memory_dynamic_address dso_local in static relocation model
The x86-64 backend currently has a bug which uses a wrong register when for the GOTPCREL reference.
The program will crash without the dso_local specifier.
2020-12-05 21:36:31 -08:00
Florian Hahn 4ceecc820b [ConstraintElimination] Handle constraints with all zero var coeffs.
Constraints where all variable coefficients are 0 do not add any useful
information. When checking, we can check if they are always true/false.
2020-12-05 12:06:53 +00:00
Kazu Hirata 8006043b13 [IRCE] Remove unused IsSigned and its accessor (NFC)
IsSigned and its accessor, isSigned, were introduced on Oct 25, 2017
in commit 9ac7021a25.  The last use was
removed on Nov 20, 2017 in commit
268467869b.
2020-12-04 21:26:12 -08:00
Jianzhou Zhao a28db8b27a [dfsan] Add empty APIs for field-level shadow
This is a child diff of D92261.

This diff adds APIs that return shadow type/value/zero from origin
objects. For the time being these APIs simply returns primitive
shadow type/value/zero. The following diff will be implementing the
conversion.

As D92261 explains, some cases still use primitive shadow during
the incremential changes. The cases include
1) alloca/load/store
2) custom function IO
3) vectors
At the cases this diff does not use the new APIs, but uses primitive
shadow objects explicitly.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92629
2020-12-04 21:42:07 +00:00
Duncan P. N. Exon Smith d10f9863a5 ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC
Prepare to delete `AlignedCharArrayUnion` by migrating its users over to
`std::aligned_union_t`.

I will delete `AlignedCharArrayUnion` and its tests in a follow-up
commit so that it's easier to revert in isolation in case some
downstream wants to keep using it.

Differential Revision: https://reviews.llvm.org/D92516
2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith 5b267fb796 ADT: Stop peeking inside AlignedCharArrayUnion, NFC
Update all the users of `AlignedCharArrayUnion` to stop peeking inside
(to look at `buffer`) so that a follow-up patch can replace it with an
alias to `std::aligned_union_t`.

This was reviewed as part of https://reviews.llvm.org/D92512, but I'm
splitting this bit out to commit first to reduce churn in case the
change to `AlignedCharArrayUnion` needs to be reverted for some
unexpected reason.
2020-12-04 11:07:42 -08:00
Hiroshi Yamauchi f9c3954a6e Fix for Bug 48055.
Differential Revision: https://reviews.llvm.org/D92599
2020-12-04 11:05:01 -08:00
Arthur Eubanks 7f6f9f4cf9 [NewPM] Make pass adaptors less templatey
Currently PassBuilder.cpp is by far the file that takes longest to
compile. This is due to tons of templates being instantiated per pass.

Follow PassManager by using wrappers around passes to avoid making
the adaptors templated on the pass type. This allows us to move various
adaptors' run methods into .cpp files.

This reduces the compile time of PassBuilder.cpp on my machine from 66
to 39 seconds. It also reduces the size of opt from 685M to 676M.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D92616
2020-12-04 08:30:50 -08:00
Evgeniy Brevnov 061cebb46f [NFC][NARY-REASSOCIATE] Restructure code to aviod isPotentiallyReassociatable
Currently we have to duplicate the same checks in isPotentiallyReassociatable and tryReassociate. With simple pattern like add/mul this may be not a big deal. But the situation gets much worse when I try to add support for min/max. Min/Max may be represented by several instructions and can take different forms. In order reduce complexity for upcoming min/max support we need to restructure the code a bit to avoid mentioned code duplication.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88286
2020-12-04 16:19:43 +07:00
Evgeniy Brevnov f61c29b3a7 [NARY-REASSOCIATE] Simplify traversal logic by post deleting dead instructions
Currently we delete optimized instructions as we go. That has several negative consequences. First it complicates traversal logic itself. Second if newly generated instruction has been deleted the traversal is repeated from scratch.

But real motivation for the change is upcoming change with support for min/max reassociation. Here we employ SCEV expander to generate code. As a result newly generated instructions may be inserted not right before original instruction (because SCEV may do hoisting) and there is no way to know 'next' instruction.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88285
2020-12-04 16:17:50 +07:00
Kazu Hirata e2fc11cf9f [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.

Differential Revision: https://reviews.llvm.org/D92608
2020-12-03 23:50:17 -08:00
Max Kazantsev 12b6c5e682 Return "[IndVars] ICmpInst should not prevent IV widening"
This reverts commit 4bd35cdc3a.

The patch was reverted during the investigation. The investigation
shown that the patch did not cause any trouble, but just exposed
the existing problem that is addressed by the previous patch
"[IndVars] Quick fix LHS/RHS bug". Returning without changes.
2020-12-04 12:34:43 +07:00
Max Kazantsev 3df0daceb2 [IndVars] Quick fix LHS/RHS bug
The code relies on fact that LHS is the NarrowDef but never
really checks it. Adding the conservative restrictive check,
will follow-up with handling of case where RHS is a NarrowDef.
2020-12-04 12:34:42 +07:00
Jianzhou Zhao 80e326a8c4 [dfsan] Support passing non-i16 shadow values in TLS mode
This is a child diff of D92261.

It extended TLS arg/ret to work with aggregate types.

For a function
  t foo(t1 a1, t2 a2, ... tn an)
Its arguments shadow are saved in TLS args like
  a1_s, a2_s, ..., an_s
TLS ret simply includes r_s. By calculating the type size of each shadow
value, we can get their offset.

This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls
from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp.

Note that this change does not add test cases for overflowed TLS
arg/ret because this is hard to test w/o supporting aggregate shdow
types. We will be adding them after supporting that.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92440
2020-12-04 02:45:07 +00:00
Philip Reames 0c866a3d6a [LoopVec] Support non-instructions as argument to uniform mem ops
The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists.

This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of *it's* operand. Only instructions within the loop have their uses scanned.)

In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction.  This allows detection of uniform mem ops involving global addresses.

Differential Revision: https://reviews.llvm.org/D92056
2020-12-03 14:51:44 -08:00
dfukalov 2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Max Kazantsev 4bd35cdc3a Revert "[IndVars] ICmpInst should not prevent IV widening"
This reverts commit 0c9c6ddf17.

We are seeing some failures with this patch locally. Not clear
if it's causing them or just triggering a problem in another
place. Reverting while investigating.
2020-12-03 18:01:41 +07:00
modimo c1ba991e8d [NFC] Fix typo 2020-12-02 22:23:57 -08:00
Jianzhou Zhao bd726d2796 [dfsan] Rename ShadowTy/ZeroShadow with prefix Primitive
This is a child diff of D92261.

After supporting field/index-level shadow, the existing shadow with type
i16 works for only primitive types.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92459
2020-12-03 05:31:01 +00:00
Florian Hahn 2304528bb5 [ConstraintElimination] Make sure arguments of std:pow match.
This should fix a build failure on some systems, e.g. solaris11-sparcv9
http://lab.llvm.org:8014/#/builders/22
2020-12-02 22:23:26 +00:00
Hongtao Yu 24d4291ca7 [CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against  the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
2020-12-02 13:45:20 -08:00
Jianzhou Zhao dad5d95883 [dfsan] Rename CachedCombinedShadow to be CachedShadow
At D92261, this type will be used to cache both combined shadow and
converted shadow values.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92458
2020-12-02 21:39:16 +00:00
jasonliu a65d8c5d72 [XCOFF][AIX] Generate LSDA data and compact unwind section on AIX
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality rountine, because the
   interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455
2020-12-02 18:42:44 +00:00
Bardia Mahjour a7e2c26939 [LV] Epilogue Vectorization with Optimal Control Flow (Recommit)
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.

Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D89566
2020-12-02 10:09:56 -05:00
Sanjay Patel 56fd29e93b [SLP] use 'match' for binop/select; NFC
This might be a small improvement in readability, but the
real motivation is to make it easier to adapt the code to
deal with intrinsics like 'maxnum' and/or integer min/max.

There is potentially help in doing that with D92086, but
we might also just add specialized wrappers here to deal
with the expected patterns.
2020-12-02 09:04:08 -05:00
Alex Zinenko 240dd92432 [OpenMPIRBuilder] forward arguments as pointers to outlined function
OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.

Reviewed By: jdoerfert, llitchev

Differential Revision: https://reviews.llvm.org/D92189
2020-12-02 14:59:41 +01:00
David Sherwood 71bd59f0cb [SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:

  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
  ...
  !2 = !{!2, !3, !4}
  !3 = !{!"llvm.loop.vectorize.width", i32 8}
  !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.

Differential Revision: https://reviews.llvm.org/D88962
2020-12-02 13:23:43 +00:00
Chen Zheng 3cb7d62452 [LSR][NFC] don't collect chains when isNumRegsMajorCostOfLSR is false.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D92159
2020-12-01 22:29:33 -05:00
Jianzhou Zhao 405ea2b93d [msan] Replace 8 by kShadowTLSAlignment
Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D92275
2020-12-02 01:09:49 +00:00
Fangrui Song a5309438fe static const char *const foo => const char foo[]
By default, a non-template variable of non-volatile const-qualified type
having namespace-scope has internal linkage, so no need for `static`.
2020-12-01 10:33:18 -08:00
Bardia Mahjour c94af03f7f Revert "[LV] Epilogue Vectorization with Optimal Control Flow"
This reverts commit 9c5504adce.
Reverting to investigate build failure in http://lab.llvm.org:8011/#/builders/98/builds/1461/steps/9
2020-12-01 12:50:36 -05:00
Bardia Mahjour 9c5504adce [LV] Epilogue Vectorization with Optimal Control Flow
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.

Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D89566
2020-12-01 12:04:29 -05:00
Nikita Popov 624af932a8 [MemCpyOpt] Port to MemorySSA
This is a straightforward port of MemCpyOpt to MemorySSA following
the approach of D26739. MemDep queries are replaced with MSSA queries
without changing the overall structure of the pass. Some care has
to be taken to account for differences between these APIs
(MemDep also returns reads, MSSA doesn't).

Differential Revision: https://reviews.llvm.org/D89207
2020-12-01 17:57:41 +01:00
Clement Courbet 735e6c888e [MergeICmps] Fix missing split.
We were not correctly splitting a blocks for chains of length 1.

Before that change, additional instructions for blocks in chains of
length 1 were not split off from the block before removing (this was
done correctly for chains of longer size).
If this first block contained an instruction referenced elsewhere,
deleting the block, would result in invalidation of the produced value.

This caused a miscompile which motivated D92297 (before D17993,
nonnull and dereferenceable attributed were not added so MergeICmps were
not triggered.) The new test gep-references-bb.ll demonstrate the issue.

The regression was introduced in
rG0efadbbcdeb82f5c14f38fbc2826107063ca48b2.

This supersedes D92364.

Test case by MaskRay (Fangrui Song).

Differential Revision: https://reviews.llvm.org/D92375
2020-12-01 16:50:55 +01:00
Sanjay Patel 9f60b8b3d2 [InstCombine] canonicalize sign-bit-shift of difference to ext(icmp)
icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.

It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
	sub	w8, w0, w1
	lsr	w0, w8, #31

vs:
	cmp	w0, w1
	cset	w0, lt

If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.

A possible motivating example for better optimization is shown in:
https://llvm.org/PR43198 but that will require other transforms
before anything changes there.

Alive proof:
https://rise4fun.com/Alive/o4E

  Name: sign-bit splat
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = ashr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = sext %c

  Name: sign-bit LSB
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = lshr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = zext %c
2020-12-01 09:58:11 -05:00
Florian Hahn 7a4f1d59b8 [ConstraintElimination] Decompose GEP %ptr, ZEXT(SHL()).
Add support to decompose a GEP with a ZEXT(SHL()) operand.
2020-12-01 14:23:21 +00:00
Bhramar Vatsa fd679107d6
[InstCombine] Optimize away the unnecessary multi-use sign-extend
C.f. https://bugs.llvm.org/show_bug.cgi?id=47765

Added a case for handling the sign-extend (Shl+AShr) for multiple uses,
to optimize it away for an individual use,
when the demanded bits aren't affected by sign-extend.

https://rise4fun.com/Alive/lgf

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D91343
2020-12-01 16:54:00 +03:00
Roman Lebedev 94ead0190f
[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 16:54:00 +03:00
Roman Lebedev 52533b52b8
Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold"
It seems i have missed checklines, temporairly reverting,
will reland momentairly..

This reverts commit aa1aa13509.
2020-12-01 15:47:04 +03:00
Roman Lebedev aa1aa13509
[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 15:13:08 +03:00
Roman Lebedev 8e29e20e0d
[InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343)
It is not correct to compute that new shift amount in it's narrow type
and only then extend it into the wide type:

----------------------------------------
Optimization: PR48343 good
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sext %Y
  %n1 = sub width(%o0), %n0
  %n2 = sub width(%X), %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 2016
Optimization is correct!

----------------------------------------
Optimization: PR48343 bad
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sub width(%o0), %Y
  %n1 = sub width(%X), %n0
  %n2 = sext %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 1
ERROR: Domain of definedness of Target is smaller than Source's for i9 %r

Example:
%X i9 = 0x000 (0)
%Y i4 = 0x3 (3)
%o0 i4 = 0x0 (0)
%o1 i4 = 0x0 (0)
%o2 i4 = 0x0 (0)
%n0 i4 = 0x1 (1)
%n1 i4 = 0x8 (8, -8)
%n2 i9 = 0x1F8 (504, -8)
%n3 i9 = 0x000 (0)
Source value: 0x000 (0)
Target value: undef


I.e. we should be computing it in the wide type from the beginning.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48343
2020-12-01 15:13:07 +03:00
Roman Lebedev 15f8060f6f
[SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction
There is no correctness need for that, and since we allow live-out
uses, this could theoretically happen, because currently nothing
will move the cond to right before the branch in those tests.
But regardless, lifting that restriction even makes the transform
easier to understand.

This makes the transform happen in 81 more cases (+0.55%)
)
2020-12-01 15:13:06 +03:00
Cullen Rhodes cba4accda0 [LV] Clamp VF hint when unsafe
In the following loop the dependence distance is 2 and can only be
vectorized if the vector length is no larger than this.

  void foo(int *a, int *b, int N) {
    #pragma clang loop vectorize(enable) vectorize_width(4)
    for (int i=0; i<N; ++i) {
      a[i + 2] = a[i] + b[i];
    }
  }

However, when specifying a VF of 4 via a loop hint this loop is
vectorized. According to [1][2], loop hints are ignored if the
optimization is not safe to apply.

This patch introduces a check to bail of vectorization if the user
specified VF is greater than the maximum feasible VF, unless explicitly
forced with '-force-vector-width=X'.

[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
[2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations

Reviewed By: sdesmalen, fhahn, Meinersbur

Differential Revision: https://reviews.llvm.org/D90687
2020-12-01 11:30:34 +00:00
Caroline Concatto 4b0ef2b075 [NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type
This patch replaces the attribute  `unsigned VF`  in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches to compute the cost
model for scalable vector inside this class.

Differential Revision: https://reviews.llvm.org/D91532
2020-12-01 11:12:51 +00:00
Florian Hahn efa9728a50 [ConstraintElimination] Decompose GEP %ptr, SHL().
Add support the decompose a GEP with an SHL operand.
2020-12-01 10:58:36 +00:00
Sjoerd Meijer f44ba25135 ExtractValue instruction costs
Instruction ExtractValue wasn't handled in
LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled
as a mul which is not really accurate. Since it is free (most of the times),
this now gets a cost of 0 using getInstructionCost.

This is a follow-up of D92208, that required changing this regression test.
In a follow up I will look at InsertValue which also isn't handled yet.

Differential Revision: https://reviews.llvm.org/D92317
2020-12-01 10:42:23 +00:00
Greg Parker bcc802fa36 [DSE] Remove a redundant call to getLocForWriteEx()
Differential Revision: https://reviews.llvm.org/D92263
2020-11-30 21:12:24 -08:00
Mircea Trofin 5fe10263ab [llvm][inliner] Reuse the inliner pass to implement 'always inliner'
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567
2020-11-30 12:03:39 -08:00
Hongtao Yu 64fa8cce22 [CSSPGO] Pseudo probe instrumentation pass
This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86499
2020-11-30 10:16:54 -08:00
Florian Hahn fe83adb05a
[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC).
VPPredInstPHIRecipe is one of the recipes that was missed during the
initial conversion. This patch adjusts the recipe to also manage its
operand using VPUser.
2020-11-30 13:09:58 +00:00
Roman Lebedev b0e9b7c59f
[NFC][SimplifyCFG] Add STATISTIC() to the FoldValueComparisonIntoPredecessors() fold 2020-11-30 12:27:16 +03:00
Max Kazantsev 0c9c6ddf17 [IndVars] ICmpInst should not prevent IV widening
If we decided to widen IV with zext, then unsigned comparisons
should not prevent widening (same for sext/sign comparisons).
The result of comparison in wider type does not change in this case.

Differential Revision: https://reviews.llvm.org/D92207
Reviewed By: nikic
2020-11-30 10:51:31 +07:00
Fangrui Song 5408fdcd78 [VPlan] Fix -Wunused-variable after a813090072 2020-11-29 10:38:01 -08:00
Florian Hahn 4bc9b909d7
[VPlan] Use VPValue and VPUser ops to print VPReplicateRecipe. 2020-11-29 18:28:27 +00:00
Florian Hahn a813090072
[VPlan] Manage stored values of interleave groups using VPUser (NFC)
Interleave groups also depend on the values they store. Manage the
stored values as VPUser operands. This is currently a NFC, but is
required to allow VPlan transforms and to manage generated vector values
exclusively in VPTransformState.
2020-11-29 17:24:36 +00:00
Andrew Litteken a8a43b6338 Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner."
Reverting commit due to address sanitizer errors.

> Extracting the similar regions is the first step in the IROutliner.
> 
> Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
> sort them by how many instructions will be removed.  Each
> IRSimilarityCandidate is used to define an OutlinableRegion.  Each
> region is ordered by their occurrence in the Module and the regions that
> are not compatible with previously outlined regions are discarded.
> 
> Each region is then extracted with the CodeExtractor into its own
> function.
> 
> We test that correctly extract in:
> test/Transforms/IROutliner/extraction.ll
> test/Transforms/IROutliner/address-taken.ll
> test/Transforms/IROutliner/outlining-same-globals.ll
> test/Transforms/IROutliner/outlining-same-constants.ll
> test/Transforms/IROutliner/outlining-different-structure.ll
> 
> Reviewers: paquette, jroelofs, yroux
> 
> Differential Revision: https://reviews.llvm.org/D86975

This reverts commit bf899e8913.
2020-11-27 19:55:57 -06:00
Andrew Litteken bf899e8913 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-11-27 19:08:29 -06:00
Florian Hahn ae008798a4
[VPlan] Use VPTransformState::set in widenGEP.
This patch updates widenGEP to manage the resulting vector values using
the VPValue of VPWidenGEP recipe.
2020-11-27 17:01:55 +00:00
Francesco Petrogalli 8e0148dff7 [AllocaInst] Update `getAllocationSizeInBits` to return `TypeSize`.
Reviewed By: peterwaller-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D92020
2020-11-27 16:39:10 +00:00
Sjoerd Meijer 10ad64aa3b [SLP] Dump Tree costs. NFC.
This adds LLVM_DEBUG messages to dump the (intermediate) tree cost
calculations, which is useful to trace and see how the final cost is
calculated.
2020-11-27 11:37:33 +00:00
Roman Lebedev b33fbbaa34
Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
This was orginally committed in 2245fb8aaa.
but was immediately reverted in f3abd54958
because of a PHI handling issue.

Original commit message:

1. It doesn't make sense to enforce that the bonus instruction
   is only used once in it's basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
2020-11-27 12:47:15 +03:00
Wang, Pengfei 8dcf8d1da5 [msan] Fix bugs when instrument x86.avx512*_cvt* intrinsics.
Scalar intrinsics x86.avx512*_cvt* have an extra rounding mode operand.
We can directly ignore it to reuse the SSE/AVX math.
This fix the bug https://bugs.llvm.org/show_bug.cgi?id=48298.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D92206
2020-11-27 16:33:14 +08:00
Markus Lavin 808fcfe594 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit 06758c6a61.

Bug: https://bugs.llvm.org/show_bug.cgi?id=48166
Additional discussion in: https://reviews.llvm.org/D91711
2020-11-27 08:52:32 +01:00
Max Kazantsev faf183874c [IndVars] LCSSA Phi users should not prevent widening
When widening an IndVar that has LCSSA Phi users outside
the loop, we can safely widen it as usual and then truncate
the result outside the loop without hurting the performance.

Differential Revision: https://reviews.llvm.org/D91593
Reviewed By: skatkov
2020-11-27 11:19:54 +07:00
Roman Lebedev f3abd54958
Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions"
Many bots are unhappy, at the very least missed a few codegen tests,
and possibly this has a logic hole inducing a miscompile
(will be really awesome to have ready reproducer..)

Need to investigate.

This reverts commit 2245fb8aaa.
2020-11-26 23:13:43 +03:00
Roman Lebedev 2245fb8aaa
[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
1. It doesn't make sense to enforce that the bonus instruction
   is only used once in it's basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
2020-11-26 22:51:22 +03:00
Roman Lebedev 65db7d38e0
[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold 2020-11-26 22:51:21 +03:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Florian Hahn bd0b1311db
[VPlan] Turn VPReplicateRecipe into a VPValue.
Update VPReplicateRecipe to inherit from VPValue. This still does not
update scalarizeInstruction to set the result for the VPValue of
VPReplicateRecipe, because this first requires tracking scalar values in
VPTransformState.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D91500
2020-11-26 13:50:24 +00:00
David Stenberg 384996f9e1 [IndVarSimplify] Fix Modified status when handling dead PHI nodes
When bailing out in rewriteLoopExitValues() you could be left with PHI
nodes in the DeadInsts vector. Those would be not handled by the use of
RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This
resulted in the IndVarSimplify pass returning an incorrect modified
status. This was caught by the expensive check introduced in D86589.

This patches changes IndVarSimplify so that it deletes those PHI nodes,
using RecursivelyDeleteDeadPHINode().

This fixes PR47486.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D91153
2020-11-26 14:28:21 +01:00
Zhengyang Liu 345fcccb33 Fix use-of-uninitialized-value in rG75f50e15bf8f
Differential Revision: https://reviews.llvm.org/D71126
2020-11-26 01:39:22 -07:00
Max Kazantsev 664e1da485 [LoopLoadElim] Make sure all loops are in simplify form. PR48150
LoopLoadElim may end up expanding an AddRec from a loop
which is not the current loop. This loop may not be in simplify
form. We figure it out after the no-return point, so cannot bail
in this case.

AddRec requires simplify form to expand. The only way to ensure
this does not crash is to simplify all loops beforehand.

The issue only exists in new PM. Old PM requests LoopSimplify
required pass and it simplifies all loops before the opt begins.

Differential Revision: https://reviews.llvm.org/D91525
Reviewed By: asbirlea, aeubanks
2020-11-26 10:51:11 +07:00
Roman Lebedev a8d74517dc
[PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but i'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars have already canonicalized them.
So i don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we then could adjust accordingly.

While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Cullen Rhodes 1ba4b82f67 [LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits
MaxSafeRegisterWidth is a misnomer since it actually returns the maximum
safe vector width. Register suggests it relates directly to a physical
register where it could be a vector spanning one or more physical
registers.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91727
2020-11-25 13:06:26 +00:00
Florian Hahn ad5b83ddcf
[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs.
This is a follow-up to 00a6601136 to make
isa<VPReductionRecipe> work and unifies the VPValue ID names, by making
sure they all consistently start with VPV*.
2020-11-25 11:08:25 +00:00
David Green e0c479cd0e [VPlan] Switch VPWidenRecipe to be a VPValue
Similar to other patches, this makes VPWidenRecipe a VPValue. Because of
the way it interacts with the reduction code it also slightly alters the
way that VPValues are registered, removing the up front NeedDef and
using getOrAddVPValue to create them on-demand if needed instead.

Differential Revision: https://reviews.llvm.org/D88447
2020-11-25 08:25:06 +00:00
David Green 00a6601136 [VPlan] Turn VPReductionRecipe into a VPValue
This converts the VPReductionRecipe into a VPValue, like other
VPRecipe's in preparation for traversing def-use chains. It also makes
it a VPUser, now storing the used VPValues as operands.

It doesn't yet change how the VPReductionRecipes are created. It will
need to call replaceAllUsesWith from the original recipe they replace,
but that is not done yet as VPWidenRecipe need to be created first.

Differential Revision: https://reviews.llvm.org/D88382
2020-11-25 08:25:05 +00:00
Kazu Hirata 1c82d32089 [CHR] Use pred_size (NFC) 2020-11-24 22:52:30 -08:00
Max Kazantsev 28d7ba1543 [IndVars] Use more precise context when eliminating narrowing
When deciding to widen narrow use, we may need to prove some facts
about it. For proof, the context is used. Currently we take the instruction
being widened as the context.

However, we may be more precise here if we take as context the point that
dominates all users of instruction being widened.

Differential Revision: https://reviews.llvm.org/D90456
Reviewed By: skatkov
2020-11-25 11:47:39 +07:00
Philip Reames 10ddb927c1 [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]
Some older code - and code copied from older code - still directly tested against the singelton result of SE::getCouldNotCompute.  Using the isa<SCEVCouldNotCompute> form is both shorter, and more readable.
2020-11-24 18:47:49 -08:00
Sanjay Patel 678b9c5dde [InstCombine] try difference-of-shifts factorization before negator
We need to preserve wrapping flags to allow better folds.
The cases with geps may be non-intuitive, but that appears to agree with Alive2:
https://alive2.llvm.org/ce/z/JQcqw7
We create 'nsw' ops independent from the original wrapping on the sub.
2020-11-24 13:56:30 -05:00
Philip Reames 075468621c [LoopVec] Add a minor clarifying comment 2020-11-24 10:45:06 -08:00
Teresa Johnson 6e4c1cf293 [ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends
Previously this option could be used to skip devirtualizations of the
given functions in regular LTO and in the ThinLTO indexing step. This
change allows them to be skipped in the backend as well, which is useful
when debugging WPD in a distributed ThinLTO backend.

Differential Revision: https://reviews.llvm.org/D91812
2020-11-24 09:35:07 -08:00
Ayal Zaks 32d9a386bf [LV] Keep Primary Induction alive when folding tail by masking
Fix PR47390.

The primary induction should be considered alive when folding tail by masking,
because it will be used by said masking; even when it may otherwise appear
useless: feeding only its own 'bump', which is correctly considered dead, and
as the 'bump' of another induction variable, which may wrongfully want to
consider its bump = the primary induction, dead.

Differential Revision: https://reviews.llvm.org/D92017
2020-11-24 15:12:54 +02:00
Arthur Eubanks 932e4f8815 [FunctionAttrs][NPM] Fix handling of convergent
The legacy pass didn't properly detect indirect calls.

We can still remove the convergent attribute when there are indirect
calls. The LangRef says:

> When it appears on a call/invoke, the convergent attribute indicates
that we should treat the call as though we’re calling a convergent
function. This is particularly useful on indirect calls; without this we
may treat such calls as though the target is non-convergent.

So don't skip handling of convergent when there are unknown calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89826
2020-11-23 21:09:41 -08:00
Philip Reames 1a9c72f8a8 [LoopVec] Reuse a lambda [NFC]
Minor code refactor to improve readability.
2020-11-23 21:07:34 -08:00
Philip Reames b06a2ad94f [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)
A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane.

This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but it's real purpose is to pave the way for a following change which will generalize our uniformity logic.

In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs.  The discussion on that item remains unsettled and is pending larger architectural discussion.  We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled.

Differential Revision: https://reviews.llvm.org/D91398
2020-11-23 15:32:17 -08:00
Sanjay Patel ab29f091eb [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps
This is a retry of 324a53205. I cautiously reverted that at 6aa3fc4
because the rules about gep math were not clear. Since then, we
have added this line to LangRef for gep inbounds:
"The successive addition of offsets (without adding the base address)
does not wrap the pointer index type in a signed sense (nsw)."

See D90708 and post-commit comments on the revert patch for more details.
2020-11-23 16:50:09 -05:00
Arthur Eubanks 3c811ce4f3 [NPM] Share pass building options with legacy PM
We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741
2020-11-23 13:04:05 -08:00
Sjoerd Meijer 33b2c88fa8 [LoopFlatten] Widen IV, support ZExt.
I disabled the widening in fa5cb4b because it run in an assert, which was
related to replacing values with different types. I forgot that an extend could
also be a zero-extend, which I have added now. This means that the approach now
is to create and insert a trunc value of the outerloop for each user, and use
that to replace IV values.

Differential Revision: https://reviews.llvm.org/D91690
2020-11-23 08:57:19 +00:00
Kazu Hirata df73b8c174 [ValueMapper] Remove unused declaration remapFunction (NFC)
The function declaration with two parameters was introduced on Apr 16
2016 in commit f0d73f95c1 without a
corresponding definition.
2020-11-22 21:52:03 -08:00
Kazu Hirata 186d129320 [hwasan] Remove unused declaration shadowBase (NFC)
The function was introduced on Jan 23, 2019 in commit
73078ecd38.

Its definition was removed on Oct 27, 2020 in commit
0930763b4b, leaving the declaration
unused.
2020-11-22 20:08:51 -08:00
Kazu Hirata def7cfb7ff [InstCombine] Use is_contained (NFC) 2020-11-21 15:47:11 -08:00
Alexey Bataev 0b420d674a [SLP][NFC]Fix assert condition in newTreeEntry, NFC. 2020-11-20 13:25:21 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Jamie Schmeiser 7f6360cdc6 Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

This relanding addresses a compile-time performance problem by moving
test for profile data earlier to avoid unnecessary computations.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-20 10:26:33 -05:00
Arthur Eubanks b77436047a [PGO] Make -disable-preinline work with NPM
Fixes cspgo_profile_summary.ll under NPM.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D91826
2020-11-19 22:58:55 -08:00
Arthur Eubanks 513d165b80 Port -lower-matrix-intrinsics-minimal to NPM
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.

Use this new variant in the NPM -O0 pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91811
2020-11-19 17:42:48 -08:00
Florian Hahn 7fa14a7c69
[ConstraintElimination] Decompose GEP with arbitrary offsets.
This patch decomposes `GEP %x, %offset` as  0 + 1 * %x + 1 * %off.
2020-11-19 22:49:21 +00:00
Geoffrey Martin-Noble b156514f8d Remove unused private fields
Unused since https://reviews.llvm.org/D91762 and triggering
-Wunused-private-field

```
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field]
  Constant *GetArgTLS;
            ^
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field]
  Constant *GetRetvalTLS;
```

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D91820
2020-11-19 13:54:54 -08:00
Roman Lebedev a91e96702a
[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)`
One less instruction and reducing use count of zext.
As alive2 confirms, we're fine with all the weird combinations of
undef elts in constants, but unless the shift amount was undef
for a lane, we must sanitize undef mask to zero, since sign bits
are no longer zeros.

https://rise4fun.com/Alive/d7r
```
----------------------------------------
Optimization: zz
Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2))
  %o0 = zext %x
  %o1 = shl %o0, C1
  %r = and %o1, C2
=>
  %n0 = sext %x
  %r = and %n0, C2

Done: 2016
Optimization is correct!
```
2020-11-20 00:31:27 +03:00
Jianzhou Zhao 6c1c308c0e Remove deadcode from DFSanFunction::get*TLS*()
clean more deadcode after D84704

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D91762
2020-11-19 21:10:37 +00:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Sander de Smalen 41c9f4c1ce [LoopVectorize] NFC: Fix unused variable warning for MaxSafeDepDist
rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused
variable for MaxSafeDepDist (written but not used). It seems this
variable and assignment can be safely removed.
2020-11-19 17:41:35 +00:00
Joseph Huber da8bec47ab [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert grokos

Differential Revision: https://reviews.llvm.org/D87946
2020-11-19 12:01:53 -05:00
Simon Moll a1de391dae [LV][NFC-ish] Allow vector widths over 256 elements
The assertion that vector widths are <= 256 elements was hard wired in the LV code. Eg, VE allows for vectors up to 512 elements. Test again the TTI vector register bit width instead - this is an NFC for non-asserting builds.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91518
2020-11-19 10:58:29 +01:00
Max Kazantsev 515105f46b [NFC] Remove comment (commited ahead of time by mistake) 2020-11-19 16:28:34 +07:00
Max Kazantsev 7c601d09a7 [NFC] Move code earlier as preparation for further changes 2020-11-19 16:27:23 +07:00
Andrew Wei ea7ab5a42c [IndVarSimplify] Notify top most loop to drop cached exit counts
Some nested loops may share the same ExitingBB, so after we finishing FoldExit,
we need to notify OuterLoop and SCEV to drop any stored trip count.

Patched by: guopeilin
Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D91325
2020-11-19 15:37:54 +08:00
Kazu Hirata 43c0e4f665 [Transforms] Use llvm::is_contained (NFC) 2020-11-18 20:42:22 -08:00
Jamie Schmeiser cff479b145 Revert "Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."""
This reverts commit e29292969b.

This apparently causes a regression in compile time (ie, it slows down).
2020-11-18 16:07:16 -05:00
Roman Lebedev 7bf89c2174
[NFC][Reassociate] Delay checking isLoadCombineCandidate() until after ShouldConvertOrWithNoCommonBitsToAdd() but before haveNoCommonBitsSet()
This appears to improve -O3 compile-time performance somewhat:
https://llvm-compile-time-tracker.com/compare.php?from=87369c626114ae17f4c637635c119e6de0856a9a&to=c04b8271e1609b0dfb20609b40844b0c4324517e&stat=instructions
It doesn't look like delaying it until after haveNoCommonBitsSet() is better:
https://llvm-compile-time-tracker.com/compare.php?from=c04b8271e1609b0dfb20609b40844b0c4324517e&to=b2943d450eaf41b5f76d2dc7350f0a279f64cd99&stat=instructions
2020-11-18 23:57:12 +03:00
Jamie Schmeiser e29292969b Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.""
This reverts commit 562addba65.

Reverted change too quickly, the failing test cases passed on the next build.
So reverting revert (to include the changes).
2020-11-18 15:33:02 -05:00
Florian Hahn 2fead1ac61
[ConstraintElimination] Decompose add nuw/sub nuw.
Make use of the more flexible constraint handling added in
a8a79c9069 to decompose add nuw/sub nuw.
2020-11-18 20:29:30 +00:00
Joseph Huber 97e55cfef5 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-11-18 15:28:39 -05:00
Nikita Popov f4a3969bff [Inline] Fix incorrectly dropped noalias metadata
This is the same fix as 23aeadb89d,
just for CloneScopedAliasMetadata rather than PropagateCallSiteMetadata.

In this case the previous outcome was incorrectly dropped metadata,
as it was not part of the computed metadata map.

The real change in the test is that the first load now retains
metadata, the rest of the changes are due to changes in metadata
numbering.
2020-11-18 21:22:50 +01:00
Jamie Schmeiser 562addba65 Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."
This reverts commit d4ba28bddc.
2020-11-18 15:17:53 -05:00
Nikita Popov 23aeadb89d [Inline] Fix incorrect noalias metadata application (PR48209)
The VMap also contains a mapping from Argument => Instruction,
where the instruction is part of the original function, not the
inlined one. The code was assuming that all the instructions in
the VMap were inlined.

This was a pre-existing problem for the loop access metadata, but
was extended to the more common noalias metadata by
27f647d117, thus causing miscompiles.

There is a similar assumption inside CloneAliasScopeMetadata(), so
that one likely needs to be fixed as well.
2020-11-18 20:52:58 +01:00
Jamie Schmeiser d4ba28bddc Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-18 14:08:42 -05:00
Piotr Sobczak b3b9be4ae7 SpeculativeExecution: Allow speculating more instruction types
Support more instructions in SpeculativeExecution pass:
- ExtractValue
- InsertValue
- Trunc
- Freeze

Differential Revision: https://reviews.llvm.org/D91688
2020-11-18 17:00:19 +01:00
Roman Lebedev 34ff90ad5d
[Reassociate] Don't convert add-like-or's into add's if they appear to be part of load-combining idiom
As Wei Mi is reporting in post-commit review
  https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html
teaching -reassociate about add-like-or's (70472f3) results in breaking apart
load widening patterns, and reassociating them.

For now, simply exclude any such `or` that appears to be a root of
load widening idiom from the or->add transformation.

Note that the heuristic is greedy, it doesn't ensure that loads
can *actually* be widened into a single load.
2020-11-18 17:55:02 +03:00
Florian Hahn a8a79c9069
[ConstraintElimination] Refactor constraint extraction (NFC).
This patch generalizes the extraction of a constraint for a given
condition. It allows decompose to return a vector of c * X pairs, which
allows de-composing multiple instructions in the future.

It also adds more clarifying comments.
2020-11-18 13:59:18 +00:00
Benjamin Kramer 4dbe12e866 [SLP] Use the minimum alignment of the load bundle when forming a masked.gather
Instead of the first load. That works when vectorizing contiguous loads,
but not for gathers.

Fixes a miscompile introduced in fcad8d3635.
2020-11-18 12:53:39 +01:00
Max Kazantsev f33118c61c [IndVars] Support different types of ExitCount when optimizing exit conds
In some cases we can handle IV and iter count of different types. It's a typical situation
after IV have been widened. This patch adds support for such cases, when legal.

Differential Revision: https://reviews.llvm.org/D88528
Reviewed By: skatkov
2020-11-18 18:20:05 +07:00
Piotr Sobczak c173f1b8eb SpeculativeExecution: Allow speculating more instruction types
Support more instructions in SpeculativeExecution pass:
- ExtractElement
- InsertElement
- ShuffleVector

Differential Revision: https://reviews.llvm.org/D91633
2020-11-18 09:46:43 +01:00
Arthur Eubanks 9e3b4f4941 [JumpThreading] Make -print-lvi-after-jump-threading work with NPM 2020-11-17 23:15:20 -08:00
Arthur Eubanks ee7d315cd9 [DCE] Always get TargetLibraryInfo
I don't see any reason not to unconditionally retrieve TLI, it's fairly
cheap.

Fixes calls-errno.ll under NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91476
2020-11-17 20:41:05 -08:00
Nick Desaulniers f4c6080ab8 Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch"
This reverts commit b7926ce6d7.

Going with a simpler approach.
2020-11-17 17:27:14 -08:00
Sanjay Patel 08834979e3 [SLP] avoid unreachable code crash/infloop
Example based on the post-commit comments for D88735.
2020-11-17 15:10:23 -05:00
Sanjay Patel 4a66a1d17a [InstCombine] allow vectors for masked-add -> xor fold
https://rise4fun.com/Alive/I4Ge

  Name: add with pow2 mask
  Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %n = and i8 %x, C2
  %r = xor i8 %n, C2
2020-11-17 13:36:08 -05:00
Simon Pilgrim f7ebdec987 [InstCombine] visitAnd - remove unnecessary Value *X, *Y shadow variables. NFCI.
Fixes a number of Wshadow warnings.
2020-11-17 17:59:21 +00:00
Simon Pilgrim abf29d9862 [InstCombine] visitAnd - use m_SpecificInt instead of m_APInt + comparison. NFCI.
m_SpecificInt has the same 'no undef element' behaviour as m_APInt so no change there, and anyway we have test coverage for undef elements in the fold.

Noticed while fixing a Wshadow warning about shadow Value *X, *Y variables.
2020-11-17 17:37:10 +00:00
Sanjay Patel f791ad7e1e [InstCombine] remove scalar constraint for mask-of-add fold
https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:45 -05:00
Sanjay Patel 433696911a [InstCombine] relax constraints on mask-of-add
There are 2 changes:
1. Remove the unnecessary one-use check.
2. Remove the unnecessary power-of-2 check.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:44 -05:00
Florian Hahn 52f3714dae [VPlan] Add VPDef class.
This patch introduces a new VPDef class, which can be used to
manage VPValues defined by recipes/VPInstructions.

The idea here is to mirror VPUser for values defined by a recipe. A
VPDef can produce either zero (e.g. a store recipe), one (most recipes)
or multiple (VPInterleaveRecipe) result VPValues.

To traverse the def-use chain from a VPDef to its users, one has to
traverse the users of all values defined by a VPDef.

VPValues now contain a pointer to their corresponding VPDef, if one
exists. To traverse the def-use chain upwards from a VPValue, we first
need to check if the VPValue is defined by a VPDef. If it does not have
a VPDef, this means we have a VPValue that is not directly defined
iniside the plan and we are done.

If we have a VPDef, it is defined inside the region by a recipe, which
is a VPUser, and the upwards def-use chain traversal continues by
traversing all its operands.

Note that we need to add an additional field to to VPVAlue to link them
to their defs. The space increase is going to be offset by being able to
remove the SubclassID field in future patches.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D90558
2020-11-17 16:18:11 +00:00
Matt Arsenault c5ce6036c1 Linker: Fix linking of byref types
This wasn't properly remapping the type like with the other
attributes, so this would end up hitting a verifier error after
linking different modules using byref.
2020-11-17 11:02:04 -05:00
Anton Afanasyev 0a1d315f9f [SLPVectorizer] Fix assert 2020-11-17 18:46:31 +03:00
Anton Afanasyev fcad8d3635 [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic
For the scattered operands of load instructions it makes sense
to use gathering load intrinsic, which can lower to native instruction
for X86/AVX512 and ARM/SVE. This also enables building
vectorization tree with entries containing scattered operands.
The next step is to add scattered store.

Fixes PR47629 and PR47623

Differential Revision: https://reviews.llvm.org/D90445
2020-11-17 18:11:45 +03:00
Florian Hahn 13042da5cb
[ConstraintElimination] Add support for And.
When processing conditional branches, if the condition is an AND of 2 compares
and the true successor only has the current block as predecessor, queue both
conditions for the true successor.
2020-11-17 14:12:15 +00:00
Sander de Smalen f571fe6df5 Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This relands https://reviews.llvm.org/D91059 and reverts commit
30fded75b4.

GetRegUsage now returns 0 when Ty is not a valid vector element type.
2020-11-17 13:45:10 +00:00
Yevgeny Rouban a57fe210ff [JumpThreading] Fix branch probabilities in DuplicateCondBranchOnPHIIntoPred()
When instructions are cloned from block BB to PredBB in the method
DuplicateCondBranchOnPHIIntoPred() number of successors of PredBB
changes from 1 to number of successors of BB. So we have to copy
branch probabilities from BB to PredBB.

Reviewed By: Kazu Hirata

Differential Revision: https://reviews.llvm.org/D90841
2020-11-17 14:40:50 +07:00
Max Kazantsev 63dd1734b2 [NFC] Collect ext users into vector instead of finding them twice 2020-11-17 14:01:43 +07:00
Kazu Hirata 1da60f1d44 [Transforms] Use pred_empty (NFC) 2020-11-16 22:09:14 -08:00
Kazu Hirata 5935952c31 [SanitizerCoverage] Use [&] for lambdas (NFC) 2020-11-16 21:45:21 -08:00
Arthur Eubanks 7de6dcd246 [Debugify] Skip debugifying on special/immutable passes
With a function pass manager, it would insert debuginfo metadata before
getting to function passes while processing the pass manager, causing
debugify to skip while running the function passes.

Skip special passes + verifier + printing passes. Compared to the legacy
implementation of -debugify-each, this additionally skips verifier
passes. Probably no need to update the legacy version since it will be
obsolete soon.

This fixes 2 instcombine tests using -debugify-each under NPM.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D91558
2020-11-16 20:39:46 -08:00
Sjoerd Meijer fa5cb4b936 [LoopFlatten] Disable IV widening
Disable widening of the IV in LoopFlatten while I investigate an assertion
failures. Please note that the pass is also not yet enabled by default.
2020-11-16 22:30:52 +00:00
Michael Liao f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, infer-address-space
  pass could infer and propagate that to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Florian Hahn 5a4ca8b550 [ConstraintElimination] Add support for Or.
When processing conditional branches, if the condition is an OR of 2 compares
and the false successor only has the current block as predecessor, queue both
negated conditions for the false successor
2020-11-16 21:48:38 +00:00
Philip Reames 2240d3d054 [LoopVec] Introduce an api for detecting uniform memory ops
Split off D91398 at request of reviewer.
2020-11-16 13:30:48 -08:00
Arnold Schwaighofer d861cc0e43 [coro] Async coroutines: Make sure we can handle control flow in suspend point dispatch function
Create a valid basic block with a terminator before we call
InlineFunction.

Differential Revision: https://reviews.llvm.org/D91547
2020-11-16 11:59:02 -08:00
Sanjay Patel 4e68bc0999 Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask"
This reverts commit e56103d250.
There is a stage2 msan failure blamed on this commit:
http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio
2020-11-16 14:48:09 -05:00
Arthur Eubanks aeb0fdff35 [SimplifyCFG] Respect optforfuzzing in NPM pass
Regression caused by refactoring in
cdd006eec9.

See discussion in https://reviews.llvm.org/D89917.

Reviewed By: arsenm, morehouse

Differential Revision: https://reviews.llvm.org/D91473
2020-11-16 09:56:37 -08:00
Xun Li 985c524001 [Coroutine] Allocas used by StoreInst does not always escape
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's often to have patterns of code that first store an alloca somewhere and then load it right away.
These used should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Differential Revision: https://reviews.llvm.org/D91305
2020-11-16 09:14:44 -08:00
Florian Hahn 8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated  using
__attribute__((annotate("_name"))) on functions in Clang.

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Benjamin Kramer 2e7455f00a [LoopFlatten] Fold variable into assert. NFC. 2020-11-16 11:51:39 +01:00
Sjoerd Meijer 9aa773381b [LoopFlatten] Widen the IV
Widen the IV to the widest available and legal integer type, which makes this
transformations always safe so that we can skip overflow checks.

Motivation is to let this pass trigger on 64-bit targets too, and this is the
last patch in a serie to achieve this: D90402 moves pass LoopFlatten to just
before IndVarSimplify so that IVs are not already widened, D90421 factors out
widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use
it in LoopFlatten.

Differential Revision: https://reviews.llvm.org/D90640
2020-11-16 10:20:13 +00:00
Max Kazantsev b4624f65cf Recommit "[NFC] Move code between functions as a preparation step for further improvement"
The bug should be fixed now.
2020-11-16 14:30:34 +07:00
Kazu Hirata 147ccc848a [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.

This patch is one of the steps before we can safely re-apply D91017.

Differential Revision: https://reviews.llvm.org/D91511
2020-11-15 22:29:30 -08:00
Kazu Hirata 0888eaf3fd [Loop Fusion] Use pred_empty and succ_empty (NFC) 2020-11-15 20:32:57 -08:00
Kazu Hirata 0c03d1328c [ADCE] Use succ_empty (NFC) 2020-11-15 19:52:59 -08:00
Kazu Hirata 43a6a1e928 [TRE] Use successors(BB) (NFC) 2020-11-15 19:12:49 -08:00
Kazu Hirata 918e3439e2 [SanitizerCoverage] Use llvm::all_of (NFC) 2020-11-15 19:01:20 -08:00
Serguei Katkov 400f6edce7 [IRCE] Use the same min runtime iteration threshold for BPI and BFI checks
In the last change to IRCE the BPI is ignored if BFI is present, however
BFI and BPI have a different thresholds. Specifically BPI approach checks only
latch exit probability so it is expected if the loop has only one exit block (latch)
the behavior with BFI and BPI should be the same,

BPI approach by default uses threshold 10, so it considers the loop with estimated
number of iterations less then 10 should not be considered for IRCE optimization.
BFI approach uses the default value 3 and this is inconsistent.

The CL modifies the code to use the same threshold for both approaches..

The test is updated due to it has two side-exits (except latch) and each of them has a
probability 1/16, so BFI estimates the number of runtime iteration is about to 7
(1/16 + 1/16 + some for latch) and test fails.

Reviewers: mkazantsev, ebrevnov
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D91230
2020-11-16 09:21:50 +07:00
Sanjay Patel 6ddc237766 [InstCombine] reduce code for flip of masked bit; NFC
There are 1-2 potential follow-up NFC commits to reduce
this further on the way to generalizing this for vectors.

The operand replacing path should be dead code because demanded
bits handles that more generally (D91415).
2020-11-15 15:43:34 -05:00
Sanjay Patel e56103d250 [InstCombine] add multi-use demanded bits fold for add with low-bit mask
I noticed an add example like the one from D91343, so here's a similar patch.
The logic is based on existing code for the single-use demanded bits fold.
But I only matched a constant instead of using compute known bits on the
operands because that was the motivating patterni that I noticed.

I think this will allow removing a special-case (but incomplete) dedicated
fold within visitAnd(), but I need to untangle the existing code to be sure.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2

Differential Revision: https://reviews.llvm.org/D91415
2020-11-15 15:09:49 -05:00
Florian Hahn 0c119ba8a8 [VPlan] Use VPValue def for VPWidenGEPRecipe.
This patch turns VPWidenGEPRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84683
2020-11-15 15:12:47 +00:00
Arthur Eubanks 6e04da0a5a [DCE] Port -redundant-dbg-inst-elim to NPM
This is used to test RemoveRedundantDbgInstrs(), which is used by other
passes.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91477
2020-11-14 16:55:20 -08:00
Florian Hahn a70b511e78 Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts the revert commit c8d73d939f.

It includes a fix for cases where we missed inserting VPValues
for some selects, which should fix PR48142.
2020-11-14 20:00:25 +00:00
Arnold Schwaighofer 8fb73cecfd [Coroutines] Make sure that async coroutine context size is a multiple of the alignment requirement
This simplifies the code the allocator has to executed

Differential Revision: https://reviews.llvm.org/D91471
2020-11-14 04:56:56 -08:00
Roman Lebedev 6861d938e5
Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.

I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.

This reverts commit 7bdad08429,
and some of it's follow-ups, that don't stand on their own.
2020-11-14 13:12:38 +03:00
Akira Hatanaka 2ed3a76745 [ObjC][ARC] Add and use a function which finds and returns the single
dependency. NFC

Use findSingleDependency in place of FindDependencies and stop passing a
set of Instructions around. Modify FindDependencies to return a boolean
flag which indicates whether the dependencies it has found are all
valid.
2020-11-13 14:02:58 -08:00
Akira Hatanaka 00d0974e62 Move variable declarations to functions in which they are used. NFC 2020-11-13 14:02:58 -08:00
Guozhi Wei a20220d25b [AlwaysInliner] Call mergeAttributesForInlining after inlining
Like inlineCallIfPossible and InlinerPass, after inlining mergeAttributesForInlining
should be called to merge callee's attributes to caller. But it is not called in
AlwaysInliner, causes caller's attributes inconsistent with inlined code.

Attached test case demonstrates that attribute "min-legal-vector-width"="512" is
not merged into caller without this patch, and it causes failure in SelectionDAG
when lowering the inlined AVX512 intrinsic.

Differential Revision: https://reviews.llvm.org/D91446
2020-11-13 12:01:35 -08:00
Jianzhou Zhao 06c9b4aaa9 Extend the dfsan store/load callback with write/read address
This helped debugging.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D91236
2020-11-13 19:46:32 +00:00
Nikita Popov 02dda1c659 [Local] Clean up EmitGEPOffset
Handle the emission of the add in a single place, instead of three
different ones.

Don't emit an unnecessary add with zero to start with. It will get
dropped by InstCombine, but we may as well not create it in the
first place. This also means that InstCombine does not need to
specially handle this extra add.

This is conceptually NFC, but can affect worklist order etc.
2020-11-13 18:30:56 +01:00
David Zarzycki 5a327f3337 Revert "[NFC] Move code between functions as a preparation step for further improvement"
This reverts commit 08016ac32b.

A bunch of tests are failing my local two stage builder.
2020-11-13 10:52:49 -05:00
serge-sans-paille 95537f4508 llvmbuildectomy - compatibility with ocaml bindings
Use exact component name in add_ocaml_library.
Make expand_topologically compatible with new architecture.
Fix quoting in is_llvm_target_library.
Fix LLVMipo component name.
Write release note.
2020-11-13 14:35:52 +01:00
Florian Hahn 8bb6347939
Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
Max Kazantsev 08016ac32b [NFC] Move code between functions as a preparation step for further improvement 2020-11-13 18:12:45 +07:00
Max Kazantsev 185cface2e [NFC] Refactor lambda into static function 2020-11-13 17:42:23 +07:00
Max Kazantsev 68490aec4e [NFC] Move lambdae into static functions 2020-11-13 17:07:25 +07:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Max Kazantsev 9224d322a2 [IndVars] Fix branches exiting by true with invariant conditions
Forgot to invert the condition for them.
2020-11-13 15:52:00 +07:00
Max Kazantsev 0a1d394bf3 [NFC] Refactor loop-invariant getters to return Optional 2020-11-13 15:03:10 +07:00
Arthur Eubanks b9406121a0 [NFC] Removed unused variable
Obsolete as of https://reviews.llvm.org/D91046.
2020-11-12 22:24:57 -08:00
Akira Hatanaka 09266e4af0 [ObjC][ARC] Clear the lists of basic blocks and instructions before
continuing the loop

This fixes a bug introduced in c6f1713c46.
2020-11-12 22:20:02 -08:00
Max Kazantsev 77efb73c67 [IndVars] Replace checks with invariants if we cannot remove them
If we cannot prove that the check is trivially true, but can prove that it either
fails on the 1st iteration or never fails, we can replace it with first iteration check.

Differential Revision: https://reviews.llvm.org/D88527
Reviewed By: skatkov
2020-11-13 12:23:12 +07:00
Sanjay Patel 0abde4bc92 [InstCombine] fold sub of low-bit masked value from offset of same value
There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.

This came up as a regression when I was looking at a different
demanded bits improvement.

https://rise4fun.com/Alive/5fl

  Name: general
  Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
  %a1 = add i8 %x, C1
  %a2 = and i8 %x, C2
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, ~C2

  Name: test 1
  %a1 = add i8 %x, 192
  %a2 = and i8 %x, 10
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -11

  Name: test 2
  %a1 = add i8 %x, -108
  %a2 = and i8 %x, 3
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -4
2020-11-12 20:10:28 -05:00
Jianzhou Zhao 2d96859ea6 [msan] Break the getShadow loop after matching an argument
Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D91320
2020-11-12 19:48:59 +00:00
Alexander Kornienko 76b6cb515b Fix unused variable warning in release builds 2020-11-12 18:14:06 +01:00
Jamie Schmeiser f79b483385 [NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals
Summary:
Refactor SinkAdHoistLICMFlags from a struct to a class with accessors and constructors to allow other
classes to construct flags with meaningful defaults while not exposing LICM internal details.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea)

Differential Revision: https://reviews.llvm.org/D90482
2020-11-12 15:06:59 +00:00
Xun Li 94a45a8098 Revert "[Coroutine] Allocas used by StoreInst does not always escape"
This reverts commit 8bc7b9278e, which landed by accident.
2020-11-11 21:09:39 -08:00
Max Kazantsev d6dd938589 [IndVars] IV user should not prevent use widening
Sometimes the an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such user because it will be
dead once the transform is done anyways.

Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn
2020-11-12 12:02:01 +07:00
Xun Li 8bc7b9278e [Coroutine] Allocas used by StoreInst does not always escape
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's often to have patterns of code that first store an alloca somewhere and then load it right away.
These used should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Differential Revision: https://reviews.llvm.org/D91305
2020-11-11 20:53:51 -08:00
Max Kazantsev 2e01ceafaa [IndVars] Recognize 'sub nuw' expressed as 'add' for widening
InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that it's legal to widen them,
and inserts unprofitable `zext`'s.

This patch adds recognition of such pattern using SCEV.

Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov
2020-11-12 10:51:29 +07:00
Arnold Schwaighofer 431337662e [coro] Async coroutines: Allow more than 3 arguments in the dispatch function
We need to be able to call function pointers. Inline the dispatch
function.

Also inline the context projection function.

Transfer debug locations from the suspend point to the inlined functions.

Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.

Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D91098
2020-11-11 15:25:28 -08:00
Arthur Eubanks d9cbceb041 [CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.

This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass().  The inliner called
updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
newly introduced edges (so updateCGAndAnalysisManagerForPass() would
only have to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass() since
updateCGAndAnalysisManagerForPass() is now handling the new call edges
and function passes cannot add new edges.

We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.

This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91046
2020-11-11 13:43:49 -08:00
Jianzhou Zhao 0dd87825db Add a flag to control whether to propagate labels from condition values to results
Before the change, DFSan always does the propagation. W/o
origin tracking, it is harder to understand such flows. After
the change, the flag is off by default.

Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91234
2020-11-11 20:41:42 +00:00
Akira Hatanaka 5e85d00ed6 Move variable declarations to functions in which they are used. NFC 2020-11-11 10:58:43 -08:00
Sander de Smalen 30fded75b4 Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost."
This reverts commits:
* [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
  b873aba394.
* [LoopVectorizer] Silence warning in GetRegUsage.
  9ff701100a.
2020-11-11 14:41:55 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00
Sander de Smalen 9ff701100a [LoopVectorizer] Silence warning in GetRegUsage.
This patch silences the warning:
	error: lambda capture 'DL' is not used [-Werror,-Wunused-lambda-capture]
	  auto GetRegUsage = [&DL, &TTI=TTI](Type *Ty, ElementCount VF) {
	                      ~^~~
	1 error generated.

Introduced in:
  https://reviews.llvm.org/rGb873aba3943c067a5efd5303cbdf5aeb0732cf88
2020-11-11 10:54:20 +00:00
Sander de Smalen b873aba394 [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This is more accurate than dividing the bitwidth based on the element count by the
maximum register size, as it can just reuse whatever has been calculated for
legalization of these types.

This change is also necessary when calculating register usage for scalable vectors, where
the legalization of these types cannot be done based on the widest register size, because
that does not take the 'vscale' component into account.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91059
2020-11-11 10:18:50 +00:00
Sander de Smalen 0141f5a49d [LoopVectorizer] NFC: Return ElementCount from compute[Feasible]MaxVF
Interfaces changed to return `ElementCount`:
* LoopVectorizationCostModel::computeMaxVF
* LoopVectorizationCostModel::computeFeasibleMaxVF

This is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90880
2020-11-11 09:55:06 +00:00
Chen Zheng 4eb8359e74 [EarlyCSE] delete abs/nabs handling
delete abs/nabs handling in earlycse pass to avoid bugs related to
hashing values. After abs/nabs is canonicalized to intrinsics in D87188,
we should get CSE ability for abs/nabs back.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D90734
2020-11-10 21:10:58 -05:00
Florian Hahn c8d73d939f Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts commit a8e50f1c6e.

This reportedly breaks building the Linux kernel.
  https://bugs.llvm.org/show_bug.cgi?id=48142
2020-11-10 22:50:46 +00:00
Bruno Cardoso Lopes dc14542a71 [Coroutines] Add missing llvm.dbg.declare's to cover for more allocas
Tracking local variables across suspend points is still somewhat incomplete.
Consider this coroutine snippet:

```
resumable foo() {
  int x[10] = {};
  int a = 3;
  co_await std::experimental::suspend_always();
  a++;
  x[0] = 1;
  a += 2;
  x[1] = 2;
  a += 3;
  x[2] = 3;
}
```

Can't manage to print `a` or `x` if they turn out to be allocas during
CoroSplit (which happens if you build this code with `-O0` prior to this
commit):

```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5
   40     co_await std::experimental::suspend_always();
   41     a++;
   42     x[0] = 1;
-> 43     a += 2;
   44     x[1] = 2;
   45     a += 3;
   46     x[2] = 3;
(lldb) p x
error: <user expression 21>:1:1: use of undeclared identifier 'x'
x
^
```

The generated IR contains a `llvm.dbg.declare` for `x` in it's initialization
basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all of
`x` uses and we lose debugging quality.

Add `llvm.dbg.value`s to all relevant basic blocks such that if later
transformations break the dominance the reliable debug info is already in
place. For instance, this BB:

```
await.ready:
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

becomes:

```
await.ready:
  ...
  call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

Differential Revision: https://reviews.llvm.org/D90772
2020-11-10 12:36:07 -08:00
Sjoerd Meijer 2ef47910d5 [LoopFlatten] Run it earlier, just before IndVarSimplify
This is a prep step for widening induction variables in LoopFlatten if this is
posssible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
passes were already close after each other.

Differential Revision: https://reviews.llvm.org/D90402
2020-11-10 20:22:41 +00:00
Sjoerd Meijer 706ead0e87 [LoopFlatten] Make it a FunctionPass
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into problems of a loop pass deleting a (inner)loop.

Differential Revision: https://reviews.llvm.org/D90940
2020-11-10 20:03:31 +00:00
Florian Hahn a8e50f1c6e
[VPlan] Use VPValue def for VPWidenSelectRecipe.
This patch turns VPWidenSelectRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84682
2020-11-10 19:39:37 +00:00
Jonas Paulsson 89a1042b6a Make inferLibFuncAttributes() add SExt attribute on second arg to ldexp.
This was missing as discovered by the SystemZ multistage bot:
http://lab.llvm.org:8011/#/builders/8, where wrong code resulted when this
extension was not performed.

Thanks for review by Ulrich Weigand and Roman Lebedev.

Differential Revision: https://reviews.llvm.org/D90760
2020-11-10 18:32:15 +01:00
David Green c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.

We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult registry allocation. With recent
changes that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
Sanne Wouda dd03881bd5 Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-10 12:04:32 +00:00
Sander de Smalen f47573f9bf [LoopVectorizer] NFC: Propagate ElementCount to more interfaces.
Interfaces changed to take `ElementCount` as parameters:
* LoopVectorizationPlanner::buildVPlans
* LoopVectorizationPlanner::buildVPlansWithVPRecipes
* LoopVectorizationCostModel::selectVectorizationFactor

This patch is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90879
2020-11-10 11:11:02 +00:00
Max Kazantsev 25755a0159 [NFC] Add flag to disable IV widening in indvar instance
This allows us to have control over IV widening in the pipeline.
2020-11-10 15:10:44 +07:00
Arthur Eubanks 1cbf8e89b5 [NewPM] Port -separate-const-offset-from-gep
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91095
2020-11-09 17:42:36 -08:00
Michael Kruse e5dba2d7e5 [OMPIRBuilder] Start 'Create' methods with lower case. NFC.
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has methods names starting with lower case letters, as all other OpenMPIRBuilder already methods do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109
2020-11-09 19:35:11 -06:00
Xun Li c2cb093d9b [Coroutine] Move all used local allocas to the .resume function
Prior to D89768, any alloca that's used after suspension points will be put on to the coroutine frame, and hence they will always be reloaded in the resume function.
However D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (hence does not need to live across suspension points) will not be put on the frame. They will remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved all the allocas from the old entry to the new entry. This covers every alloca from the old entry. However allocas that's defined afer coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas, and check if they are still used but are not reachable from the new entry, if so, we move them to the new entry.

Differential Revision: https://reviews.llvm.org/D90977
2020-11-09 17:24:49 -08:00
Sjoerd Meijer e2dcea4489 [LoopFlatten] FlattenInfo bookkeeping. NFC.
Introduce struct FlattenInfo to group some of the bookkeeping. Besides this
being a bit of a clean-up, it is a prep step for next additions (D90640). I
could take things a bit further, but thought this was a good first step also
not to make this change too large.

Differential Revision: https://reviews.llvm.org/D90408
2020-11-09 14:50:26 +00:00
Florian Hahn f0d76275cb
[VPlan] Print result value for loads in VPWidenMemoryInst (NFC).
For loads, print the result value.
2020-11-09 14:01:29 +00:00
Florian Hahn 537829f2a7
[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC).
Move logic to check if the recipe is a store to a helper for easier
reuse.
2020-11-09 14:01:29 +00:00
Florian Hahn fec64de261
[VPlan] Use VPValue def for VPWidenCall.
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84681
2020-11-09 13:29:41 +00:00
Florian Hahn 091c5c9a18
[VPlan] Add printOperands helper to VPUser (NFC).
Factor out the code for printing operands of a VPUser so it can be
re-used when printing other recipes.
2020-11-09 12:30:57 +00:00
LemonBoy 42732d33cc
[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D89628
2020-11-09 14:41:07 +03:00
Tim Northover f7fe7ea24d [MergeFunctions] fix function attribute comparison in FunctionComparator
The comparison of AttributeSets stopped after seeing a matching type attribute.
Subsequent mismatching attributes were not detected causing a crash.
2020-11-09 09:19:11 +00:00
Simon Pilgrim b11eaf5617 [DSE] Don't dereference a dyn_cast<> result - use cast<> instead. NFCI.
We were relying on the dyn_cast<> succeeding - better use cast<> and have it assert that its the correct type than dereference a null result.
2020-11-08 13:07:45 +00:00
Simon Pilgrim 0fe91ad463 [InstCombine] foldSelectFunnelShift - block poison in funnel shift value
As raised by @nlopes on D90382 - if this is not a rotate then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't - so freeze it.
2020-11-08 12:58:30 +00:00
Florian Hahn e8dc17a2b7
[LoopInterchange] Skip non SCEV-able operands in cost function.
This fixes a crash when trying to get a SCEV expression for operands
that are not SCEV-able.
2020-11-08 11:41:19 +00:00
Pedro Tammela 5e8ecff0d8 [Reg2Mem] add support for the new pass manager
This patch refactors the pass to accomodate the new pass manager
boilerplate.

Differential Revision: https://reviews.llvm.org/D91005
2020-11-08 11:14:05 +00:00
Kazu Hirata 75e46c6328 [Mem2Reg] Use llvm::count instead of std::count (NFC) 2020-11-07 20:18:47 -08:00
Kazu Hirata c95fff5be7 [JumpThreading] Fix function names (NFC) 2020-11-07 19:35:03 -08:00
Atmn Patel 04a0896487 Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops"
This reverts commit 0b17c6e447. This patch
causes a compile-time error in SCEV.
2020-11-07 00:32:12 -05:00
Atmn Patel 0b17c6e447 [LoopDeletion] Allows deletion of possibly infinite side-effect free loops
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.

This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.

Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86844
2020-11-06 22:06:58 -05:00
Atmn Patel babc224c5d [LoopDeletion] Remove dead loops with no exit blocks
Currently, LoopDeletion refuses to remove dead loops with no exit blocks
because it cannot statically determine the control flow after it removes
the block. This leads to miscompiles if the loop is an infinite loop and
should've been removed.

Differential Revision: https://reviews.llvm.org/D90115
2020-11-06 17:08:34 -05:00
Quentin Colombet a585228027 Prevent LICM and machineLICM from hoisting convergent operations
Results of convergent operations are implicitly affected by the
enclosing control flows and should not be hoisted out of arbitrary
loops.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D90361
2020-11-06 10:26:39 -08:00
Arnold Schwaighofer c6543cc6b8 llvm.coro.id.async lowering: Parameterize how-to restore the current's continutation context and restart the pipeline after splitting
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how-to restore the current continuation's
context from the context argument of the continuation function. Before
we assumed that the current context can be restored by loading from the
context arguments first pointer field (`first_arg->caller_context`).

This allows for defining suspension points that reuse the current
context for example.

Also:

llvm.coro.id.async lowering: Add llvm.coro.preprare.async intrinsic

Blocks inlining until after the async coroutine was split.

Also, change the async function pointer's context size position

   struct async_function_pointer {
     uint32_t relative_function_pointer_to_async_impl;
     uint32_t context_size;
   }

And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90783
2020-11-06 06:22:46 -08:00
Florian Hahn d8d1cc647d [SLP] Also try to vectorize incoming values of PHIs .
Currently we do not consider incoming values of PHIs as roots for SLP
vectorization. This means we miss scenarios like the one in the test
case and PR47670.

It appears quite straight-forward to consider incoming values of PHIs as
roots for vectorization, but I might be missing something that makes
this problematic.

In terms of vectorized instructions, this applies to quite a few
benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto

    Same hash: 185 (filtered out)
    Remaining: 52
    Metric: SLP.NumVectorInstructions

    Program                                        base    patch   diff
     test-suite...ProxyApps-C++/HPCCG/HPCCG.test     9.00   27.00  200.0%
     test-suite...C/CFP2000/179.art/179.art.test     8.00   22.00  175.0%
     test-suite...T2006/458.sjeng/458.sjeng.test    14.00   30.00  114.3%
     test-suite...ce/Benchmarks/PAQ8p/paq8p.test    11.00   18.00  63.6%
     test-suite...s/FreeBench/neural/neural.test    12.00   18.00  50.0%
     test-suite...rimaran/enc-3des/enc-3des.test    65.00   95.00  46.2%
     test-suite...006/450.soplex/450.soplex.test    63.00   89.00  41.3%
     test-suite...ProxyApps-C++/CLAMR/CLAMR.test   177.00  250.00  41.2%
     test-suite...nchmarks/McCat/18-imp/imp.test    13.00   18.00  38.5%
     test-suite.../Applications/sgefa/sgefa.test    26.00   35.00  34.6%
     test-suite...pplications/oggenc/oggenc.test   100.00  133.00  33.0%
     test-suite...6/482.sphinx3/482.sphinx3.test   103.00  134.00  30.1%
     test-suite...oxyApps-C++/miniFE/miniFE.test   169.00  213.00  26.0%
     test-suite.../Benchmarks/Olden/tsp/tsp.test    59.00   73.00  23.7%
     test-suite...TimberWolfMC/timberwolfmc.test   503.00  622.00  23.7%
     test-suite...T2006/456.hmmer/456.hmmer.test    65.00   79.00  21.5%
     test-suite...libquantum/462.libquantum.test    58.00   68.00  17.2%
     test-suite...ternal/HMMER/hmmcalibrate.test    84.00   98.00  16.7%
     test-suite...ications/JM/ldecod/ldecod.test   351.00  401.00  14.2%
     test-suite...arks/VersaBench/dbms/dbms.test    52.00   57.00   9.6%
     test-suite...ce/Benchmarks/Olden/bh/bh.test   118.00  128.00   8.5%
     test-suite.../Benchmarks/Bullet/bullet.test   6355.00 6880.00  8.3%
     test-suite...nsumer-lame/consumer-lame.test   480.00  519.00   8.1%
     test-suite...000/183.equake/183.equake.test   226.00  244.00   8.0%
     test-suite...chmarks/Olden/power/power.test   105.00  113.00   7.6%
     test-suite...6/471.omnetpp/471.omnetpp.test    92.00   99.00   7.6%
     test-suite...ications/JM/lencod/lencod.test   1173.00 1261.00  7.5%
     test-suite...0/253.perlbmk/253.perlbmk.test    55.00   59.00   7.3%
     test-suite...oxyApps-C/miniAMR/miniAMR.test    92.00   98.00   6.5%
     test-suite...chmarks/MallocBench/gs/gs.test   446.00  473.00   6.1%
     test-suite.../CINT2006/403.gcc/403.gcc.test   464.00  491.00   5.8%
     test-suite...6/464.h264ref/464.h264ref.test   998.00  1055.00  5.7%
     test-suite...006/453.povray/453.povray.test   5711.00 6007.00  5.2%
     test-suite...FreeBench/distray/distray.test   102.00  107.00   4.9%
     test-suite...:: External/Povray/povray.test   4184.00 4378.00  4.6%
     test-suite...DOE-ProxyApps-C/CoMD/CoMD.test   112.00  117.00   4.5%
     test-suite...T2006/445.gobmk/445.gobmk.test   104.00  108.00   3.8%
     test-suite...CI_Purple/SMG2000/smg2000.test   789.00  819.00   3.8%
     test-suite...yApps-C++/PENNANT/PENNANT.test   233.00  241.00   3.4%
     test-suite...marks/7zip/7zip-benchmark.test   417.00  428.00   2.6%
     test-suite...arks/mafft/pairlocalalign.test   627.00  643.00   2.6%
     test-suite.../Benchmarks/nbench/nbench.test   259.00  265.00   2.3%
     test-suite...006/447.dealII/447.dealII.test   4641.00 4732.00  2.0%
     test-suite...lications/ClamAV/clamscan.test   106.00  108.00   1.9%
     test-suite...CFP2000/177.mesa/177.mesa.test   1639.00 1664.00  1.5%
     test-suite...oxyApps-C/RSBench/rsbench.test    66.00   65.00  -1.5%
     test-suite.../CINT2000/252.eon/252.eon.test   3416.00 3444.00  0.8%
     test-suite...CFP2000/188.ammp/188.ammp.test   1846.00 1861.00  0.8%
     test-suite.../CINT2000/176.gcc/176.gcc.test   152.00  153.00   0.7%
     test-suite...CFP2006/444.namd/444.namd.test   3528.00 3544.00  0.5%
     test-suite...T2006/473.astar/473.astar.test    98.00   98.00   0.0%
     test-suite...frame_layout/frame_layout.test    NaN     39.00   nan%

On ARM64, there appears to be a slight regression on SPEC2006, which
might be interesting to investigate:

   test-suite...T2006/473.astar/473.astar.test   0.9%

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D88735
2020-11-06 12:50:32 +00:00
Sander de Smalen 4a3bb9ea6c [VPlan] NFC: Change VFRange to take ElementCount
This patch changes the type of Start, End in VFRange to be an ElementCount
instead of `unsigned`. This is done as preparation to make VPlans for
scalable vectors, but is otherwise NFC.

Reviewed By: dmgreen, fhahn, vkmr

Differential Revision: https://reviews.llvm.org/D90715
2020-11-06 09:50:20 +00:00
Roman Lebedev 8d0fdd36a3
[IR] CmpInst: Add getFlippedSignednessPredicate()
And refactor a few places to use it
2020-11-06 11:31:09 +03:00
Giorgis Georgakoudis 700d2417d8 [CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
CodeExtractor handles bitcasts in the extracted region that have
lifetime markers users in the outer region as outputs. That
creates unnecessary alloca/reload instructions and extra lifetime
markers. The patch identifies those cases, and replaces uses in
out-of-region lifetime markers with new bitcasts in the outer region.

**Example**
```
define void @foo() {
entry:
  %0 = alloca i32
  br label %extract

extract:
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1)
  call void @use(i32* %0)
  br label %exit

exit:
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1)
  ret void
}
```

**Current extraction**
```
define void @foo() {
entry:
  %.loc = alloca i8*, align 8
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast = bitcast i8** %.loc to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast)
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0, i8** %.loc)
  %.reload = load i8*, i8** %.loc, align 8
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload)
  ret void
}

define internal void @foo.extract(i32* %0, i8** %.out) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  store i8* %1, i8** %.out, align 8
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

**Extraction with patch**
```
define void @foo() {
entry:
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  %lt.cast = bitcast i32* %0 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast)
  ret void
}

define internal void @foo.extract(i32* %0) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D90689
2020-11-05 17:01:08 -08:00
Sjoerd Meijer 7eb70158e4 [IndVarSimplify][SimplifyIndVar] Move WidenIV to Utils/SimplifyIndVar. NFCI.
This moves WidenIV from IndVarSimplify to Utils/SimplifyIndVar so that we have
createWideIV available as a generic helper utility. I.e., this is not only
useful in IndVarSimplify, but could be useful for loop transformations. For
example, motivation for this refactoring is the loop flatten transformation: if
induction variables in a loop nest can be widened, we can avoid having to
perform certain overflow checks, enabling this transformation.

Differential Revision: https://reviews.llvm.org/D90421
2020-11-05 16:52:47 +00:00
Florian Hahn be0578f0b4 [GVN] Fix MemorySSA update when replacing assume(false) with stores.
When replacing an assume(false) with a store, we have to be more careful
with the order we insert the new access. This patch updates the code to
look at the accesses in the block to find a suitable insertion point.

Alterantively we could check the defining access of the assume, but IIRC
there has been some discussion about making assume() readnone, so
looking at the access list might be more future proof.

Fixes PR48072.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90784
2020-11-05 12:09:32 +00:00
Simon Pilgrim 4b2be681f4 [InstCombine] Remove orphan InstCombinerImpl method declarations. NFCI. 2020-11-05 10:13:16 +00:00
Arnold Schwaighofer ea5989b43a Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Reapply with fix for memory sanitizer failure and sphinx failure.

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 10:29:21 -08:00
Arnold Schwaighofer 42f1916640 Revert "Start of an llvm.coro.async implementation"
This reverts commit ea606cced0.

This patch causes memory sanitizer failures sanitizer-x86_64-linux-fast.
2020-11-04 08:26:20 -08:00
Arnold Schwaighofer ea606cced0 Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 07:32:29 -08:00
Roman Lebedev 93f3d7f7b3
[Reassociate] Guard `add`-like `or` conversion into an `add` with profitability check
This is slightly better compile-time wise,
since we avoid potentially-costly knownbits analysis that will
ultimately not allow us to actually do anything with said `add`.
2020-11-04 16:10:34 +03:00
Martin Storsjö 36cf1e7d0e Revert "[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts"
This reverts commit 59b22e495c.

That commit broke building for ARM and AArch64, reproducible like this:

$ cat apedec-reduced.c
a;
b(e) {
  int c;
  unsigned d = f();
  c = d >> 32 - e;
  return c;
}
g() {
  int h = i();
  if (a)
    h = h << a | b(a);
  return h;
}
$ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c
clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed.

Same thing for e.g. an armv7-linux-gnueabihf target.
2020-11-04 08:39:32 +02:00
Xun Li 7f34aca083 [musttail] Unify musttail call preceding return checking
There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction.
Use it instead of manually checking in each place.

Differential Revision: https://reviews.llvm.org/D90693
2020-11-03 11:39:27 -08:00
Roman Lebedev 70472f34b2
[Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation
InstCombine is quite aggressive in doing the opposite transform,
folding `add` of operands with no common bits set into an `or`,
and that not many things support that new pattern..

In this case, teaching Reassociate about it is easy,
there's preexisting art for `sub`/`shl`:
just convert such an `or` into an `add`:
https://rise4fun.com/Alive/Xlyv
2020-11-03 22:30:51 +03:00
Sanne Wouda 2ec26d3a23 Revert "Add loop distribution to the LTO pipeline"
This reverts commit 6e80318eec.
2020-11-03 19:29:27 +00:00
Sanne Wouda 6e80318eec Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-03 18:54:24 +00:00
Jameson Nash 59a6ab28c4 [GVN] small improvements to comments 2020-11-03 13:21:48 -05:00
Roman Lebedev c009d11bda
[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator
In particular, it makes it fire for C=0, because negator doesn't want
to perform that fold since in general it's not beneficial.
2020-11-03 16:06:52 +03:00
Roman Lebedev e465f9c303
[InstCombine] Negator: - (C - %x) --> %x - C (PR47997)
This relaxes one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
2020-11-03 16:06:51 +03:00
Florian Hahn d68bed0fa9 [SCCP] Handle bitcast of vector constants.
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mis-match between the width of the lattice value (single constant range) and
the original operands (vector). Go to overdefined in that case.

Fixes PR47991.
2020-11-03 12:58:39 +00:00
Simon Pilgrim 59b22e495c [AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.

This should allow us to begin resolving PR46896 et al.

Differential Revision: https://reviews.llvm.org/D90625
2020-11-03 10:49:49 +00:00
Florian Hahn d9cbf39a37 [SLP] Pass VecPred argument to getCmpSelInstrCost.
Check if all compares in VL have the same predicate and pass it to
getCmpSelInstrCost, to improve cost-modeling on targets that only
support compare/select combinations for certain uniform predicates.

This leads to additional vectorization in some cases

```
Same hash: 217 (filtered out)
Remaining: 19
Metric: SLP.NumVectorInstructions

Program                                        base    slp2    diff
 test-suite...marks/SciMark2-C/scimark2.test    11.00   26.00  136.4%
 test-suite...T2006/445.gobmk/445.gobmk.test    79.00  135.00  70.9%
 test-suite...ediabench/gsm/toast/toast.test    54.00   71.00  31.5%
 test-suite...telecomm-gsm/telecomm-gsm.test    54.00   71.00  31.5%
 test-suite...CI_Purple/SMG2000/smg2000.test   426.00  542.00  27.2%
 test-suite...ch/g721/g721encode/encode.test    30.00   24.00  -20.0%
 test-suite...000/186.crafty/186.crafty.test   116.00  138.00  19.0%
 test-suite...ications/JM/ldecod/ldecod.test   697.00  765.00   9.8%
 test-suite...6/464.h264ref/464.h264ref.test   822.00  886.00   7.8%
 test-suite...chmarks/MallocBench/gs/gs.test   154.00  162.00   5.2%
 test-suite...nsumer-lame/consumer-lame.test   621.00  651.00   4.8%
 test-suite...lications/ClamAV/clamscan.test   223.00  231.00   3.6%
 test-suite...marks/7zip/7zip-benchmark.test   680.00  695.00   2.2%
 test-suite...CFP2000/177.mesa/177.mesa.test   2121.00 2129.00  0.4%
 test-suite...:: External/Povray/povray.test   2406.00 2412.00  0.2%
 test-suite...TimberWolfMC/timberwolfmc.test   634.00  634.00   0.0%
 test-suite...CFP2006/433.milc/433.milc.test   1036.00 1036.00  0.0%
 test-suite.../Benchmarks/nbench/nbench.test   321.00  321.00   0.0%
 test-suite...ctions-flt/Reductions-flt.test    NaN      5.00   nan%
```

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D90124
2020-11-03 10:16:43 +00:00
Max Kazantsev 46b2e85f0f [NFC] Refactor code in IndVars, preparing for further improvement 2020-11-03 15:08:12 +07:00
Max Kazantsev a44b7322a2 [NFC] Split lambda into 2 parts for further reuse 2020-11-03 14:13:55 +07:00
Max Kazantsev f847094c24 [IndVars] Use knowledge about execution on last iteration when removing checks
If we know that some check will not be executed on the last iteration, we can use this
fact to eliminate its check.

Differential Revision: https://reviews.llvm.org/D88210
Reviwed By: ebrevnov
2020-11-03 13:38:58 +07:00
Alina Sbirlea f514b32a89 [LICM] Add assert of AST/MSSA exclusiveness.
The API `canSinkOrHoistInst` may be called by LoopSink. Add assert to
avoid having two analyses passed in.
2020-11-02 18:04:43 -08:00
Akira Hatanaka b0f1d7d562 Remove unused parameter 2020-11-02 17:40:06 -08:00
Ettore Tiotto 4274cbba1c [PartialInliner]: Handle code regions in a switch stmt cases
This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidate for partial inlining.

Reviewed By: fhann

Differential Revision: https://reviews.llvm.org/D89911
2020-11-02 14:32:45 -05:00
Simon Pilgrim 55f15f99cb [AggressiveInstCombine] foldGuardedRotateToFunnelShift - generalize rotation to funnel shift matcher.
Replace matchRotate with a more general matchFunnelShift - at the moment this is still just used for rotation patterns.
2020-11-02 17:09:17 +00:00
Fangrui Song 98b9338588 [Debugify] Port -debugify-each to NewPM
Preemptively switch 2 tests to the new PM

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90365
2020-11-02 08:16:43 -08:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Teresa Johnson 0949f96dc6 [MemProf] Pass down memory profile name with optional path from clang
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.

Additionally, always pass a default filename (in $cwd if a directory
name is not specified vi the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.

Change the existing memprof tests to specify log_path=stderr when that
was being relied on.

Depends on D89086.

Differential Revision: https://reviews.llvm.org/D89087
2020-11-01 17:38:23 -08:00
Florian Hahn ca38652b9a [VPlan] Assert no users remaining when deleting a VPValue.
When deleting a VPValue, all users must already by deleted. Add an
assertion to make sure and catch violations.
2020-11-01 17:44:53 +00:00
Florian Hahn aab71d4443 [DSE] Use same logic as legacy impl to check if free kills a location.
This patch updates DSE + MemorySSA to use the same check as the legacy
implementation to determine if a location is killed by a free call.

This changes the existing behavior so that a free does not kill
locations before the start of the freed pointer.

This should fix PR48036.
2020-10-31 20:09:25 +00:00
Florian Hahn 799033d8c5 Reland "[SLP] Consider alternatives for cost of select instructions."
This reverts the revert commit a1b53db324.

This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking to some connected casts.

For now, ensure integer instrinsics are only returned for selects of
ints or int vectors.
2020-10-31 16:52:36 +00:00
Simon Pilgrim 538fdb0189 [InstCombine] foldSelectRotate - generalize to foldSelectFunnelShift
This is the last of the rotate->funnel shift InstCombine generalizations for PR46896

We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine

Differential Revision: https://reviews.llvm.org/D90382
2020-10-31 12:32:34 +00:00
Simon Pilgrim 4da6a48399 [CSE] Make some basic EarlyCSE::StackNode helper methods const. NFCI.
Fixes a number of cppcheck remarks.
2020-10-31 12:16:48 +00:00
Nikita Popov 27f647d117 [Inliner] Consistently apply callsite noalias metadata
Previously, !noalias and !alias.scope metadata on the call site was
applied as part of CloneAliasScopeMetadata(), which short-circuits
if the callee does not use any noalias metadata itself. However,
these two things have no relation to each other.

Consistently apply !noalias and !alias.scope metadata by integrating
this into an existing function that handled !llvm.access.group and
!llvm.mem.parallel_loop_access metadata. The handling for all of
these metadata kinds essentially the same.
2020-10-31 10:54:45 +01:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn a1b53db324 Revert "[SLP] Consider alternatives for cost of select instructions."
This reverts commit 1922570489.

This appears to cause a crash in the following example

 a, b, c;
 l() {
   int e = a, f = l, g, h, i, j;
   float *d = c, *k = b;
   for (;;)
     for (; g < f; g++) {
       k[h] = d[i];
       k[h - 1] = d[j];
       h += e << 1;
       i += e;
     }
 }

 clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c

 llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
2020-10-30 21:26:14 +00:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain
introduce an enum with the bit field positions for the access info.

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size    boot time
before    92824064      6.18s
after     38822400      6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Peter Collingbourne 0930763b4b hwasan: Move fixed shadow behind opaque no-op cast as well.
This is a workaround for poor heuristics in the backend where we can
end up materializing the constant multiple times. This is particularly
bad when using outlined checks because we materialize it for every call
(because the backend considers it trivial to materialize).

As a result the field containing the shadow base value will always
be set so simplify the code taking that into account.

Differential Revision: https://reviews.llvm.org/D90425
2020-10-30 13:23:52 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Pedro Tammela 86e0c1acdb [NFC][Reg2Mem] modernize loops iterators
This patch updates the Reg2Mem loops to use more modern iterators.

Differential Revision: https://reviews.llvm.org/D90122
2020-10-30 16:50:07 +00:00
Pedro Tammela 70a495c7f0 [NFC][LoopSimplify] modernize for loops over LoopInfo
This patch modifies two for loops to use the range based syntax.
Since they are equivalent, this patch is tagged NFC.

Differential Revision: https://reviews.llvm.org/D90069
2020-10-30 16:50:07 +00:00
Michael Liao c82403d025 [gvn] PRE needs to skip convergent intrinsics/calls.
- As convergent intrinsics/calls could only be moved to
  control-equivalent blocks, or more precisely the same divergent
  branch, PRE needs to skip them.

Differential Revision: https://reviews.llvm.org/D90391
2020-10-30 11:24:40 -04:00
Evgeniy Brevnov 3d31adaec4 [DSE] Improve partial overlap detection
Currently isOverwrite returns OW_MaybePartial even for accesss known not to overlap. This is not a big problem for legacy implementation (since isPartialOverwrite follows isOverwrite and clarifies the result). Contrary SSA based version does a lot of work to later find out that accesses don't overlap. Besides negative impact on compile time we quickly reach MemorySSAPartialStoreLimit and miss optimization opportunities.

Note: In fact, I think it would be cleaner implementation if isOverwrite returned fully clarified result in the first place whithout need to call isPartialOverwrite. This can be done as a follow up. What do you think?

Reviewed By: fhahn, asbirlea

Differential Revision: https://reviews.llvm.org/D90371
2020-10-30 22:23:20 +07:00
Simon Pilgrim ed577892cf Use cast<> instead of dyn_cast<> as we dereference the pointers immediately. NFCI.
Fix clang static analyzer warnings - we're better off relying on cast<> asserting on failure rather than a null dereference crash.
2020-10-30 15:20:40 +00:00
Florian Hahn aa1a198a64 [VPlan] Use isa<> instead getVPRecipeID in getFirstNonPhi (NFC).
As per the comment in VPRecipeBase, clients should not rely on
getVPRecipeID, as it may change in the future. It should only be used in
classof implementations. Use isa instead in getFirstNonPhi.
2020-10-30 14:56:06 +00:00
Simon Pilgrim b7c91a9b8e [SCEV] SCEVExpander::InsertNoopCastOfTo - reduce scope of pointer type. NFCI.
By reducing the scope of the dyn_cast<PointerType> we can make this a cast<PointerType> and avoid clang static analyzer null deference warnings.
2020-10-30 14:55:09 +00:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Simon Pilgrim 9e154f1aca [SROA] Pass Twine by const reference. NFCI.
Fixes clang-tidy warnings.
2020-10-30 11:36:58 +00:00
Max Kazantsev bd341bafbf [NFC] Simplify code in IndVars 2020-10-30 17:49:32 +07:00
Florian Hahn 05e4f7bde9 [DSE] Remove noop stores after killing stores for a MemoryDef.
Currently we fail to eliminate some noop stores if there is a kill-able
store between the starting def and the load. This is because we
eliminate noop stores first.

In practice it seems like eliminating noop stores after the main
elimination for a def covers slightly more cases.

This patch improves the number of stores slightly in 2 cases for X86 -O3
-flto

Same hash: 235 (filtered out)
Remaining: 2
Metric: dse.NumRedundantStores

Program                                          base      patch diff
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test     2.00   3.00 50.0%
 test-suite...006/453.povray/453.povray.test    18.00  21.00 16.7%

There might be other phase ordering issues, but it appears that they do
not show up in the test-suite/SPEC2000/SPEC2006. We can always tune the
ordering later.

Partly fixes PR47887.

Reviewed By: asbirlea, zoecarver

Differential Revision: https://reviews.llvm.org/D89650
2020-10-30 09:40:15 +00:00
Roman Lebedev 81fc53a36a
[SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with initial problem of PR46786 - this should eventually allow us
  to not loose pointer-ness of an expression in more cases

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But i think that it isn't the best solution, because it doesn't really matter
from memory consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
Vitaly Buka 36fa658db5 [NFC] Fix "ambiguous overload for ‘operator=’"
From D89768
2020-10-30 00:43:32 -07:00
Vitaly Buka 1455259546 [NFC] Fix "ambiguous overload for ‘operator=’" 2020-10-30 00:36:50 -07:00
Xun Li 9f5a2beadc [Coroutine] Properly determine whether an alloca should live on the frame
The existing logic in determining whether an alloca should live on the frame only looks explicit def-use relationships. However a value defined by an alloca may be implicitly needed across suspension points, either because an alias has across-suspension-point def-use relationship, or escaped by store/call/memory intrinsics. To properly handle all these cases, we have to properly visit the alloca pointer up-front. Thie patch extends the exisiting alloca use visitor to determine whether an alloca should live on the frame.

Differential Revision: https://reviews.llvm.org/D89768
2020-10-29 23:56:05 -07:00
Stefanos Baziotis a3345300b6 [LCSSA] Doc for special treatment of PHIs
Differential Revision: https://reviews.llvm.org/D89739
2020-10-29 22:50:07 +02:00
Nikita Popov 20b386aae0 [LoopUtils] Fix neutral value for vector.reduce.fadd
Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
2020-10-29 21:45:13 +01:00
Florian Hahn 1922570489 [SLP] Consider alternatives for cost of select instructions.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.

One example is using min/max instructions for certain patterns.

This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.

This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.

This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).

Metric: SLP.NumVectorInstructions

Program                                        base    slp     diff
 test-suite...ications/JM/ldecod/ldecod.test   502.00  697.00  38.8%
 test-suite...ications/JM/lencod/lencod.test   1023.00 1414.00 38.2%
 test-suite...-typeset/consumer-typeset.test    56.00   65.00  16.1%
 test-suite...6/464.h264ref/464.h264ref.test   804.00  822.00   2.2%
 test-suite...006/453.povray/453.povray.test   3335.00 3357.00  0.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00 2121.00  0.5%
 test-suite...:: External/Povray/povray.test   2378.00 2382.00  0.2%

Reviewed By: RKSimon, samparker

Differential Revision: https://reviews.llvm.org/D89969
2020-10-29 20:39:50 +00:00
Dávid Bolvanský 7a2abf5aca [InferAttrs] Add nocapture/writeonly to string/mem libcalls
One step closer to fix PR47644.

Differential Revision: https://reviews.llvm.org/D89645
2020-10-29 20:06:43 +01:00
Simon Pilgrim dcb3dc101d [InstCombine] visitShl - ensure inner shifts have inrange amounts
Noticed when fixing OSS Fuzz #26716
2020-10-29 15:28:15 +00:00
Max Kazantsev a5b2e795c3 [NFC][SCEV] Refactor monotonic predicate checks to return enums instead of bools
This patch gets rid of output parameter which is not needed for most users
and prepares this API for further refactoring.
2020-10-29 16:01:25 +07:00
Johannes Doerfert d39f574dcc [Attributor][FIX] Properly promote arguments pointers to arrays
When we promote pointer arguments we did compute a wrong offset and use
a wrong type for the array case.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-29 00:45:32 -05:00
Fangrui Song 39856d5d0b [Debugify] Move global namespace functions into llvm::
Also move exportDebugifyStats from tools/opt to Debugify.cpp
2020-10-28 19:11:41 -07:00
Florian Hahn 53f4c4b2cc [InstCombine] Do not introduce bitcasts for swifterror arguments.
The following constraints hold for swifterror values:

    A swifterror value (either the parameter or the alloca) can only
    be loaded and stored from, or used as a swifterror argument.

This patch updates instcombine to not try to convert a bitcast of a
function into a bitcast of a swifterror argument.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D90258
2020-10-28 21:52:12 +00:00
Benjamin Kramer 207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b758 and
a87d7b3d44. Test fails under msan.
2020-10-28 13:58:14 +01:00
Max Kazantsev 160a453138 Return "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit e038b60d91.
This reverts commit a0d84d8031.

This revert was a mistake. The reason of the failures was
"Use uint64_t for branch weights instead of uint32_t"

Differential Revision: https://reviews.llvm.org/D87832
2020-10-28 18:51:40 +07:00
Florian Hahn b82f80057d [DSE] Use walker to skip noalias stores between current & clobber def.
Instead of getting the defining access we should be able to use
getClobberingMemoryAccess to skip non-aliasing MemoryDefs. No additional
checks should be needed, because we only remove the starting def if it
matches the defining access of the load. All we need to worry about is
that there are no (may)alias stores between the starting def and the
load and getClobberingMemoryAccess should guarantee that.

Partly fixes PR47887.

This improves the number of redundant stores removed in some cases
(numbers below for MultiSource, SPEC2000, SPEC2006 on X86 with -flto
-O3).

Same hash: 226 (filtered out)
Remaining: 11
Metric: dse.NumRedundantStores

Program                                        base   patch1 diff
 test-suite...:: External/Povray/povray.test     1.00   5.00 400.0%
 test-suite...chmarks/MallocBench/gs/gs.test     1.00   3.00 200.0%
 test-suite...0/253.perlbmk/253.perlbmk.test    21.00  37.00 76.2%
 test-suite...0.perlbench/400.perlbench.test    24.00  37.00 54.2%
 test-suite.../Applications/SPASS/SPASS.test     3.00   4.00 33.3%
 test-suite...006/453.povray/453.povray.test    15.00  18.00 20.0%
 test-suite...T2006/445.gobmk/445.gobmk.test    27.00  29.00  7.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   136.00 137.00  0.7%
 test-suite.../CINT2000/176.gcc/176.gcc.test     6.00   6.00  0.0%
 test-suite.../Benchmarks/Bullet/bullet.test    NaN     3.00  nan%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    NaN     1.00  nan%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89647
2020-10-28 11:01:25 +00:00
Luqman Aden 4c0a016927 Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC.
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.

Reviewed By: compnerd, rnk

Differential Revision: https://reviews.llvm.org/D90117
2020-10-27 23:22:13 -07:00
Kazu Hirata b2f05fae80 [JumpThreading] Remove extraneous calls to setEdgeProbability
This patch removes extraneous calls to setEdgeProbability introduced
in c91487769d.

The follow-up patch, a7b662d0f4, has
since fixed BranchProbabilityInfo::eraseBlock, so we don't need to
worry about getting stale values from getEdgeProbability.

Also, since getEdgeProbability(BB, BB->getSingleSuccessor()) returns
edge probability 1/1 by default for BB with exactly one successor
edge, we don't need to explicitly call setEdgeProbability.

This patch introduces almost no functional change, but we do end up
reducing debug messages from setEdgeProbability.

Differential Revision: https://reviews.llvm.org/D90284
2020-10-27 21:12:54 -07:00
Johannes Doerfert d13daa4018 [Attributor] Finalize the CGUpdater after each SCC
This matches the new PM model.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 50d34958df [Attributor][NFC] Introduce a debug counter for `AA::manifest`
This will simplify debugging and tracking down problems.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 1d57b7f503 [Attributor][NFC] Print the right value in debug output 2020-10-27 22:07:55 -05:00
Johannes Doerfert 1c2531c9e1 [Attributor][FIX] Delete all unreachable static functions
Before we used to only mark unreachable static functions as dead if all
uses were known dead. Now we optimistically assume uses to be dead until
proven otherwise.
2020-10-27 22:07:55 -05:00
Johannes Doerfert bfe05b1aff [Attributor][FIX] Do not attach range metadata to the wrong Instruction
If we are looking at a call site argument it might be a load or call
which is in a different context than the call site argument. We cannot
simply use the call site argument range for the call or load.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-27 22:07:55 -05:00
Johannes Doerfert 724fcce109 [Attributor][NFC] Clang-format 2020-10-27 22:07:55 -05:00
Johannes Doerfert d504f7b91a [Attributor][NFC] Hoist call out of a lambda
The call is not free, unsure if  this is needed but it does not make it
worse either.
2020-10-27 22:07:54 -05:00
Johannes Doerfert 30e5a1f0be [Attributor][FIX] Properly check uses in the call not uses of the call
In the AANoAlias logic we determine if a pointer may have been captured
before a call. We need to look at other uses in the call not uses of the
call.

The new code is not perfect as it does not allow trivial cases where the
call has multiple arguments but it is at least not unsound and a TODO
was added.
2020-10-27 22:07:54 -05:00
Johannes Doerfert cb813ab66a [Attributor][NFC] Improve time trace output 2020-10-27 22:07:54 -05:00
Kazu Hirata c91487769d [JumpThreading] Set edge probabilities when creating basic blocks
This patch teaches the jump threading pass to set edge probabilities
whenever the pass creates new basic blocks.

Without this patch, the compiler sometimes produces non-deterministic
results.  The non-determinism comes from the jump threading pass using
stale edge probabilities in BranchProbabilityInfo.  Specifically, when
the jump threading pass creates a new basic block, we don't initialize
its outgoing edge probability.

Edge probabilities are maintained in:

  DenseMap<Edge, BranchProbability> Probs;

in class BranchProbabilityInfo, where Edge is an ordered pair of
BasicBlock * and a successor index declared as:

  using Edge = std::pair<const BasicBlock *, unsigned>;

Probs maps edges to their corresponding probabilities.

Now, we rarely remove entries from this map, so if we happen to
allocate a new basic block at the same address as a previously deleted
basic block with an edge probability assigned, the newly created basic
block appears to have an edge probability, albeit a stale one.

This patch fixes the problem by explicitly setting edge probabilities
whenever the jump threading pass creates new basic blocks.

Differential Revision: https://reviews.llvm.org/D90106
2020-10-27 16:07:27 -07:00
Joseph Huber a87d7b3d44 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original delcaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information in passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoervert

Differential Revision: https://reviews.llvm.org/D89802
2020-10-27 16:09:19 -04:00
Nicolai Hähnle e025d09b21 Revert multiple patches based on "Introduce CfgTraits abstraction"
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.

The patches are:

Revert "CfgInterface: rename interface() to getInterface()"

This reverts commit a74fc48158.

Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"

This reverts commit f2a06875b6.

Revert "Try to make GCC5 happy about the CfgTraits thing"

This reverts commit 03a5f7ce12.

Revert "Introduce CfgTraits abstraction"

This reverts commit c0cdd22c72.
2020-10-27 20:33:30 +01:00
Nicolai Hähnle ce6900c6cb Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes"
This reverts commit 848a68a032.
2020-10-27 20:33:29 +01:00
Vedant Kumar 5a3ef55a52 [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746)
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.

The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).

It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).

So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.

Differential Revision: https://reviews.llvm.org/D88928
2020-10-27 10:12:59 -07:00
Raphael Isemann e038b60d91 Revert "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit c6ca26c0bf.
This breaks stage2 builds due to hitting this assert:
```
   Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
2020-10-27 15:31:37 +01:00
Raphael Isemann a0d84d8031 Revert "[NFC] Factor away lambda's redundant parameter"
This reverts commit fdc845b361.
It seems to be a follow-up to c6372b3fb495 which will be reverted.
2020-10-27 15:30:52 +01:00
Simon Pilgrim bce770ffa6 Revert rG0905bd5c2fa42bd4c "[InstCombine] collectBitParts - add trunc support."
This reverts commit 0905bd5c2f.

Causing failures in multistage buildbots that I need to investigate
2020-10-27 13:43:54 +00:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Simon Pilgrim 0905bd5c2f [InstCombine] collectBitParts - add trunc support.
This should allow us to remove the rather limited matchOrConcat fold and just use recognizeBSwapOrBitReverseIdiom.
2020-10-27 13:14:54 +00:00
Roman Lebedev 0ac56e8eaa
[InstCombine] Fold `(X >>? C1) << C2` patterns to shift+bitmask (PR37872)
This is essentially finalizes a revert of rL155136,
because nowadays the situation has improved, SCEV can model
all these patterns well, and we canonicalize rotate-like patterns
into a funnel shift intrinsics in InstCombine.
So this should not cause any pessimization.

I've verified the canonicalize-{a,l}shr-shl-to-masking.ll transforms
with alive, which confirms that we can freely preserve exact-ness,
and no-wrap flags.

Profs:
* base: https://rise4fun.com/Alive/gPQ
* exact-ness preservation: https://rise4fun.com/Alive/izi
* nuw preservation: https://rise4fun.com/Alive/DmD
* nsw preservation: https://rise4fun.com/Alive/SLN6N
* nuw nsw preservation: https://rise4fun.com/Alive/Qp7

Refs. https://reviews.llvm.org/D46760
2020-10-27 14:42:53 +03:00
Florian Hahn f067bc3c0a [LoopRotation] Allow loop header duplication if vectorization is forced.
-Oz normally does not allow loop header duplication so this loop wouldn't be
vectorized.  However the vectorization pragma should override this and allow
for loop rotation.

rdar://problem/49281061

Original patch by Adam Nemet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D59832
2020-10-27 09:28:01 +00:00
Max Kazantsev fdc845b361 [NFC] Factor away lambda's redundant parameter 2020-10-27 12:56:52 +07:00
Serguei Katkov b69919b537 [GVN LoadPRE] Add an option to disable splitting backedge
GVN Load PRE can split the backedge causing breaking the loop structure where the latch
contains the conditional branch with for example induction variable.

Different optimizations expect this form of the loop, so it is better to preserve it for some time.
This CL adds an option to control an ability to split backedge.

Default value is true so technically it is NFC and current behavior is not changed.

Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn
Reviewed By: mkazasntsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89854
2020-10-27 11:59:52 +07:00
Max Kazantsev c6ca26c0bf [IndVars] Remove monotonic checks with unknown exit count
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.

Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
2020-10-27 11:35:16 +07:00
Arthur Eubanks 42f76e193b Reland [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:40:46 -07:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Arthur Eubanks 4af5ba1726 Revert "[AlwaysInliner] Pass callee AAResults to InlineFunction()"
This reverts commit 504fbec7a6.

Test failure.
2020-10-26 20:23:38 -07:00
Arthur Eubanks 504fbec7a6 [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:10:09 -07:00
Arthur Eubanks 3dd1c72458 Port -objc-arc-expand to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90182
2020-10-26 20:05:10 -07:00
Arthur Eubanks 90c0b0d3d6 Port -objc-arc-apelim to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90181
2020-10-26 20:01:46 -07:00
Chen Zheng 00e573cadb [LSR] fix typo in comments and rename for a new added hook. 2020-10-26 22:29:22 -04:00
TaWeiTu 0efbfa38ae [NPM] Port -slsr to NPM
`-separate-const-offset-from-gep` has not yet be ported, so some tests are not updated.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90149
2020-10-27 09:21:40 +08:00
Sriraman Tallam ad1b9daa4b Prepend "__uniq" to symbol names hash with -funique-internal-linkage-names.
Prepend the module name hash with a fixed string ".__uniq." which helps tools
that consume sampled profiles and attribute it to functions to understand
that this symbol belongs to a unique internal linkage type symbol.

Symbols with suffixes can result from various optimizations in the compiler.
Function Multiversioning, function splitting, parameter constant propogation,
unique internal linkage names.

External tools like sampled profile aggregators combine profiles from multiple
runs of a binary. They use various heuristics with symbols that have suffixes
to try and attribute the profile to the right function instance. For instance
multi-versioned symbols like foo.avx, foo.sse4.2, etc even though different
should be attributed to the same source function if a single function is
versioned, using attribute target_clones (supported in GCC but yet to land in
LLVM). Similarly, functions that are split (split part having a .cold suffix)
could have profiles for both the original and split symbols but would be
aggregated and attributed to the original function that was split.

Unique internal linkage functions however have different source instances and
the aggregator must not put them together but attribute it to the appropriate
function instance. To be sure that we are dealing with a symbol of a unique
internal linkage function, we would like to prepend the hash with a known
string ".__uniq." which these tools can check to understand the suffix type.

Differential Revision: https://reviews.llvm.org/D89617
2020-10-26 14:24:28 -07:00
Sanjay Patel 5a6e66ec72 [InstCombine] add folds for icmp+ctpop
https://alive2.llvm.org/ce/z/XjFPQJ

  define void @src(i64 %value) {
    %t0 = call i64 @llvm.ctpop.i64(i64 %value)
    %gt = icmp ugt i64 %t0, 63
    %lt = icmp ult i64 %t0, 64
    call void @use(i1 %gt, i1 %lt)
    ret void
  }

  define void @tgt(i64 %value) {
    %eq = icmp eq i64 %value, -1
    %ne = icmp ne i64 %value, -1
    call void @use(i1 %eq, i1 %ne)
    ret void
  }

  declare i64 @llvm.ctpop.i64(i64) #1
  declare void @use(i1, i1)
2020-10-26 16:48:56 -04:00
Sanjay Patel 437d7551c5 [InstCombine] reduce code duplication in icmp intrinsic folds; NFC 2020-10-26 16:48:56 -04:00
Stanislav Mekhanoshin 00928a1956 Fix SROA with a PHI mergig values from a same block
This fixes the bug 47945. It is legal to have a PHI with values
from from the same block, but values must stay the same. In this
case it is illegal to merge different values.

Differential Revision: https://reviews.llvm.org/D89978
2020-10-26 12:58:27 -07:00
Joe Ellis 0f83505593 [SVE][InstCombine] Fix TypeSize warning in canReplaceGEPIdxWithZero
The warning would fire when calling canReplaceGEPIdxWithZero on a GEP
whose source element type is a scalable vector. The size of scalable
vector types is not known, so this optimization cannot be performed.

This patch fixes the issue by:

- bailing out early in this routine if the GEP instruction's source
  element type is a scalable vector.

- making use of getFixedSize -- this removes the dependency on the
  deprecated interface.

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D89968
2020-10-26 17:40:26 +00:00
Joe Ellis 467e5cf40f [SVE][AArch64] Fix TypeSize warning in loop vectorization legality
The warning would fire when calling isDereferenceableAndAlignedInLoop
with a scalable load. Calling isDereferenceableAndAlignedInLoop with a
scalable load would result in the use of the now deprecated implicit
cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes this issue by:

- no longer considering vector loads as candidates in
  canVectorizeWithIfConvert. This doesn't make sense in the context of
  identifying scalar loads to vectorize.

- making use of getFixedSize inside isDereferenceableAndAlignedInLoop --
  this removes the dependency on the deprecated interface, and will
  trigger an assertion error if the function is ever called with a
  scalable type.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89798
2020-10-26 17:40:04 +00:00
Simon Pilgrim 532f3bec3e [InstCombine] collectBitParts - add bitreverse intrinsic support. 2020-10-26 14:36:36 +00:00
Simon Pilgrim 6b2eb31e1e [InstCombine] Add support for zext(and(neg(amt),width-1)) rotate shift amount patterns
Alive2: https://alive2.llvm.org/ce/z/bCvvHd
2020-10-26 11:22:41 +00:00
Max Kazantsev bfabd7878b Fix broken build after previous commit 2020-10-26 14:55:46 +07:00
Max Kazantsev cdccc82f48 [NFC] Remove unused funciton param 2020-10-26 14:53:22 +07:00
Max Kazantsev 4b5e848bef [NFC] Factor out common code into lambda for further improvement 2020-10-26 14:50:45 +07:00
Max Kazantsev c019099053 [IndVars] Use contextual knowledge when proving trivial conds
No exact example where it would help, but it's a generally a more
powerful way to prove predicates.
2020-10-26 13:48:32 +07:00
Simon Pilgrim 3052e474ec [InstCombine] matchBSwapOrBitReversem - recognise or(fshl(),fshl()) bswap patterns.
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
2020-10-25 10:17:45 +00:00
TaWeiTu 65a36bbc3d [NPM] Port -loop-versioning-licm to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89371
2020-10-24 21:51:18 +08:00
TaWeiTu 060a4fccf1 [LoopVersioning] Form dedicated exits for versioned loop to preserve simplify form
The exit blocks of the versioned and non-versioned loops are not dedicated and thus the two loops are not in simplify form.
Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve the simplify form for subsequence passes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89569
2020-10-24 21:40:46 +08:00
Simon Pilgrim 310f62b4ff [InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.

Unlike matchFunnelShift we don't include the computeKnownBits limitation as extracting the pattern from the zext/trunc layers should be a indicator of reasonable funnel shift codegen, in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.

Differential Revision: https://reviews.llvm.org/D89542
2020-10-24 12:42:43 +01:00
Hongtao Yu a16cbdd676 [AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of callee's callee to be shared. This breaks the assert introduced ealier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimize performed prior to the sample profile loader duplicates first callsite to `B` and the program may look like

```
A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}
```

For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into

```
A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}
```

Here is what happens next:

	1. Failing to inline the callsite `C()` results in `C1`'s samples returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite which shares `C1` with the first callsite.
	2. Failing to inline the middle callsite results in `B1` returned to `B`'s base profile, which in turn will cause `C1` merged into `B`'s base profile. Note that the nest `C` profile in `B`'s base has a non-zero head sample count now. The value actually equals to `C1`'s entry count.
	3. Failing to inline last callsite results in `B2` returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
        4. Compiling `B` using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However C2 is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work where counts returning is based on context-sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and duplicate one having different inline behavior is a magic. It has to do with imperfect counts in profile and extra complicated inlining that makes the hotness for them different.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D90056
2020-10-23 17:42:21 -07:00
Arthur Eubanks baffd052b0 [StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.

The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.

This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89026
2020-10-23 15:54:03 -07:00
Arthur Eubanks ba22c403b2 [Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592
2020-10-23 15:37:18 -07:00
Artur Pilipenko 6ec2c5e402 GC-parseable element atomic memcpy/memmove
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861
2020-10-23 14:06:09 -07:00