Commit Graph

665 Commits

Author SHA1 Message Date
serge-sans-paille 7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Matthias Braun 6a383369f9 PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs
The `SplitIndirectBrCriticalEdges` function was originally designed for
`CodeGenPrepare` and skipped splitting of edges when the destination
block didn't contain any `PHI` instructions. This only makes sense when
reducing COPYs like `CodeGenPrepare`. In the case of
`PGOInstrumentation` or `GCOVProfiling` it would result in missed
counters and wrong result in functions with computed goto.

Differential Revision: https://reviews.llvm.org/D120096
2022-02-23 16:27:37 -08:00
Bjorn Pettersson 0352ee1a22 [CodeGenPrepare] Avoid out-of-bounds shift
AddressingModeMatcher::matchOperationAddr may attempt to shift a
variable by the same amount of steps as found in the IR in a SHL
instruction. This was done without considering that there could be
undefined behavior in the IR, so the shift performed when compiling
could end up having undefined behavior as well.

This patch avoid UB in the codegenprepare by making sure that we
limit the shift amount used, in a similar way as already being done
in CodeGenPrepare::optimizeLoadExt.

Differential Revision: https://reviews.llvm.org/D118602
2022-02-03 21:03:58 +01:00
Kazu Hirata 2bea207d26 [CodeGen] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-01-30 12:32:51 -08:00
Simon Pilgrim 20d46fbd4a [CodeGenPrepare] Use dyn_cast result to check for null pointers
Simplifies logic and helps the static analyzer correctly check for nullptr dereferences
2022-01-23 12:47:52 +00:00
David Sherwood f4515ab858 Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants"
This reverts commit 197f3c0deb.

Reverting after miscompilation errors discovered with ffmpeg.
2022-01-18 08:40:20 +00:00
David Sherwood 197f3c0deb [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.

New tests added here:

  CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

  CodeGen/AArch64/reduce-and.ll
  CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Differential Revision: https://reviews.llvm.org/D114357
2022-01-17 11:08:57 +00:00
David Sherwood ba471ba8d2 Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants"
This reverts commit 31009f0b5a.

It seems to be causing SVE VLA buildbot failures and has introduced a
genuine regression. Reverting for now.
2022-01-13 15:59:43 +00:00
David Sherwood 31009f0b5a [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.

New tests added here:

  CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

  CodeGen/AArch64/dag-numsignbits.ll
  CodeGen/AArch64/reduce-and.ll
  CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Differential Revision: https://reviews.llvm.org/D114357
2022-01-13 09:43:07 +00:00
Kazu Hirata bb6447a78c [llvm] Use llvm::reverse (NFC) 2021-12-12 16:13:49 -08:00
Kazu Hirata 3aed282257 [CodeGen] Use range-based for loops (NFC) 2021-12-03 20:45:59 -08:00
Tim Northover f9089accba CodeGenPrep: remove all copies of GEP from list if there are duplicates.
Unfortunately ToT has changed enough from the revision where this actually
caused problems that the test no longer triggers an assertion failure.
2021-10-25 14:00:02 +01:00
Fraser Cormack eabf11f9ea [CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of
the vector type in bits in case the vector type is not a fixed-length type. As
it happens this isn't required as vector types are skipped anyway.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112141
2021-10-20 16:45:55 +01:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Kazu Hirata f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Craig Topper aeb63d464f [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110106
2021-09-21 10:07:29 -07:00
Kazu Hirata 84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
Kazu Hirata 48719e3b18 [CodeGen] Use make_early_inc_range (NFC) 2021-09-18 09:29:24 -07:00
Nikita Popov 4189e5fe12 [CGP] Support opaque pointers in address mode fold
Rather than inspecting the pointer element type, use the access
type of the load/store/atomicrmw/cmpxchg.

In the process of doing this, simplify the logic by storing the
address + type in MemoryUses, rather than an Instruction + Operand
pair (which was then used to fetch the address).
2021-09-12 17:43:37 +02:00
Andrew Wei c9066c5d37 [CGP] Fix the crash for combining address mode when having cyclic dependency
In the combination of addressing modes, when replacing the matched phi nodes,
sometimes the phi node to be replaced has been modified. For example,
there’s matcher set [A, B] and [C, A], which will have cyclic dependency:
A is replaced by B and C will be replaced by A. Because we tried to match new phi node
to another new phi node, we should ignore new phi nodes when mapping new phi node to old one.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D108635
2021-08-26 22:52:42 +08:00
Tiehu Zhang 9cfa9b44a5 [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree,
which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107262
2021-08-17 18:58:15 +08:00
Simon Pilgrim 478b22d95a [CGP] despeculateCountZeros - Don't create is-zero branch if cttz/ctlz source is known non-zero
If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from).

Differential Revision: https://reviews.llvm.org/D106685
2021-07-24 13:11:49 +01:00
Paulo Matos 46667a1003 [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104797
2021-07-22 22:07:24 +02:00
Stephen Tozer 14b62f7e2f [DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.

This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.

Differential Revision: https://reviews.llvm.org/D105129
2021-07-05 10:35:19 +01:00
Roman Lebedev c2c0d3ea89
Revert "[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 4facbf213c.

```
********************
FAIL: LLVM :: CodeGen/WebAssembly/funcref-call.ll (44466 of 44468)
******************** TEST 'LLVM :: CodeGen/WebAssembly/funcref-call.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /builddirs/llvm-project/build-Clang12/bin/llc < /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll --mtriple=wasm32-unknown-unknown -asm-verbose=false -mattr=+reference-types | /builddirs/llvm-project/build-Clang12/bin/FileCheck /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: /repositories/llvm-project/llvm/include/llvm/Support/LowLevelTypeImpl.h:44: static llvm::LLT llvm::LLT::scalar(unsigned int): Assertion `SizeInBits > 0 && "invalid scalar size"' failed.

```
2021-07-02 11:49:51 +03:00
Paulo Matos 4facbf213c [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Differential Revision: https://reviews.llvm.org/D104797
2021-07-02 09:46:28 +02:00
Craig Topper 91319534ba [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt.
This optimization pre-promotes the input and constants for a
switch instruction to a legal type so that all the generated compares
share the same extend. Since RISCV prefers sext for i32 to i64
extends, we should honor that to use sext.w instead of a pair
of shifts.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D104612
2021-06-23 15:38:11 -07:00
Arthur Eubanks c0c5a98b2c [NFC][OpaquePtr] Explicitly pass GEP source type in optimizeGatherScatterInst()
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103480
2021-06-11 11:49:59 -07:00
David Spickett 64de8763aa Revert "Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 31859f896c.

Causing SVE and RISCV-V test failures on bots.
2021-06-10 10:11:17 +00:00
Paulo Matos 31859f896c Implementation of global.get/set for reftypes in LLVM IR
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D95425
2021-06-10 10:07:45 +02:00
David Green dd5c52029d [CPG][ARM] Optimize towards branch on zero in codegenprepare
This adds a simple fold into codegenprepare that converts comparison of
branches towards comparison with zero if possible. For example:
  %c = icmp ult %x, 8
  br %c, bla, blb
  %tc = lshr %x, 3
becomes
  %tc = lshr %x, 3
  %c = icmp eq %tc, 0
  br %c, bla, blb

As a first order approximation, this can reduce the number of
instructions needed to perform the branch as the shift is (often) needed
anyway. At the moment this does not effect very much, as llvm tends to
prefer the opposite form. But it can protect against regressions from
commits like rG9423f78240a2.

Simple cases of Add and Sub are added along with Shift, equally as the
comparison to zero can often be folded with cpsr flags.

Differential Revision: https://reviews.llvm.org/D101778
2021-05-16 17:54:06 +01:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
OCHyams 0ebf9a8e34 [DebugInfo] Move the findDbg* functions into DebugInfo.cpp
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.

D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.

Reviewed By: dblaikie, rnk

Differential Revision: https://reviews.llvm.org/D100632
2021-04-19 10:30:25 +01:00
Hongtao Yu 2a2720a2de [CSSPGO] Move pseudo probes to the beginning of a block to unblock SelectionDAG combine.
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.

Test Plan:

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D100002
2021-04-07 22:45:35 -07:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Sanjay Patel 664d0c052c [TargetTransformInfo] move branch probability query from TargetLoweringInfo
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.

I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
2021-03-22 15:55:34 -04:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
Philip Reames 7d38a91a7f Restore fixed version of "[CodeGenPrepare] Fix isIVIncrement (PR49466)"
Change was reverted in commit 8d20f2c2c6 because it was causing an infinite loop.  9228f2f32 fixed the root issue in the code structure, this change just reapplies the original change w/adaptation to the new code structure.
2021-03-13 15:25:02 -08:00
Philip Reames 9228f2f322 [CGP] Consolidate logic for getIVIncrement and isIVIncrement
This fixes the bug demonstrated by the test case in the commit message of 8d20f2c2 (which was a revert of cf82700).  The root issue was that we have two transforms which are inverses of each other.  We use one for simple induction variables (where we can use the post-inc form), and the other for everything else.  The problem was that the two transforms could disagree about whether something was an induction variable.

The reverted commit made a change to one of the matcher routines which was used for one of the two transforms without updating the other matcher.  However, it's worth noting the existing code w/o the reverted change also has cases where the decision could differ between the two paths.

The fix is simply to consolidate the code such that two paths must agree by construction, and to add an assert to catch any potential future re-divergence.

Triggering the infinite loop requires side stepping the SunkAddrs cache.  The SunkAddrs cache has the effect of suppressing the iteration in the common case, but there are codepaths through CGP which restart iteration and clear this cache.

Unfortunately, I have not been able to construct a standalone IR test case for this.  The original test case is a c++ program which when compiled by clang demonstrates the infinite loop, but all of my attempts at extracting an IR test case runnable through opt/llc have failed to reproduce.  (Including capturing the IR at point of the transform itself!)  I have no idea what weird state clang is creating here.

I also tried creating a test case by hand, but gave up after about an hour of trying to find the right combination to dance through multiple transforms to create the end result needed to trip the bug.
2021-03-13 14:55:25 -08:00
Jordan Rupprecht 8d20f2c2c6 Revert "[CodeGenPrepare] Fix isIVIncrement (PR49466)"
This reverts commit cf82700af8 due to a compile timeout when building the following with `clang -O2`:

```
template <class, class = int> class a;
struct b {
  using d = int *;
};
struct e {
  using f = b::d;
};
class g {
public:
  e::f h;
  e::f i;
};
template <class, class> class a : g {
public:
  long j() const { return i - h; }
  long operator[](long) const noexcept;
};
template <class c, class k> long a<c, k>::operator[](long l) const noexcept {
  return h[l];
}
template <typename m, typename n> int fn1(m, n, const char *);
int o, p;
class D {
  void q(const a<long> &);
  long r;
};
void D::q(const a<long> &l) {
  int s;
  if (l[0])
    for (; l.j(); ++s) {
      if (l[s])
        while (fn1(o, 0, ""))
          ;
      r = l[s] / p;
    }
}
```
2021-03-12 13:59:14 -08:00
Nikita Popov 42eb658f65 [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC)
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.

There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
2021-03-12 21:01:16 +01:00
Philip Reames d6394d86ca [cgp] improve robustness of uadd/usub transforms
LSR prefers to schedule iv increments just before the latch.  The recent 80511565 broadened this to moving increments in the original IR.  This pointed out a robustness problem with the CGP transform.

When we have a use of an induction increment outside of the loop (we canonicalize away from this form, but it happens e.g. unanalyzeable loops) we'd avoid performing the uadd/usub transform.  Interestingly, all of these involve moving the increment closer to it's operands, so there's no concern about dominating all uses.  We can handle that case cheaply, resulting in a more robust transform.
2021-03-09 11:52:08 -08:00
Philip Reames e85d798b5b [cgp] group related code together [nfc] 2021-03-09 11:23:15 -08:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
Ta-Wei Tu cf82700af8 [CodeGenPrepare] Fix isIVIncrement (PR49466)
In the NFC commit 8d835f42a5, the check for `!L` is
moved to a separate function `getIVIncrement` which, instead of using `BO->getParent()`,
uses `PN->getParent()`. However, these two basic blocks are not necessarily the same.

https://bugs.llvm.org/show_bug.cgi?id=49466 demonstrates a case where `PN` is contained in
a loop while `BO` is not, causing the null-pointer dereference in `L->getLoopLatch()`.

This patch checks whether both `BO` and `PN` belong to the same loop before entering `getIVIncrement`.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D98144
2021-03-09 13:32:34 +08:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
Philip Reames 6af94d22f7 [cgp] Defer lazy domtree usage to last possible point
This is a compile time optimization for d9e93e8e5. Not sure this matters or not, but why not do it just in case.

This does involve querying TLI with a potentially invalid addressing mode for the using instruction, but since we don't actually pass the using instruction to the TLI callback, that should be fine.
2021-03-04 10:19:45 -08:00
Philip Reames e0cfd45171 [CGP] Lazily compute domtree only when needed during address matching
This is a compile time optimization for d9e93e8e5.  As pointed out in post dommit review on the original review (D96399), there was a moderately large compile time regression with this patch and the eager computation of domtree on matcher construction is the first obvious candidate for why.
2021-03-04 09:32:57 -08:00
Jann Horn 91c9dee3fb [CodeGenPrepare] Eliminate llvm.expect before removing empty blocks
CodeGenPrepare currently first removes empty blocks, then in a loop
performs other optimizations. One of those optimizations is the removal
of call instructions that invoke @llvm.assume, which can create new
empty blocks.

This means that when a branch only contains a call to __builtin_assume(),
the empty branch will survive into MIR, and will then only be
half-removed by MIR-level optimizations (e.g. removing the branch but
leaving the condition intact).

Fix it by eliminating @llvm.expect builtin calls before removing empty
blocks.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D97848
2021-03-04 14:48:26 +01:00
Max Kazantsev 9d5af55589 [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 2
This patch enables the case where we do not completely eliminate offset.
Supposedly in this case we reduce live range overlap that never harms, but
since there are doubts this is true, this goes as a separate change.

Differential Revision: https://reviews.llvm.org/D96399
Reviewed By: reames
2021-03-04 16:47:43 +07:00
Max Kazantsev d9e93e8e57 [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 1
While optimizing the memory instruction, we sometimes need to add
offset to the value of `IV`. We could avoid doing so if the `IV.next` is
already defined at the point of interest. In this case, we may get two
possible advantages from this:

- If the `IV` step happens to match with the offset, we don't need to add
  the offset at all;
- We reduce overlap of live ranges of `IV` and `IV.next`. They may stop overlapping
  and it will lead to better register allocation. Even if the overlap will preserve,
  we are not introducing a new overlap, so it should be a neutral transform (Disabled
  this patch, will come with follow-up).

Currently I've only added support for IVs that get decremented using `usub`
intrinsic. We could also support `AddInstr`, however there is some weird
interaction with some other transform that may lead to infinite compilation
in this case (seems like same transform is done and undone over and over).
I need to investigate why it happens, but generally we could do that too.

The first part only handles case where this reuse fully elimiates the offset.

Differential Revision: https://reviews.llvm.org/D96399
Reviewed By: reames
2021-03-04 15:22:55 +07:00
Max Kazantsev 9fac8496ea [NFC] Detect IV increment expressed as uadd_with_overflow and usub_with_overflow
Current callers do not call it with such argument, so this is NFC.
But for further changes, it can be very useful to detect such cases.
2021-03-01 13:24:01 +07:00
Max Kazantsev 8d835f42a5 [NFC] Introduce function getIVStep for further reuse 2021-03-01 13:04:56 +07:00
Max Kazantsev fdbad5e5ac [NFC] Whitespace fix 2021-03-01 12:14:03 +07:00
Max Kazantsev 2892fcc204 [NFC] Factor out IV detector function for further reuse 2021-03-01 12:11:54 +07:00
Philip Reames 0832a58e22 [cgp] Minor code improvement - reuse an existing named helper [NFC] 2021-02-26 11:51:32 -08:00
Sander de Smalen 5e19208d96 [InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep.
This fixes the types of a few more cost variables to be of type InstructionCost.
2021-02-24 14:30:03 +00:00
Kazu Hirata d61b4cb9d8 [CodeGen] Use range-based for loops (NFC) 2021-02-11 23:31:31 -08:00
Max Kazantsev 418c218efa Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment"
The patch did not account for one corner case where cmp does not dominate
the loop latch. This patch adds this check, hopefully it's cheap because
the CFG does not change during the transform, so DT queries should be
executed quickly.

If you see compile time slowness from this, please revert.

Differential Revision: https://reviews.llvm.org/D96119
2021-02-11 19:49:23 +07:00
Max Kazantsev 90081f3020 Revert "[Codegenprepare][X86] Use usub with overflow opt for IV increment"
This reverts commit 3d15b7e7df.

We've found an internal failure, need to analyze.
2021-02-11 17:52:11 +07:00
Max Kazantsev 3d15b7e7df [Codegenprepare][X86] Use usub with overflow opt for IV increment
Function `replaceMathCmpWithIntrinsic` artificially limits the scope
of the optimization, setting a requirement of two instructions be in
the same block, due to two reasons:
- usage of DT for more general check is costly in terms of compile time;
- risk of creating a new value that lives through multiple blocks.

Because of this, two semantically equivalent tests may be or not be the
subject of this opt depending on where the binary operation is located.
See `test/CodeGen/X86/usub_inc_iv.ll` for motivation

There is one important particular case where this limitation is  too strict:
it is when the binary operation is the increment of the induction variable.
As result, the application of this opt becomes fragile and highly reliant on
where other passes decide to place IV increment. In most cases, they place
it in the end of the latch block, killing the opt opportunity (when in fact it
does not matter where to insert the actual instruction).

This patch handles this particular case separately.
- The detector does not use dom tree and has constant cost;
- The value of IV or IV.next lives through all loop in any case, so this should not
  create a new unexpected long-living value.

As result, the transform becomes more robust. It also seems to lead to
better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`).

Differential Revision: https://reviews.llvm.org/D96119
Reviewed By: spatel, reames
2021-02-11 11:59:45 +07:00
Guillaume Chatelet 4b15156dca [NFC] inline variable 2021-02-05 10:17:02 +00:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Jun Ma 54842fa0bb [CodeGenPrepare] Also skip lifetime.end intrinsic when check return block in dupRetToEnableTailCallOpts.
Differential Revision: https://reviews.llvm.org/D95424
2021-02-01 08:18:44 +08:00
Kazu Hirata c5c4dbd279 [CodeGen] Use llvm::append_range (NFC) 2021-01-21 19:59:46 -08:00
Kazu Hirata 23b0ab2acb [llvm] Use the default value of drop_begin (NFC) 2021-01-18 10:16:36 -08:00
Kazu Hirata 2efcbe24a7 [llvm] Use llvm::drop_begin (NFC) 2021-01-14 20:30:33 -08:00
Kazu Hirata 9850d3b10a [CodeGen, DebugInfo] Use llvm::find_if (NFC) 2021-01-10 09:24:53 -08:00
Juneyoung Lee 9f2d9364b0 [CodeGen] Update transformations to use poison for shufflevector/insertelem's initial vector elem
This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element.

The change in CodeGenPrepare.cpp is fine because the mask of shufflevector should be always zero.
It doesn't touch the second element (which is poison).

The change in InterleavedAccessPass.cpp is also fine becauses the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than
the size of the first vector operand.
This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad.
It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check the mask is the desirable form.
isDeInterleaveMask has the check that a+km is smaller than the vector size.
To check my understanding, I added an assertion & added a test to show that this optimization doesn't fire in such case.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D94056
2021-01-10 18:03:51 +09:00
Kazu Hirata cfeecdf7b6 [llvm] Use llvm::all_of (NFC) 2021-01-06 18:27:36 -08:00
Kazu Hirata ba82c0b315 [llvm] Call *(Set|Map)::erase directly (NFC)
We can erase an item in a set or map without checking its membership
first.
2021-01-03 09:57:47 -08:00
Juneyoung Lee 5cdf6ed744 [CodeGen] recognize select form of and/ors when splitting branch conditions
Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++.
"a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a.
This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353).
In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison.
The discussion at D93065 has more context about this.

This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize
select form of and/or as well using m_LogicalAnd/Or.
Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93853
2021-01-01 04:46:10 +09:00
Kazu Hirata 7bc76fd0ec [CodeGen] Construct SmallVector with iterator ranges (NFC) 2020-12-31 09:39:11 -08:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Kazu Hirata 1e3ed09165 [CodeGen] Use llvm::append_range (NFC) 2020-12-28 19:55:16 -08:00
Rong Xu 3733463dbb [IR][PGO] Add hot func attribute and use hot/cold attribute in func section
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.

This patch adds support of hot function attribute to LLVM IR.  This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).

This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
    is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
    isFunctionColdInCallGraph is true, this function will be marked as
    cold.

The changes are:
(1) user annotated function attribute will used in setting function
    section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.

The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.

Differential Revision: https://reviews.llvm.org/D92493
2020-12-17 18:41:12 -08:00
Paul Walker 6d35bd1d48 [CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors.
optimizeGatherScatterInst does nothing specific to fixed length vectors
but uses FixedVectorType to extract the number of elements.  This patch
simply updates the code to use VectorType and getElementCount instead.

For testing I just copied Transforms/CodeGenPrepare/X86/gather-scatter-opt.ll
replacing `<4 x ` with `<vscale x 4`.

Differential Revision: https://reviews.llvm.org/D92572
2020-12-15 10:57:51 +00:00
Pan, Tao 7af802994e [CodeGen] Add text section prefix for COFF object file
Text section prefix is created in CodeGenPrepare, it's file format independent implementation,  text section name is written into object file in TargetLoweringObjectFile, it's file format dependent implementation, port code of adding text section prefix to text section name from ELF to COFF.
Different with ELF that use '.' as concatenation character, COFF use '$' as concatenation character. That is, concatenation character is variable, so split concatenation character from text section prefix.
Text section prefix is existing feature of ELF, it can help to reduce icache and itlb misses, it's also make possible aggregate other compilers e.g. v8 created same prefix sections. Furthermore, the recent feature Machine Function Splitter (basic block level text prefix section) is based on text section prefix.

Reviewed By: pengfei, rnk

Differential Revision: https://reviews.llvm.org/D92073
2020-12-08 18:56:21 +08:00
Yichao Yu a248eca665
Clear NewGEPBases after finish using them in CodeGenPrep pass
AFAICT all other set/map are correctly cleared in `runOnFunction`.

With assertion enabled this causes a crash when the module is freed and potentially if a later pass delete the instruction (not observed in real world though). Without assertion this can potentially cause confusing result when running on a new Function/Module.

Reviewed By: loladiro

Differential Revision: https://reviews.llvm.org/D84031
2020-11-24 12:12:00 -05:00
Kazu Hirata 85d6af393c [CodeGen] Use pred_empty (NFC) 2020-11-22 22:16:13 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Vedant Kumar 5a3ef55a52 [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746)
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.

The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).

It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).

So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.

Differential Revision: https://reviews.llvm.org/D88928
2020-10-27 10:12:59 -07:00
David Green 06fb4e9064 [CGP] Limit converting phi types to simple loads and stores
Instcombine limits converting phi types to simple loads and stores. This
does the same in codegenprepare, not processing phis that are not
simple.

Note that volatile loads/store ISel will happily convert between float
and int. Atomics are more likely to always be integer. This just keeps
things simple and doesn't process either.

Differential Revision: https://reviews.llvm.org/D83770
2020-09-14 12:08:34 +01:00
Yevgeny Rouban 88690a9658 [CodeGenPrepare] Fix zapping dead operands of assume
This patch fixes a problem of the commit 52cc97a0.
A test case is created to demonstrate the crash caused by
the instruction iterator invalidated by the recursive
removal of dead operands of assume. The solution restarts
from the blocks's first instruction in case CurInstIterator
is invalidated by RecursivelyDeleteTriviallyDeadInstructions().

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D87434
2020-09-14 11:46:34 +07:00
David Green 9237fde481 [CGP] Prevent optimizePhiType from iterating forever
The recently added optimizePhiType algorithm had no checks to make sure
it didn't continually iterate backward and forth between float and int
types. This means that given an input like store(phi(bitcast(load))), we
could convert that back and forth to store(bitcast(phi(load))). This
particular case would usually have been simplified to a different load
type (folding the bitcast into the load) before CGP, but other cases can
occur. The one that came up was phi(bitcast(phi)), where the two phi's
of different types were bitcast between. That was not helped by a dead
bitcast being kept around which could make conversion look profitable.

This adds an extra check of the bitcast Uses or Defs, to make sure that
at least one is grounded and will not end up being converted back. It
also makes sure that dead bitcasts are removed, and there is a minor
change to include newly created Phi nodes in the Visited set so that
they do not need to be revisited.

Differential Revision: https://reviews.llvm.org/D82676
2020-09-13 16:11:01 +01:00
Benjamin Kramer 5405ee553a [CodeGenPrepare] Simplify code. NFCI. 2020-09-11 11:24:08 +02:00
Craig Topper b16e8687ab [CodeGenPrepare][X86] Teach optimizeGatherScatterInst to turn a splat pointer into GEP with scalar base and 0 index
This helps SelectionDAGBuilder recognize the splat can be used as a uniform base.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D86371
2020-09-02 20:44:12 -07:00
Benjamin Kramer 52cc97a0db [CodeGenPrepare] Zap the argument of llvm.assume when deleting it
We know that the argument is mostly likely dead, so we can purge it
early. Otherwise it would make it to codegen, and can block further
optimizations.
2020-08-28 20:52:22 +02:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
David Green 5b7e27a4db [ARM][CGP] Fix scalar condition selects for MVE
The arm backend does not handle select/select_cc on vectors with scalar
conditions, preferring to expand them in codegenprepare instead. This
usually works except when optimizing for size, where the optsize check
would end up overruling the backend isSelectSupported check.

We could handle the selects in ISel too, but this seems like smaller
code than trying to splat the condition to all lanes.

Differential Revision: https://reviews.llvm.org/D86433
2020-08-25 12:09:06 +01:00
Matt Arsenault 57bd64ff84 Support addrspacecast initializers with isNoopAddrSpaceCast
Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs
with the DataLayout.
2020-07-31 10:42:43 -04:00
Andrew Litteken bcbc6117b5 [CGP] Add Pass Dependencies
Add pass dependecies:
  - TargetTransformInfoWrapperPass
  - TargetPassConfig
  - LoopInfoWrapperPass
  - TargetLibraryInfoWrapperPass

To fix inconsistencies when passes are added to the pipeline.

Reviewers: efriedma, kmclaughlin, paquette

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84346
2020-07-22 12:02:53 -07:00
Tim Northover 37b96d51d0 CodeGenPrep: remove AssertingVH references before deleting dead instructions.
CodeGenPrepare keeps fairly close track of various instructions it's
seen, particularly GEPs, in maps and vectors. However, sometimes those
instructions become dead and get removed while it's still executing.
This triggers AssertingVH references to them in an asserts build and
could lead to miscompiles in a release build (I've only seen a later
segfault though).

So this patch adds a callback to
RecursivelyDeleteTriviallyDeadInstructions which can make sure the
instruction about to be deleted is removed from CodeGenPrepare's data
structures.
2020-07-15 15:19:21 +01:00
Christopher Tetreault ff5b9a7b3b [SVE] Remove calls to VectorType::getNumElements from CodeGen
Reviewers: efriedma, fpetrogalli, sdesmalen, RKSimon, arsenm

Reviewed By: RKSimon

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82210
2020-07-09 12:43:36 -07:00
David Sherwood 15aeb805dc [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll
For the GetElementPtr case in function
  AddressingModeMatcher::matchOperationAddr
I've changed the code to use the TypeSize class instead of relying
upon the implicit conversion to a uint64_t. As part of this we now
check for scalable types and if we encounter one just bail out for
now as the subsequent optimisations doesn't currently support them.

This changes fixes up all warnings in the following tests:

  llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

Differential Revision: https://reviews.llvm.org/D83124
2020-07-08 09:16:00 +01:00
serge-sans-paille edc7da2405 Upgrade TypePromotionTransaction to be able to report changes in CodeGenPrepare
optimizeMemoryInst was reporting no change while still modifying the IR.
Inspect the status of TypePromotionTransaction to get a better status.

Related to https://reviews.llvm.org/D80916

Differential Revision: https://reviews.llvm.org/D81256
2020-07-08 08:35:44 +02:00
Simon Pilgrim 792e4a8c97 CodeGenPrepare.cpp - remove unused IntrinsicsX86.h header. NFC. 2020-06-25 14:22:19 +01:00
Simon Pilgrim 172c36a100 Fix typos in CodeGenPrepare::splitLargeGEPOffsets comments. 2020-06-25 14:22:19 +01:00
Tres Popp 09d72ad399 Revert "[CGP] Enable CodeGenPrepares phi type convertion."
This reverts commit 67121d7b82.

This is causing compile times to be 2x slower on some large binaries.
2020-06-22 13:06:18 +02:00