Commit Graph

51 Commits

Author SHA1 Message Date
Kazu Hirata 1238378f18 [llvm] Use pop_back_val (NFC) 2021-01-23 10:56:33 -08:00
Francis Visoiu Mistrih 0cc38acfc4 [Matrix] Propagate shape information through fneg
Similar to binary operators like fadd/fmul/fsub, propagate shape info
through unary operators (fneg is the only one?).

Differential Revision: https://reviews.llvm.org/D95252
2021-01-22 14:34:28 -08:00
Roman Lebedev c845c724c2
[Utils][SimplifyCFG] Port SplitBlock() to DomTreeUpdater
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.

The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow i don't feel like porting all those users first.

This function is one of last three that not operate on DomTreeUpdater.
2021-01-15 23:35:56 +03:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Kazu Hirata 12fc9ca3a4 [llvm] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-01-12 21:43:46 -08:00
Juneyoung Lee 420d046d6b clang-format, address warnings 2020-12-30 23:05:07 +09:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Arthur Eubanks 513d165b80 Port -lower-matrix-intrinsics-minimal to NPM
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.

Use this new variant in the NPM -O0 pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91811
2020-11-19 17:42:48 -08:00
Arthur Eubanks aa6c305344 [LowerMatrixIntrinsics][NewPM] Fix PreservedAnalyses result
PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics
didn't properly preserve CFG even though it claimed to. The legacy pass
says it doesn't. Match the legacy pass's preserved analyses.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89175
2020-10-21 12:42:16 -07:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Braedy Kuzma 24e41a34fe [Matrix] Add asserts for mismatched element types.
This patch clarifies the failing point of having input or output vectors
of differing types. Before, lowering would fail elsewhere (e.g. in
`fmul` creation) which may have been not immediately clear.

As a side effect, the `getElementType` and `getVectoryTy` functions
required the `const` qualifier to be added.

Reviewers: fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D84374
2020-07-23 16:02:48 +01:00
Florian Hahn f13a59bcff [Matrix] Use TileInfo to create tiled loop nest for matrix multiply.
This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrixes.

Initially loops are only used if both the number of rows and columns are
divisible by the tile size. Other cases will be added as follow-up.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D81308
2020-07-20 21:11:53 +01:00
Florian Hahn dc1087d408 [Matrix] Add minimal lowering pass that only requires TTI.
This patch adds a new variant of the matrix lowering pass that only does
a minimal lowering and only depends on TTI. The main purpose of this pass
is to have a pass with minimal dependencies to run as part of the backend
pipeline.

At the moment, the only difference to the regular lowering pass is that it
does not support remarks. But in subsequent patches add support for tiling
to the lowering pass which will require more analysis, which we do not want
to run in the backend, as the lowering should happen in the middle-end in
practice and running it in the backend is mostly for convenience when
running llc.

Reviewers: anemet, Gerolf, efriedma, hfinkel

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76867
2020-07-20 11:16:11 +01:00
Christopher Tetreault c444b1b904 [SVE] Remove calls to VectorType::getNumElements from Scalar
Reviewers: efriedma, fhahn, reames, kmclaughlin, sdesmalen

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, dantrushin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82243
2020-07-08 11:08:20 -07:00
Simon Pilgrim 36bc10e74a [Transforms] Ensure we include CommandLine.h if we declare any cl::opt flags 2020-06-23 12:11:51 +01:00
Florian Hahn 1669fddc9f [Matrix] Use alignment info when lowering loads/stores.
This patch updates LowerMatrixIntrinsics to preserve the alignment
specified at the original load/stores and the align attribute for the
pointer argument of the column.major.load/store intrinsics.

We can always use the specified alignment for the load of the first
column. For subsequent columns, the alignment may need to be reduced.

For ConstantInt strides, compute the offset for the start of the column in
bytes and use commonAlignment to get the largest valid alignment.

For non-ConstantInt strides, we need to take the common alignment of the
initial alignment and the element size in bytes.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, rjmccall

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D81960
2020-06-18 13:19:31 +01:00
Florian Hahn d88acd8f7d [Matrix] Preserve volatile when loading loads/stores.
Currently the matrix lowering turns volatile loads/stores into
non-volatile ones. This patch updates the lowering to preserve the
volatile bit.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D81498
2020-06-18 12:14:19 +01:00
Florian Hahn 6d18c2067e [Matrix] Update load/store intrinsics.
This patch adjust the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).

The patch performs the following changes:
 * Rename columnwise.load/store to column.major.load/store. This is more
   expressive and also more in line with the naming in Clang.
 * Changes the stride arguments from i32 to i64. The stride can be
   larger than i32 and this makes things more uniform with the way
   things are handled in Clang.
 * A new boolean argument is added to indicate whether the load/store
   is volatile. The lowering respects that when emitting vector
   load/store instructions
 * MatrixBuilder is updated to require both Alignment and IsVolatile
   arguments, which are passed through to the generated intrinsic. The
   alignment is set using the `align` attribute.

The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse

Reviewed By: anemet, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D81472
2020-06-18 09:44:52 +01:00
Christopher Tetreault 765ac39db2 [SVE] Eliminate calls to default-false VectorType::get() from Scalar
Reviewers: efriedma, kmclaughlin, sdesmalen, fhahn, bkramer, anna, gchatelet, c-rhodes, david-arm, fpetrogalli

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, dantrushin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80336
2020-06-09 14:09:02 -07:00
Benjamin Kramer 3badd17b69 SmallPtrSet::find -> SmallPtrSet::count
The latter is more readable and more efficient. While there clean up
some double lookups. NFCI.
2020-06-07 22:38:08 +02:00
aartbik f719e7d9e7 [llvm] [MatrixIntrinsics] Add row-major support for llvm.matrix.transpose
Summary:
Only column-major was supported so far. This adds row-major support as well.
Note that we probably also want very efficient SIMD implementations for the
various target platforms.

Bug:
https://bugs.llvm.org/show_bug.cgi?id=46085

Reviewers: nicolasvasilache, reidtatge, bkramer, fhahn, ftynse, andydavis1, craig.topper, dcaballe, mehdi_amini, anemet

Reviewed By: fhahn

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80673
2020-05-28 12:13:32 -07:00
Mircea Trofin 1b6b05a250 [llvm][NFC][CallSite] Remove CallSite from a few trivial locations
Summary: Implementation details and internal (to module) APIs.

Reviewers: craig.topper, dblaikie

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78610
2020-04-22 08:39:21 -07:00
Benjamin Kramer 166467e822 [VectorUtils] Create shufflevector masks as int vectors instead of Constants
No functionality change intended.
2020-04-17 15:28:00 +02:00
Benjamin Kramer 6f64daca8f Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints
No functionality change intended.
2020-04-15 12:51:38 +02:00
Eli Friedman 3f13ee8a00 [NFC] Modernize misc. uses of Align/MaybeAlign APIs.
Use the current getAlign() APIs where it makes sense, and use Align
instead of MaybeAlign when we know the value is non-zero.
2020-04-06 17:53:04 -07:00
Florian Hahn 6babae74c7 [Matrix] Update load/storeMatrix to take indices as Value* (NFC).
This allows using the functions to be used with loop dependent indices.
2020-04-06 14:48:48 +01:00
Florian Hahn 39f2d9aa81 [Matrix] Add option to use row-major matrix layout as default.
This patch adds a -matrix-default-layout option which can be used to
set the default matrix layout to row-major or column-major (default).

The initial patch updates codegen for loads, stores, binary operators
and matrix multiply.

Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76325
2020-04-06 10:00:56 +01:00
Florian Hahn d1fed7081d [Matrix] Add initial tiling for load/multiply/store chains.
This patch adds initial fusion for load/multiply/store chains of matrix
operations.

The patch contains roughly two parts:

1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store.
If they do, we create new non-aliasing copies of the operands. Note that this
may introduce new basic block. Finally we process TileSize x TileSize blocks.
That is: load tiles from the input operands, multiply and store them.

2. Identify fusion candidates & matrix instructions.
As a first step, collect all instructions with shape info and fusion candidates
(currently @llvm.matrix.multiply calls). Next, try to fuse candidates and
collect instructions eliminated by fusion. Finally iterate over all matrix
instructions, skip the ones eliminated by fusion and lower the rest as usual.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75566
2020-04-06 09:28:15 +01:00
Florian Hahn 9e81249d76 [Matrix] Rename emitChainedMatrixMultiply to emitMatrixMultiply (NFC).
The Chained in the name potentially leads to confusion. Also updated the
comment to drop the unnecessary mention of tile-sized.
2020-03-30 11:17:25 +01:00
Florian Hahn be86bc76f0 [Matrix] Generalize ColumnMatrixTy to MatrixTy (NFC).
This patch sets the stage for supporting both row and column major
layouts for matrixes. It renames ColumnMatrixTy to MatrixTy, adds
booleans indicating the underlying layout to both MatrixTy and ShapeInfo
and generalizes the methods of MatrixTy to support both row and column
major layouts.

Reviewers: Gerolf, anemet, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D76324
2020-03-20 08:32:13 +00:00
Benjamin Kramer 1db8b341a6 [Matrix] Fold single-use variable into assert
Avoids -Wunused-variable warnings in Release builds.
2020-03-19 21:42:22 +01:00
Florian Hahn 796fb2e474 [Matrix] Move multiply-add code generation into separate function (NFC).
This logic can be shared with the tiled code generation.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75565
2020-03-19 20:26:19 +00:00
Florian Hahn 0cc2d23751 [Matrix] Hoist load/store generation logic, add helpers for tiled access.
This patch slightly generalizes the code to emit loads and stores of a
matrix and adds helpers to load/store a tile of a larger matrix.

This will be used in a follow-up patch introducing initial tiling.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75564
2020-03-19 19:28:21 +00:00
Florian Hahn bc6c8c4bbb [Matrix] Add remark propagation along the inlined-at chain.
This patch adds support for propagating matrix expressions along the
inlined-at chain and emitting remarks at the traversed function scopes.

To motivate this new behavior, consider the example below. Without the
remark 'up-leveling', we would only get remarks in load.h and store.h,
but we cannot generate a remark describing the full expression in
toplevel.cpp, which is the place where the user has the best chance of
spotting/fixing potential problems.

With this patch, we generate a remark for the load in load.h, one for
the store in store.h and one for the complete expression in
toplevel.cpp. For a bigger example, please see remarks-inlining.ll.

    load.h:
    template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) {
      Matrix<Ty, R, C> Result;
      Result.value = *reinterpret_cast <typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
      return Result;
    }

    store.h:
    template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
       *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
    }

    toplevel.cpp
    void test(double *A, double *B, double *C) {
      store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
    }

For a given function, we traverse the inlined-at chain for each
matrix instruction (= instructions with shape information). We collect
the matrix instructions in each DISubprogram we visit. This produces a
mapping of DISubprogram -> (List of matrix instructions visible in the
subpogram). We then generate remarks using the list of instructions for
each subprogram in the inlined-at chain. Note that the list of instructions
for a subprogram includes the instructions from its own subprograms
recursively. For example using the example above, for the subprogram
'test' this includes inline functions 'load' and 'store'. This allows
surfacing the remarks at a level useful to users.

Please note that the current approach may create a lot of extra remarks.
Additional heuristics to cut-off the traversal can be implemented in the
future. For example, it might make sense to stop 'up-leveling' once all
matrix instructions are at the same debug location.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D73600
2020-03-11 17:40:08 +00:00
Nicolai Hähnle 58297e4d8f LowerMatrixIntrinsics: Avoid use of deprecated CreateCall methods
Reviewers: t.p.northover

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74675
2020-02-18 00:24:09 +01:00
Nikita Popov 7c362b25d7 [IRBuilder] Fix unnecessary IRBuilder copies; NFC
Fix a few cases where an IRBuilder is passed to a helper function
by value, while a by reference pass was intended.
2020-02-16 17:57:18 +01:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Florian Hahn 5d0ffbeb4d [Matrix] Mark expressions shared between multiple remarks.
This patch adds support for explicitly highlighting sub-expressions
shared by multiple leaf nodes. For example consider the following
code

  %shared.load = tail call <8 x double> @llvm.matrix.columnwise.load.v8f64.p0f64(double* %arg1, i32 %stride, i32 2, i32 4), !dbg !10, !noalias !10
  %trans = tail call <8 x double> @llvm.matrix.transpose.v8f64(<8 x double> %shared.load, i32 2, i32 4), !dbg !10
  tail call void @llvm.matrix.columnwise.store.v8f64.p0f64(<8 x double> %trans, double* %arg3, i32 10, i32 4, i32 2), !dbg !10
  %load.2 = tail call <30 x double> @llvm.matrix.columnwise.load.v30f64.p0f64(double* %arg3, i32 %stride, i32 2, i32 15), !dbg !10, !noalias !10
  %mult = tail call <60 x double> @llvm.matrix.multiply.v60f64.v8f64.v30f64(<8 x double> %trans, <30 x double> %load.2, i32 4, i32 2, i32 15), !dbg !11
  tail call void @llvm.matrix.columnwise.store.v60f64.p0f64(<60 x double> %mult, double* %arg2, i32 10, i32 4, i32 15), !dbg !11

We have two leaf nodes (the 2 stores) and the first store stores %trans
which is also used by the matrix multiply %mult. We generate separate
remarks for each leaf (stores). To denote that parts are shared, the
shared expressions are marked as shared (), with a reference to the
other remark that shares it. The operation summary also denotes the
shared operations separately.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D72526
2020-01-28 09:27:55 -08:00
Florian Hahn 62e228f8fd [Matrix] Add info about number of operations to remarks.
This patch updates the remark to also include a summary of the number of
vector operations generated for each matrix expression.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D72480
2020-01-27 17:43:39 -08:00
Florian Hahn 949294f396 [Matrix] Add optimization remarks for matrix expression.
Generate remarks for matrix operations in a function. To generate remarks
for matrix expressions, the following approach is used:
1. Collect leafs of matrix expressions (done in
   RemarkGenerator::getExpressionLeafs).  Leafs are lowered matrix
   instructions without other matrix users (like stores).

2. For each leaf, create a remark containing a linearizied version of the
   matrix expression.

The following improvements will be submitted as follow-ups:
* Summarize number of vector instructions generated for each expression.
* Account for shared sub-expressions.
* Propagate matrix remarks up the inlining chain.

The information provided by the matrix remarks helps users to spot cases
where matrix expression got split up, e.g. due to inlining not
happening. The remarks allow users to address those issues, ensuring
best performance.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D72453
2020-01-27 16:39:29 -08:00
Guillaume Chatelet 07c9d53266 [Alignment][NFC] Use Align with CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73449
2020-01-27 10:58:36 +01:00
Guillaume Chatelet 59f95222d4 [Alignment][NFC] Use Align with CreateAlignedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73274
2020-01-23 17:34:32 +01:00
Florian Hahn f42994f228 [Matrix] Hide and describe matrix-propagate-shape option. 2020-01-21 14:28:47 -08:00
Florian Hahn ccf24225e3 [Matrix] Update shape propagation to iterate until done.
This patch updates the shape propagation to iterate until no new shape
information is discovered.

As initial seed for the forward propagation, we use the matrix intrinsic
instructions. Both propagateShapeForward and propagateShapeBackward
return new work lists, with the instructions to be used for the next
iteration. When propagating forward, we record all instructions we added
new shape information for. When propagating backward, we record all
users of instructions we added new shape information for.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70901
2020-01-09 10:52:52 +00:00
Florian Hahn 7adf6644f5 [Matrix] Propagate and use shape information for loads.
This patch extends to shape propagation to also include load
instructions and implements shape aware lowering for vector loads.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70900
2020-01-09 10:21:20 +00:00
Florian Hahn 459ad8e97e [Matrix] Implement back-propagation of shape information.
This patch extends the shape propagation for matrix operations to also
propagate the shape of instructions to their operands.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70899
2020-01-09 09:48:07 +00:00
Florian Hahn dc2c9b0fcf [Matrix] Propagate and use shape info for binary operators.
This patch extends the current shape propagation and shape aware
lowering to also support binary operators. Those operators are uniform
with respect to their shape (shape of the input operands is the same as
the shape of their result).

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70898
2019-12-27 15:50:47 +00:00
Florian Hahn 8d6f59b78a [Matrix] Use fmuladd for matrix.multiply if allowed.
If the matrix.multiply calls have the contract fast math flag, we can
use fmuladd. This als adds a command line option to force fmuladd
generation. We can retire this option once there is a clang-level
option.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70951
2019-12-23 14:49:14 +01:00
Florian Hahn 109e4e3851 [Matrix] Add forward shape propagation and first shape aware lowerings.
This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to make use of
the shape information to break up larger vector operations and to
eliminate unnecessary conversion operations between columnwise matrixes
and flattened vectors: if shape information is available for an
instruction, lower the operation to a set of instructions operating on
columns. For example, a store of a matrix is broken down into separate
stores for each column. For users that do not have shape
information (e.g. because they do not yet support shape information
aware lowering), we pack the result columns into a flat vector and
update those users.

It also adds shape aware lowering for the first non-intrinsic
instruction: vector stores.

Example:

For
  %c  = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
  store <4 x double> %c, <4 x double>* %Ptr

We generate the code below without shape propagation. Note %9 which
combines the columns of the transposed matrix into a flat vector.

  %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
  %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
  %1 = extractelement <2 x double> %split, i64 0
  %2 = insertelement <2 x double> undef, double %1, i64 0
  %3 = extractelement <2 x double> %split1, i64 0
  %4 = insertelement <2 x double> %2, double %3, i64 1
  %5 = extractelement <2 x double> %split, i64 1
  %6 = insertelement <2 x double> undef, double %5, i64 0
  %7 = extractelement <2 x double> %split1, i64 1
  %8 = insertelement <2 x double> %6, double %7, i64 1
  %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  store <4 x double> %9, <4 x double>* %Ptr

With this patch, we propagate the 2x2 shape information from the
transpose to the store and we generate the code below. Note that we
store the columns directly and do not need an extra shuffle.

  %9 = bitcast <4 x double>* %Ptr to double*
  %10 = bitcast double* %9 to <2 x double>*
  store <2 x double> %4, <2 x double>* %10, align 8
  %11 = getelementptr double, double* %9, i32 2
  %12 = bitcast double* %11 to <2 x double>*
  store <2 x double> %8, <2 x double>* %12, align 8

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70897
2019-12-23 13:51:56 +01:00