Commit Graph

109 Commits

Author SHA1 Message Date
River Riddle 9c08540690 Replace usages of Instruction with Operation in the /Analysis directory.
PiperOrigin-RevId: 240569775
2019-03-29 17:44:56 -07:00
River Riddle f9d91531df Replace usages of Instruction with Operation in the /IR directory.
This is step 2/N to renaming Instruction to Operation.

PiperOrigin-RevId: 240459216
2019-03-29 17:43:37 -07:00
Alex Zinenko a7215a9032 Allow creating standalone Regions
Currently, regions can only be constructed by passing in a `Function` or an
`Instruction` pointer referencing the parent object, unlike `Function`s or
`Instruction`s themselves that can be created without a parent.  It leads to a
rather complex flow in operation construction where one has to create the
operation first before being able to work with its regions.  It may be
necessary to work with the regions before the operation is created.  In
particular, in `build` and `parse` functions that are executed _before_ the
operation is created in cases where boilerplate region manipulation is required
(for example, inserting the hypothetical default terminator in affine regions).
Allow creating standalone regions.  Such regions are meant to own a list of
blocks and transfer them to other regions on demand.

Each instruction stores a fixed number of regions as trailing objects and has
ownership of them.  This decreases the size of the Instruction object for the
common case of instructions without regions.  Keep this behavior intact.  To
allow some flexibility in construction, make OperationState store an owning
vector of regions.  When the Builder creates an Instruction from
OperationState, the bodies of the regions are transferred into the
instruction-owned regions to minimize copying.  Thus, it becomes possible to
fill standalone regions with blocks and move them to an operation when it is
constructed, or move blocks from a region to an operation region, e.g., for
inlining.

PiperOrigin-RevId: 240368183
2019-03-29 17:40:59 -07:00
Chris Lattner 46ade282c8 Make FunctionPass::getFunction() return a reference to the function, instead of
a pointer.  This makes it consistent with all the other methods in
FunctionPass, as well as with ModulePass::getModule().  NFC.

PiperOrigin-RevId: 240257910
2019-03-29 17:40:44 -07:00
River Riddle 96ebde9cfd Replace usages of "Op::operator->" with ".".
This is step 2/N of removing the temporary operator-> method as part of the de-const transition.

PiperOrigin-RevId: 240200792
2019-03-29 17:40:09 -07:00
River Riddle af1abcc80b Replace usages of "operator->" with "." for the AffineOps.
Note: The "operator->" method is a temporary helper for the de-const transition and is gradually being phased out.
PiperOrigin-RevId: 240179439
2019-03-29 17:39:19 -07:00
River Riddle 832567b379 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for' and set the namespace of the AffineOps dialect to 'affine'.
PiperOrigin-RevId: 240165792
2019-03-29 17:39:03 -07:00
Chris Lattner d9b5bc8f55 Remove OpPointer, cleaning up a ton of code. This also moves Ops to using
inherited constructors, which is cleaner and means you can now use DimOp()
to get a null op, instead of having to use Instruction::getNull<DimOp>().

This removes another 200 lines of code.

PiperOrigin-RevId: 240068113
2019-03-29 17:36:21 -07:00
Chris Lattner dd2b2ec542 Push a bunch of 'consts' out of the *Op structure, in prep for removing
OpPointer.

PiperOrigin-RevId: 240044712
2019-03-29 17:35:35 -07:00
Chris Lattner 986310a68f Remove const from Value, Instruction, Argument, and the various methods on the
*Op classes.  This is a net reduction by almost 400LOC.

PiperOrigin-RevId: 239972443
2019-03-29 17:34:33 -07:00
Nicolas Vasilache c3b0c6a0dc Cleanups Vectorize and SliceAnalysis - NFC
This CL cleans up and refactors super-vectorization and slice analysis.

PiperOrigin-RevId: 238986866
2019-03-29 17:23:07 -07:00
Alex Zinenko 276fae1b0d Rename BlockList into Region
NFC.  This is step 1/n to specifying regions as parts of any operation.

PiperOrigin-RevId: 238472370
2019-03-29 17:18:04 -07:00
Uday Bondhugula 02af8c22df Change Pass:getFunction() to return pointer instead of ref - NFC
- change this for consistency - everything else similar takes/returns a
  Function pointer - the FuncBuilder ctor,
  Block/Value/Instruction::getFunction(), etc.
- saves a whole bunch of &s everywhere

PiperOrigin-RevId: 236928761
2019-03-29 16:58:35 -07:00
River Riddle f37651c708 NFC. Move all of the remaining operations left in BuiltinOps to StandardOps. The only thing left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder.
PiperOrigin-RevId: 236403665
2019-03-29 16:53:35 -07:00
Lei Zhang 85d9b6c8f7 Use consistent names for dialect op source files
This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:

  <full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}

Builtin and standard dialects are specially treated, though. Both of them do
not have dialect namespace; the former is still named as BuiltinOps.* and the
latter is named as Ops.*.

Purely mechanical. NFC.

PiperOrigin-RevId: 236371358
2019-03-29 16:53:19 -07:00
River Riddle ddc6788cc7 Provide a Builder::getNamedAttr and (Instruction|Function)::setAttr(StringRef, Attribute) to simplify attribute manipulation.
PiperOrigin-RevId: 236222504
2019-03-29 16:50:59 -07:00
River Riddle ed5fe2098b Remove PassResult and have the runOnFunction/runOnModule functions return void instead. To signal a pass failure, passes should now invoke the 'signalPassFailure' method. This provides the equivalent functionality when needed, but isn't an intrusive part of the API like PassResult.
PiperOrigin-RevId: 236202029
2019-03-29 16:50:44 -07:00
River Riddle c6c534493d Port all of the existing passes over to the new pass manager infrastructure. This is largely NFC.
PiperOrigin-RevId: 235952357
2019-03-29 16:47:14 -07:00
MLIR Team 8564b274db Internal change
PiperOrigin-RevId: 235191129
2019-03-29 16:38:24 -07:00
River Riddle 3e656599f1 Define a PassID class to use when defining a pass. This allows for the type used for the ID field to be self documenting. It also allows for the compiler to know the set alignment of the ID object, which is useful for storing pointer identifiers within llvm data structures.
PiperOrigin-RevId: 235107957
2019-03-29 16:37:12 -07:00
River Riddle 48ccae2476 NFC: Refactor the files related to passes.
* PassRegistry is split into its own source file.
* Pass related files are moved to a new library 'Pass'.

PiperOrigin-RevId: 234705771
2019-03-29 16:32:56 -07:00
Uday Bondhugula 4ba8c9147d Automated rollback of changelist 232717775.
PiperOrigin-RevId: 232807986
2019-03-29 16:19:33 -07:00
River Riddle 90d10b4e00 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. The is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
2019-03-29 16:17:59 -07:00
River Riddle b499277fb6 Remove remaining usages of OperationInst in lib/Transforms.
PiperOrigin-RevId: 232323671
2019-03-29 16:10:53 -07:00
River Riddle 126ec14e2d Fix the handling of the resizable operands bit of OperationState in a few places.
PiperOrigin-RevId: 232163738
2019-03-29 16:08:28 -07:00
River Riddle 5052bd8582 Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
2019-03-29 16:06:49 -07:00
River Riddle 9f22a2391b Define an detail::OperandStorage class to handle managing instruction operands. This class stores operands in a similar way to SmallVector except for two key differences. The first is the inline storage, which is a trailing objects array. The second is that being able to dynamically resize the operand list is optional. This means that we can enable the cases where operations need to change the number of operands after construction without losing the spatial locality benefits of the common case (operation instructions / non-control flow instructions with a lifetime fixed number of operands).
PiperOrigin-RevId: 231910497
2019-03-29 16:05:08 -07:00
Nicolas Vasilache d4921f4a96 Address Performance issue in NestedMatcher
A performance issue was reported due to the usage of NestedMatcher in
ComposeAffineMaps. The main culprit was the ubiquitous copies that were
occuring when appending even a single element in `matchOne`.

This CL generally simplifies the implementation and removes one level of indirection by getting rid of
auxiliary storage as well as simplifying the API.
The users of the API are updated accordingly.

The implementation was tested on a heavily unrolled example with
ComposeAffineMaps and is now close in performance with an implementation based
on stateless InstWalker.

As a reminder, the whole ComposeAffineMaps pass is slated to disappear but the bug report was very useful as a stress test for NestedMatchers.

Lastly, the following cleanups reported by @aminim were addressed:
1. make NestedPatternContext scoped within runFunction rather than at the Pass level. This was caused by a previous misunderstanding of Pass lifetime;
2. use defensive assertions in the constructor of NestedPatternContext to make it clear a unique such locally scoped context is allowed to exist.

PiperOrigin-RevId: 231781279
2019-03-29 16:04:07 -07:00
River Riddle 36babbd781 Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself. This is a necessary step in converting ForInst into an operation.
PiperOrigin-RevId: 231064139
2019-03-29 15:40:23 -07:00
Nicolas Vasilache 81c7f2e2f3 Cleanup resource management and rename recursive matchers
This CL follows up on a memory leak issue related to SmallVector growth that
escapes the BumpPtrAllocator.
The fix is to properly use ArrayRef and placement new to define away the
issue.

The following renaming is also applied:
1. MLFunctionMatcher -> NestedPattern
2. MLFunctionMatches -> NestedMatch

As a consequence all allocations are now guaranteed to live on the BumpPtrAllocator.

PiperOrigin-RevId: 231047766
2019-03-29 15:39:53 -07:00
River Riddle 75c21e1de0 Wrap cl::opt flags within passes in a category with the pass name. This improves the help output of tools like mlir-opt.
Example:

dma-generate options:

  -dma-fast-mem-capacity                 - Set fast memory space  ...
  -dma-fast-mem-space=<uint>             - Set fast memory space  ...

loop-fusion options:

  -fusion-compute-tolerance=<number>     - Fractional increase in  ...
  -fusion-maximal                        - Enables maximal loop fusion

loop-tile options:

  -tile-size=<uint>                      - Use this tile size for  ...

loop-unroll options:

  -unroll-factor=<uint>                  - Use this unroll factor  ...
  -unroll-full                           - Fully unroll loops
  -unroll-full-threshold=<uint>          - Unroll all loops with  ...
  -unroll-num-reps=<uint>                - Unroll innermost loops  ...

loop-unroll-jam options:

  -unroll-jam-factor=<uint>              - Use this unroll jam factor ...

PiperOrigin-RevId: 231019363
2019-03-29 15:39:38 -07:00
River Riddle c3424c3c75 Allow operations to hold a blocklist and add support for parsing/printing a block list for verbose printing.
PiperOrigin-RevId: 230951462
2019-03-29 15:37:37 -07:00
River Riddle 451869f394 Add cloning functionality to Block and Function, this also adds support for remapping successor block operands of terminator operations. We define a new BlockAndValueMapping class to simplify mapping between cloned values.
PiperOrigin-RevId: 230768759
2019-03-29 15:34:20 -07:00
River Riddle 6859f33292 Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.
PiperOrigin-RevId: 230605756
2019-03-29 15:33:20 -07:00
Chris Lattner dffc589ad2 Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops.  Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.

This is step 25/n towards merging instructions and statements.

PiperOrigin-RevId: 227243966
2019-03-29 14:46:58 -07:00
Chris Lattner 456ad6a8e0 Standardize naming of statements -> instructions, revisting the code base to be
consistent and moving the using declarations over.  Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245
2019-03-29 14:44:30 -07:00
Chris Lattner 69d9e990fa Eliminate the using decls for MLFunction and CFGFunction standardizing on
Function.

This is step 18/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227139399
2019-03-29 14:43:13 -07:00
Chris Lattner d798f9bad5 Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(),
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.

This is step 17/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227121537
2019-03-29 14:42:40 -07:00
Chris Lattner 5187cfcf03 Merge Operation into OperationInst and standardize nomenclature around
OperationInst.  This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712
2019-03-29 14:42:23 -07:00
Chris Lattner 4c05f8cac6 Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new
FuncBuilder class.  Also rename SSAValue.cpp to Value.cpp

This is step 12/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227067644
2019-03-29 14:40:22 -07:00
Chris Lattner 3f190312f8 Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy.  This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate.  This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624
2019-03-29 14:40:06 -07:00
Alex Zinenko eb0f9f37af SuperVectorization: fix 'isa' assertion
Supervectorization uses null pointers to SSA values as a means of communicating
the failure to vectorize.  In operation vectorization, all operations producing
the values of operation arguments must be vectorized for the given operation to
be vectorized.  The existing check verified if any of the value "def"
statements was vectorized instead, sometimes leading to assertions inside `isa`
called on a null pointer.  Fix this to check that all "def" statements were
vectorized.

PiperOrigin-RevId: 226941552
2019-03-29 14:37:20 -07:00
Chris Lattner 87ce4cc501 Per review on the previous CL, drop MLFuncBuilder::createOperation, changing
clients to use OperationState instead.  This makes MLFuncBuilder more similiar
to CFGFuncBuilder.  This whole area will get tidied up more when cfg and ml
worlds get unified.  This patch is just gardening, NFC.

PiperOrigin-RevId: 226701959
2019-03-29 14:35:49 -07:00
Alex Zinenko bc52a639f9 Extract vector_transfer_* Ops into a SuperVectorDialect.
From the beginning, vector_transfer_read and vector_transfer_write opreations
were intended as a mid-level vectorization abstraction.  In particular, they
are lowered to the StandardOps dialect before further processing.  As such, it
does not make sense to keep them at the same level as StandardOps.  Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.

PiperOrigin-RevId: 225554492
2019-03-29 14:28:58 -07:00
Nicolas Vasilache 13bc77045e [MLIR] Drop assert for NYI in Vectorize.cpp
This CLs adds proper error emission, removes NYI assertions and documents
assumptions that are required in the relevant functions.

PiperOrigin-RevId: 224377207
2019-03-29 14:21:37 -07:00
Nicolas Vasilache df0a25efee [MLIR] Add support for permutation_map
This CL hooks up and uses permutation_map in vector_transfer ops.
In particular, when going into the nuts and bolts of the implementation, it
became clear that cases arose that required supporting broadcast semantics.
Broadcast semantics are thus added to the general permutation_map.
The verify methods and tests are updated accordingly.

Examples of interest include.

Example 1:
The following MLIR snippet:
```mlir
   for %i3 = 0 to %M {
     for %i4 = 0 to %N {
       for %i5 = 0 to %P {
         %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
   }}}
```
may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
```mlir
   for %i3 = 0 to %0 step 32 {
     for %i4 = 0 to %1 {
       for %i5 = 0 to %2 step 256 {
         %4 = vector_transfer_read %arg0, %i4, %i5, %i3
              {permutation_map: (d0, d1, d2) -> (d2, d1)} :
              (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
   }}}
````
Meaning that vector_transfer_read will be responsible for reading the 2-D slice:
`%arg0[%i4, %i5:%15+256, %i3:%i3+32]` into vector<32x256xf32>. This will
require a transposition when vector_transfer_read is further lowered.

Example 2:
The following MLIR snippet:
```mlir
   %cst0 = constant 0 : index
   for %i0 = 0 to %M {
     %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
   }
```
may vectorize with {permutation_map: (d0) -> (0)} into:
```mlir
   for %i0 = 0 to %0 step 128 {
     %3 = vector_transfer_read %arg0, %c0_0, %c0_0
          {permutation_map: (d0, d1) -> (0)} :
          (memref<?x?xf32>, index, index) -> vector<128xf32>
   }
````
Meaning that vector_transfer_read will be responsible of reading the 0-D slice
`%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector
broadcast when vector_transfer_read is further lowered.

Additionally, some minor cleanups and refactorings are performed.

One notable thing missing here is the composition with a projection map during
materialization. This is because I could not find an AffineMap composition
that operates on AffineMap directly: everything related to composition seems
to require going through SSAValue and only operates on AffinMap at a distance
via AffineValueMap. I have raised this concern a bunch of times already, the
followup CL will actually do something about it.

In the meantime, the projection is hacked at a minimum to pass verification
and materialiation tests are temporarily incorrect.

PiperOrigin-RevId: 224376828
2019-03-29 14:20:07 -07:00
Nicolas Vasilache b39d1f0bdb [MLIR] Add VectorTransferOps
This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.

VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.

VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.

Note that the current implementation does not yet support a real permutation
map. Proper support will come in a followup CL.

VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. This operation is
called 'read' by opposition to 'load' because the super-vector granularity
is generally not representable with a single hardware register. As a
consequence, memory transfers will generally be required when lowering
VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile
only code.

A vector transfer read has semantics similar to a vector load, with additional
support for:
  1. an optional value of the elemental type of the MemRef. This value
     supports non-effecting padding and is inserted in places where the
     vector read exceeds the MemRef bounds. If the value is not specified,
     the access is statically guaranteed to be within bounds;
  2. an attribute of type AffineMap to specify a slice of the original
     MemRef access and its transposition into the super-vector shape. The
     permutation_map is an unbounded AffineMap that must represent a
     permutation from the MemRef dim space projected onto the vector dim
     space.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  ...
  %val = `ssa-value` : f32
  // let %i, %j, %k, %l be ssa-values of type index
  %v0 = vector_transfer_read %src, %i, %j, %k, %l
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index) ->
            vector<16x32x64xf32>
  %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
            vector<16x32x64xf32>
```

VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. This operation is
called 'write' by opposition to 'store' because the super-vector
granularity is generally not representable with a single hardware register. As
a consequence, memory transfers will generally be required when lowering
VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding
for full-tile only code.
A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>.
  %val = `ssa-value` : vector<16x32x64xf32>
  // let %i, %j, %k, %l be ssa-values of type index
  vector_transfer_write %val, %src, %i, %j, %k, %l
    {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId: 223873234
2019-03-29 14:15:25 -07:00
Nicolas Vasilache 87d46aaf4b [MLIR][Vectorize] Refactor Vectorize use-def propagation.
This CL refactors a few things in Vectorize.cpp:
1. a clear distinction is made between:
  a. the LoadOp are the roots of vectorization and must be vectorized
  eagerly and propagate their value; and
  b. the StoreOp which are the terminals of vectorization and must be
  vectorized late (i.e. they do not produce values that need to be
  propagated).
2. the StoreOp must be vectorized late because in general it can store a value
that is not reachable from the subset of loads defined in the
current pattern. One trivial such case is storing a constant defined at the
top-level of the MLFunction and that needs to be turned into a splat.
3. a description of the algorithm is given;
4. the implementation matches the algorithm;
5. the last example is made parametric, in practice it will fully rely on the
implementation of vector_transfer_read/write which will handle boundary
conditions and padding. This will happen by lowering to a lower-level
abstraction either:
  a. directly in MLIR (whether DMA or just loops or any async tasks in the
     future) (whiteboxing);
  b. in LLO/LLVM-IR/whatever blackbox library call/ search + swizzle inventor
  one may want to use;
  c. a partial mix of a. and b. (grey-boxing)
5. minor cleanups are applied;
6. mistakenly disabled unit tests are re-enabled (oopsie).

With this CL, this MLIR snippet:
```
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  %f1 = constant 1.0 : f32
  %f2 = constant 2.0 : f32
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      // non-scoped %f1
      store %f1, %A[%i0, %i1] : memref<?x?xf32>
    }
  }
  for %i4 = 0 to %M {
    for %i5 = 0 to %N {
      %a5 = load %A[%i4, %i5] : memref<?x?xf32>
      %b5 = load %B[%i4, %i5] : memref<?x?xf32>
      %s5 = addf %a5, %b5 : f32
      // non-scoped %f1
      %s6 = addf %s5, %f1 : f32
      store %s6, %C[%i4, %i5] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```

vectorized with these arguments:
```
-vectorize -virtual-vector-size 256 --test-fastest-varying=0
```

vectorization produces this standard innermost-loop vectorized code:
```
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %cst = constant 1.000000e+00 : f32
  %cst_0 = constant 2.000000e+00 : f32
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 256 {
      %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  for %i2 = 0 to %arg0 {
    for %i3 = 0 to %arg1 step 256 {
      %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %5 = addf %3, %4 : vector<256xf32>
      %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      %6 = addf %5, %cst_2 : vector<256xf32>
      "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```

Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for now.

PiperOrigin-RevId: 222280209
2019-03-29 14:03:50 -07:00
Nicolas Vasilache 89d9913a20 [MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests
This CL adds some vector support in prevision of the upcoming vector
materialization pass. In particular this CL adds 2 functions to:
1. compute the multiplicity of a subvector shape in a supervector shape;
2. help match operations on strict super-vectors. This is defined for a given
subvector shape as an operation that manipulates a vector type that is an
integral multiple of the subtype, with multiplicity at least 2.

This CL also adds a TestUtil pass where we can dump arbitrary testing of
functions and analysis that operate at a much smaller granularity than a pass
(e.g. an analysis for which it is convenient to write a bit of artificial MLIR
and write some custom test). This is in order to keep using Filecheck for
things that essentially look and feel like C++ unit tests.

PiperOrigin-RevId: 222250910
2019-03-29 14:02:17 -07:00
Nicolas Vasilache fefbf91314 [MLIR] Support for vectorizing operations.
This CL adds support for and a vectorization test to perform scalar 2-D addf.

The support extension notably comprises:
1. extend vectorizable test to exclude vector_transfer operations and
expose them to LoopAnalysis where they are needed. This is a temporary
solution a concrete MLIR Op exists;
2. add some more functional sugar mapKeys, apply and ScopeGuard (which became
relevant again);
3. fix improper shifting during coarsening;
4. rename unaligned load/store to vector_transfer_read/write and simplify the
design removing the unnecessary AllocOp that were introduced prematurely:
vector_transfer_read currently has the form:
  (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>
vector_transfer_write currently has the form:
  (vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> ()
5. adds vectorizeOperations which traverses the operations in a ForStmt and
rewrites them to their vector form;
6. add support for vector splat from a constant.

The relevant tests are also updated.

PiperOrigin-RevId: 221421426
2019-03-29 13:56:47 -07:00
River Riddle 2fa4bc9fc8 Implement value type abstraction for locations.
Value type abstraction for locations differ from others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>.

PiperOrigin-RevId: 220682078
2019-03-29 13:52:31 -07:00
Jacques Pienaar cc9a6ed09d Initialize Pass with PassID.
The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo.

PiperOrigin-RevId: 220485662
2019-03-29 13:50:34 -07:00
Jacques Pienaar 6f0fb22723 Add static pass registration
Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage.

Change build targets to alwayslink to enforce registration.

PiperOrigin-RevId: 220390178
2019-03-29 13:49:34 -07:00
Nicolas Vasilache 21638dcda9 [MLIR] Extend vectorization to 2+-D patterns
This CL adds support for vectorization using more interesting 2-D and 3-D
patterns. Note in particular the fact that we match some pretty complex
imperfectly nested 2-D patterns with a quite minimal change to the
implementation: we just add a bit of recursion to traverse the matched
patterns and actually vectorize the loops.

For instance, vectorizing the following loop by 128:
```
for %i3 = 0 to %0 {
  %7 = affine_apply (d0) -> (d0)(%i3)
  %8 = load %arg0[%c0_0, %7] : memref<?x?xf32>
}
```

Currently generates:
```
#map0 = ()[s0] -> (s0 + 127)
#map1 = (d0) -> (d0)
for %i3 = 0 to #map0()[%0] step 128 {
  %9 = affine_apply #map1(%i3)
  %10 = alloc() : memref<1xvector<128xf32>>
  %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) :
    (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) ->
    (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index)
   %12 = load %10[%c0] : memref<1xvector<128xf32>>
}
```

The above is subject to evolution.

PiperOrigin-RevId: 219629745
2019-03-29 13:46:58 -07:00
Uday Bondhugula 8201e19e3d Introduce memref bound checking.
Introduce analysis to check memref accesses (in MLFunctions) for out of bound
ones. It works as follows:

$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir

/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
      %y = load %B[%idy] : memref<128 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
      %y = load %B[%idy] : memref<128 x i32>
           ^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
  %0 = alloc() : memref<9x9xi32>
  %1 = alloc() : memref<128xi32>
  for %i0 = -1 to 9 {
    for %i1 = -1 to 9 {
      %2 = affine_apply #map0(%i0, %i1)
      %3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32>
      %4 = affine_apply #map1(%i0, %i1)
      %5 = load %1[%4] : memref<128xi32>
    }
  }
  return
}

- Improves productivity while manually / semi-automatically developing MLIR for
  testing / prototyping; also provides an indirect way to catch errors in
  transformations.

- This pass is an easy way to test the underlying affine analysis
  machinery including low level routines.

Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.

While on this:

- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/

- fix a bug in AffineAnalysis.cpp::toAffineExpr

TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
PiperOrigin-RevId: 219397961
2019-03-29 13:46:08 -07:00
River Riddle 4c465a181d Implement value type abstraction for types.
This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.

PiperOrigin-RevId: 219372163
2019-03-29 13:45:54 -07:00
Nicolas Vasilache af7f56fdf8 [MLIR] Implement 1-D vectorization for fastest varying load/stores
This CL is a first in a series that implements early vectorization of
increasingly complex patterns. In particular, early vectorization will support
arbitrary loop nesting patterns (both perfectly and imperfectly nested), at
arbitrary depths in the loop tree.

This first CL builds the minimal support for applying 1-D patterns.
It relies on an unaligned load/store op abstraction that can be inplemented
differently on different HW.
Future CLs will support higher dimensional patterns, but 1-D patterns already
exhibit interesting properties.
In particular, we want to separate pattern matching (i.e. legality both
structural and dependency analysis based), from profitability analysis, from
application of the transformation.
As a consequence patterns may intersect and we need to verify that a pattern
can still apply by the time we get to applying it.

A non-greedy analysis on profitability that takes into account pattern
intersection is left for future work.

Additionally the CL makes the following cleanups:
1. the matches method now returns a value, not a reference;
2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by
value;
3. added size and empty methods to matches;
4. added a negative vectorization test with a conditional, this exhibited a
but in the iterators. Iterators now return nullptr if the underlying storage
is nullpt.

PiperOrigin-RevId: 219299489
2019-03-29 13:44:26 -07:00
Chris Lattner adbba70d82 Simplify FunctionPass to eliminate the CFGFunctionPass/MLFunctionPass
distinction.  FunctionPasses can now choose to get called on all functions, or
have the driver split CFG/ML Functions up for them.  NFC.

PiperOrigin-RevId: 218775885
2019-03-29 13:40:05 -07:00
Nicolas Vasilache 3013dadb7c [MLIR] Basic infrastructure for vectorization test
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.

The test simply consists in:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store are contiguous along the innermost
memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not-vectorizable.

The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity etc. All these will come in due time but it is worthwhile
noting that the test already supports detection of outer-vetorizable loops.

In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in order traversal but
post-order DFS will be useful in the future once IR rewrites start occuring.

One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.

This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).

PiperOrigin-RevId: 217622889
2019-03-29 13:32:13 -07:00