# Bufferization

[TOC]

## Overview

Bufferization in MLIR is the process of converting the `tensor` type to the
`memref` type. MLIR provides a composable system that allows dialects to
systematically bufferize a program. This system is a simple application
of MLIR's [dialect conversion](DialectConversion.md) infrastructure. The bulk of
the code related to bufferization is a set of ordinary `ConversionPattern`s
that dialect authors write for converting ops that operate on `tensor`s to ops
that operate on `memref`s. A set of conventions and best practices allows
these patterns to be run across multiple independent passes (rather
than requiring a single huge atomic conversion pass), which makes the
compilation pipelines scalable, robust, and easy to debug.

This document is targeted at people looking to use MLIR's bufferization
functionality, along with people who want to extend it to cover their own ops.

<a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
Infrastructure"
([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
[recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
That talk gives a high-level overview of the bufferization infrastructure and
important conceptual details related to using the MLIR dialect conversion
infrastructure.

## Bufferization's place in a compilation pipeline

Bufferization itself does not free any of the buffers that have been allocated,
nor does it do anything particularly intelligent with the placement of buffers
w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
of:

1. Bufferization
1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
   `promote-buffers-to-stack`, which perform optimizations that are only exposed
   after bufferization.
1. Finally, the [buffer deallocation](BufferDeallocation.md) pass.

After buffer deallocation has been completed, the program will be quite
difficult to transform due to the presence of the deallocation ops. Thus, other
optimizations such as linalg fusion on memrefs should be done before that stage.

## General structure of the bufferization process

Bufferization consists of running multiple _partial_ bufferization passes,
followed by one _finalizing_ bufferization pass.

There is typically one partial bufferization pass per dialect (though other
subdivisions are possible). For example, for a dialect `X` there will typically
be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
By running pass `X-bufferize` for each dialect `X` in the program, all the ops
in the program are incrementally bufferized.

Partial bufferization passes create programs where only some ops have been
bufferized. These passes will create _materializations_ (also sometimes called
"casts") that convert between the `tensor` and `memref` type, which allows
bridging between ops that have been bufferized and ops that have not yet been
bufferized.

Finalizing bufferizations complete the bufferization process, and guarantee that
there are no tensors remaining in the program. This involves eliminating the
materializations. The pass `finalizing-bufferize` provides a minimal pass that
only eliminates materializations and issues an error if any unbufferized ops
exist in the program.

However, it is possible for a finalizing bufferization to do more than just
eliminate materializations. By adding patterns (just as a partial bufferization
would), it is possible for a finalizing bufferization pass to simultaneously
bufferize ops and eliminate materializations. This has a number of disadvantages
discussed in the talk and should generally be avoided.

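The division of labor between partial and finalizing passes can be sketched
with a toy model in plain C++. This is only an illustration under simplifying
assumptions: `ToyOp`, `partialBufferize`, and `finalizingBufferize` are
hypothetical names, not MLIR APIs, and materializations are not modeled at all.
Each partial step converts the ops of one dialect, and the finalizing step only
reports whatever is still unbufferized.

```c++
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an operation. This is NOT the MLIR `Operation` class.
struct ToyOp {
  std::string dialect; // e.g. "scf", "linalg", "tcp"
  std::string name;    // e.g. "scf.for"
  bool bufferized;     // true once the op works on memrefs
};

// Partial bufferization: convert only the ops of one dialect and leave all
// other ops untouched. Returns how many ops were converted.
int partialBufferize(std::vector<ToyOp> &program, const std::string &dialect) {
  assert(!dialect.empty() && "expected a dialect name");
  int converted = 0;
  for (ToyOp &op : program) {
    if (op.dialect == dialect && !op.bufferized) {
      op.bufferized = true;
      ++converted;
    }
  }
  return converted;
}

// Finalizing bufferization: converts nothing by itself. It returns the names
// of the ops that are still unbufferized; an empty result means success.
std::vector<std::string>
finalizingBufferize(const std::vector<ToyOp> &program) {
  std::vector<std::string> unbufferized;
  for (const ToyOp &op : program)
    if (!op.bufferized)
      unbufferized.push_back(op.name);
  return unbufferized;
}
```

In this model, calling `partialBufferize` for only some of the dialects leaves
a program for which `finalizingBufferize` returns a non-empty list, which
mirrors the error that `finalizing-bufferize` issues when unbufferized ops
remain.
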
### Example

As a concrete example, we will look at the bufferization pipeline from the
`mlir-npcomp` reference backend
([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
The code, slightly simplified and annotated, is reproduced here:

```c++
// Partial bufferization passes.
pm.addPass(createTensorConstantBufferizePass());
pm.addNestedPass<FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
pm.addNestedPass<FuncOp>(createSCFBufferizePass());
pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
pm.addNestedPass<FuncOp>(createStdBufferizePass());
pm.addNestedPass<FuncOp>(createTensorBufferizePass());
pm.addPass(createFuncBufferizePass());

// Finalizing bufferization pass.
pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());
```

Looking first at the partial bufferization passes, we see that there is a
sequence of `FuncOp` passes (which run in parallel on functions). These function
passes are bracketed by `tensor-constant-bufferize` and `func-bufferize`, which
are module passes (and thus serialize the parallel compilation process). These
two passes must be module passes because they make changes to the top-level
module.

The bulk of the bufferization work is done by the function passes. Most of these
passes are provided as part of the upstream MLIR distribution and bufferize
their respective dialects (e.g. `scf-bufferize` bufferizes the `scf` dialect).
The `tcp-bufferize` pass is an exception: it is a partial bufferization pass
used to bufferize the downstream `tcp` dialect, and it fits in with the other
passes provided upstream without any special handling.

The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
backend has arranged that all ops are bufferized by partial bufferizations, so
that the upstream `finalizing-bufferize` pass can be used as the finalizing
bufferization pass. This gives excellent diagnostics when something goes wrong
with the bufferization process, for example when an op was not handled by any
pattern.

## How to write a partial bufferization pass

The contract of a partial bufferization pass is that a subset of ops (or kinds
of ops, customizable by a `ConversionTarget`) get bufferized.

A partial bufferization pass is just a pass that uses the
[dialect conversion](DialectConversion.md) framework to apply
`ConversionPattern`s with a `tensor` to `memref` type conversion.

To describe how to write such a pass, we will walk through an example, the
`tensor-bufferize` pass
([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
[test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
that bufferizes the `tensor` dialect.

The bulk of the code in the pass will be a set of conversion patterns, with a
simple example being
[BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23).

```c++
// Converts a `tensor.cast` into the equivalent cast on memrefs.
class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::CastOp op, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const override {
    // The type converter maps the op's tensor result type to a memref type.
    auto resultType = getTypeConverter()->convertType(op.getType());
    // `operands` holds the already-converted (memref) operands.
    rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, operands[0]);
    return success();
  }
};
```

See [the talk](#the-talk) for more details on how to write these patterns.

The
[pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
is very small, and follows the basic pattern of any dialect conversion pass.

```c++
// Exposed separately so that other passes can reuse the patterns.
void mlir::populateTensorBufferizePatterns(
    BufferizeTypeConverter &typeConverter, RewritePatternSet &patterns) {
  patterns.add<BufferizeCastOp, BufferizeExtractOp>(typeConverter,
                                                    patterns.getContext());
}

struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    RewritePatternSet patterns(context);
    ConversionTarget target(*context);

    populateTensorBufferizePatterns(typeConverter, patterns);
    // The ops this pass is responsible for must not survive the conversion.
    target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
    // The memref dialect hosts the materialization ops, so it must be legal.
    target.addLegalDialect<memref::MemRefDialect>();
    target.addLegalDialect<StandardOpsDialect>();

    // Partial conversion: ops not mentioned in the target are left alone.
    if (failed(
            applyPartialConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```

The pass has all the hallmarks of a dialect conversion pass that does type
conversions: a `TypeConverter`, a `RewritePatternSet`, a `ConversionTarget`,
and a call to `applyPartialConversion`. Note that the function
`populateTensorBufferizePatterns` is kept separate, so that power users can use
the patterns independently if necessary (such as to combine multiple sets of
conversion patterns into a single conversion call, for performance).

One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

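The type-conversion half of that utility can be pictured with a small
stand-alone model. The sketch below is plain C++ and does not use any MLIR
class: `ToyType`, `convertType`, `isLegal`, and `toString` are hypothetical
names chosen for illustration. It assumes the simplest mapping, in which a
tensor type becomes a memref type with the same shape and element type and
every other type is left unchanged; materializations are not modeled.

```c++
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy type description. This is NOT the MLIR `Type` class.
enum class ToyKind { Tensor, MemRef, Scalar };

struct ToyType {
  ToyKind kind;
  std::vector<int64_t> shape; // empty for scalars
  std::string elementType;    // e.g. "f32"
};

// A type is legal after bufferization if it is not a tensor.
bool isLegal(const ToyType &type) { return type.kind != ToyKind::Tensor; }

// Tensor types map to memref types with the same shape and element type.
// Every other type is returned unchanged.
ToyType convertType(const ToyType &type) {
  if (type.kind != ToyKind::Tensor)
    return type;
  ToyType converted = type;
  converted.kind = ToyKind::MemRef;
  assert(isLegal(converted) && "converted type must not be a tensor");
  return converted;
}

// Render a type in MLIR-like syntax, e.g. "memref<4x8xf32>".
std::string toString(const ToyType &type) {
  if (type.kind == ToyKind::Scalar)
    return type.elementType;
  std::string text = type.kind == ToyKind::Tensor ? "tensor<" : "memref<";
  for (int64_t dim : type.shape)
    text += std::to_string(dim) + "x";
  return text + type.elementType + ">";
}
```

A conversion pattern only ever asks the converter for the new type, as
`BufferizeCastOp` does with `convertType(op.getType())`, so the patterns
themselves stay independent of how tensors are mapped to memrefs.
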
In the `tensor-bufferize` pass, the `memref` dialect is marked as legal, so the
`tensor_load` and `buffer_cast` ops, which are inserted automatically by the
dialect conversion framework as materializations, are legal. There is a helper
`populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
which helps with this in general.

### Other partial bufferization examples

- `linalg-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Linalg/bufferize.mlir#L1))

  - Bufferizes the `linalg` dialect.
  - This is an example of how to simultaneously bufferize all the ops that
    satisfy a certain OpInterface with a single pattern. Specifically,
    `BufferizeAnyLinalgOp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L170))
    bufferizes any op that implements the `LinalgOp` interface.

- `scf-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/SCF/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/SCF/bufferize.mlir#L1))

  - Bufferizes ops from the `scf` dialect.
  - This is an example of how to bufferize ops that implement
    `RegionBranchOpInterface` (that is, they use regions to represent control
    flow).
  - The bulk of the work is done by
    `lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L1)),
    which is well-commented and covers how to correctly convert ops that contain
    regions.

- `func-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/func-bufferize.mlir#L1))

  - Bufferizes `func`, `call`, and `BranchOpInterface` ops.
  - This is an example of how to bufferize ops that have multi-block regions.
  - This is an example of a pass that is not split along dialect subdivisions.

- `tensor-constant-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/tensor-constant-bufferize.mlir#L1))

  - Bufferizes only `std.constant` ops of `tensor` type.
  - This is an example of setting up the legality so that only a subset of
    `std.constant` ops get bufferized.
  - This is an example of a pass that is not split along dialect subdivisions.

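The idea of making only a subset of ops illegal, as `tensor-constant-bufferize`
does for `std.constant`, can be sketched with a legality predicate. The
following is a toy model in plain C++, not the upstream implementation:
`ToyConstant`, `isLegalConstant`, and `bufferizeTensorConstants` are
hypothetical names, and types are represented as plain strings purely for
illustration.

```c++
#include <cassert>
#include <string>
#include <vector>

// Toy constant op, described only by the textual type of its result.
// This is NOT an MLIR op class.
struct ToyConstant {
  std::string resultType; // e.g. "i32" or "tensor<4xf32>"
};

// Returns true if the textual type starts with "tensor<".
bool isTensorType(const std::string &type) {
  return type.rfind("tensor<", 0) == 0;
}

// Legality predicate: a constant is legal (left alone) unless it produces a
// tensor. Only the illegal constants are handed to the conversion.
bool isLegalConstant(const ToyConstant &op) {
  return !isTensorType(op.resultType);
}

// Convert every illegal constant and return how many were converted.
int bufferizeTensorConstants(std::vector<ToyConstant> &ops) {
  int converted = 0;
  for (ToyConstant &op : ops) {
    if (isLegalConstant(op))
      continue;
    // Rewrite "tensor<...>" to "memref<...>"; both prefixes have six letters.
    op.resultType.replace(0, 6, "memref");
    assert(isLegalConstant(op) && "conversion must produce a legal op");
    ++converted;
  }
  return converted;
}
```

Scalar constants such as `i32` or `index` satisfy the predicate from the start
and are therefore never touched, which is the behavior the bullet above
describes for non-tensor `std.constant` ops.
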
## How to write a finalizing bufferization pass

The contract of a finalizing bufferization pass is that all tensors are gone
from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all!
MLIR provides a pass `finalizing-bufferize` which eliminates the `tensor_load` /
`buffer_cast` materialization ops inserted by partial bufferization passes
and emits an error if that is not sufficient to remove all tensors from the
program.

This pass is sufficient when partial bufferization passes have bufferized all
the ops in the program, leaving behind only the materializations. When possible,
it is recommended to structure your pass pipeline this way, as this has the
significant advantage that if an op does not get bufferized (due to a missing
pattern, bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
error, and the IR seen by `finalizing-bufferize` will contain only one
unbufferized op.

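Why the materializations disappear once every op has been bufferized can be
illustrated with a toy model. The sketch below is plain C++ with hypothetical
names (`Cast`, `eliminateMaterializations`, `finalizeSucceeds`); it does not use
MLIR and is not how the upstream pass is implemented. It assumes a value that
starts out as a memref and then passes through a chain of casts: a
`tensor_load` followed directly by a `buffer_cast` is a round trip back to the
same memref, so the pair can be dropped.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// The two materialization ops named in the text, as a toy enum.
enum class Cast { TensorLoad, BufferCast };

// A chain applied to a memref is well typed if it alternates
// tensor_load, buffer_cast, tensor_load, ...
bool isWellTyped(const std::vector<Cast> &chain) {
  for (std::size_t i = 0; i < chain.size(); ++i) {
    Cast expected = (i % 2 == 0) ? Cast::TensorLoad : Cast::BufferCast;
    if (chain[i] != expected)
      return false;
  }
  return true;
}

// Drop every tensor_load that is immediately followed by a buffer_cast,
// because buffer_cast(tensor_load(m)) yields the original memref m.
std::vector<Cast> eliminateMaterializations(const std::vector<Cast> &chain) {
  assert(isWellTyped(chain) && "expected an alternating cast chain");
  std::vector<Cast> remaining;
  for (Cast cast : chain) {
    if (cast == Cast::BufferCast && !remaining.empty() &&
        remaining.back() == Cast::TensorLoad) {
      remaining.pop_back(); // the pair cancels out
      continue;
    }
    remaining.push_back(cast);
  }
  return remaining;
}

// Toy finalizing check: succeed only if no materialization survives.
bool finalizeSucceeds(const std::vector<Cast> &chain) {
  return eliminateMaterializations(chain).empty();
}
```

In this model a leftover `tensor_load` means that some consumer still expects a
tensor, which corresponds to the situation in which `finalizing-bufferize`
emits an error.
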
However, before the current bufferization infrastructure was put in place,
bufferization could only be done as a single finalizing bufferization
mega-pass that used the `populate*BufferizePatterns` functions from multiple
dialects to bufferize everything at once. Thus, one might see
code in downstream projects structured this way. This structure is not
recommended in new code. A helper,
`populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58)),
is available for such passes to provide patterns that eliminate `tensor_load`
and `buffer_cast`.

## Changes since [the talk](#the-talk)

- `func-bufferize` was changed to be a partial conversion pass, and there is a
  new `finalizing-bufferize` which serves as a general finalizing bufferization
  pass.