Commit Graph

140 Commits

Author SHA1 Message Date
MLIR Team 0cd589c337 Create a LoopUtil function to return perfectly nested loop set
--

PiperOrigin-RevId: 242019230
2019-04-05 07:42:01 -07:00
Andy Davis 7c1fc9e795 Enable producer-consumer fusion for liveout memrefs if consumer read region matches producer write region.
--

PiperOrigin-RevId: 241517207
2019-04-02 13:39:50 -07:00
MLIR Team 9d30b36aaf Enable input-reuse fusion to search function arguments for fusion candidates (takes care of a TODO, enables another tutorial test case).
PiperOrigin-RevId: 240979894
2019-03-29 17:54:36 -07:00
MLIR Team 9d9675fc8f Remove overly conservative check in LoopFusion pass (enables fusion in tutorial example).
PiperOrigin-RevId: 240859227
2019-03-29 17:51:16 -07:00
River Riddle 99b87c9707 Replace usages of Instruction with Operation in the Transforms/ directory.
PiperOrigin-RevId: 240636130
2019-03-29 17:47:26 -07:00
River Riddle 9c08540690 Replace usages of Instruction with Operation in the /Analysis directory.
PiperOrigin-RevId: 240569775
2019-03-29 17:44:56 -07:00
Alex Zinenko 5a5bba0279 Introduce affine terminator
Due to legacy reasons (ML/CFG function separation), regions in affine control
flow operations require contained blocks not to have terminators.  This is
inconsistent with the notion of the block and may complicate code motion
between regions of affine control operations and other regions.

Introduce `affine.terminator`, a special terminator operation that must be used
to terminate blocks inside affine operations and transfers the control back to
he region enclosing the affine operation.  For brevity and readability reasons,
allow `affine.for` and `affine.if` to omit the `affine.terminator` in their
regions when using custom printing and parsing format.  The custom parser
injects the `affine.terminator` if it is missing so as to always have it
present in constructed operations.

Update transformations to account for the presence of terminator.  In
particular, most code motion transformation between loops should leave the
terminator in place, and code motion between loops and non-affine blocks should
drop the terminator.

PiperOrigin-RevId: 240536998
2019-03-29 17:44:24 -07:00
River Riddle f9d91531df Replace usages of Instruction with Operation in the /IR directory.
This is step 2/N to renaming Instruction to Operation.

PiperOrigin-RevId: 240459216
2019-03-29 17:43:37 -07:00
Chris Lattner 46ade282c8 Make FunctionPass::getFunction() return a reference to the function, instead of
a pointer.  This makes it consistent with all the other methods in
FunctionPass, as well as with ModulePass::getModule().  NFC.

PiperOrigin-RevId: 240257910
2019-03-29 17:40:44 -07:00
River Riddle 96ebde9cfd Replace usages of "Op::operator->" with ".".
This is step 2/N of removing the temporary operator-> method as part of the de-const transition.

PiperOrigin-RevId: 240200792
2019-03-29 17:40:09 -07:00
River Riddle af1abcc80b Replace usages of "operator->" with "." for the AffineOps.
Note: The "operator->" method is a temporary helper for the de-const transition and is gradually being phased out.
PiperOrigin-RevId: 240179439
2019-03-29 17:39:19 -07:00
River Riddle 832567b379 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for' and set the namespace of the AffineOps dialect to 'affine'.
PiperOrigin-RevId: 240165792
2019-03-29 17:39:03 -07:00
Chris Lattner d9b5bc8f55 Remove OpPointer, cleaning up a ton of code. This also moves Ops to using
inherited constructors, which is cleaner and means you can now use DimOp()
to get a null op, instead of having to use Instruction::getNull<DimOp>().

This removes another 200 lines of code.

PiperOrigin-RevId: 240068113
2019-03-29 17:36:21 -07:00
Chris Lattner 986310a68f Remove const from Value, Instruction, Argument, and the various methods on the
*Op classes.  This is a net reduction by almost 400LOC.

PiperOrigin-RevId: 239972443
2019-03-29 17:34:33 -07:00
Jacques Pienaar 57270a9a99 Remove some statements that required >C++11, add includes and qualify names. NFC.
PiperOrigin-RevId: 239197784
2019-03-29 17:24:53 -07:00
Alex Zinenko 276fae1b0d Rename BlockList into Region
NFC.  This is step 1/n to specifying regions as parts of any operation.

PiperOrigin-RevId: 238472370
2019-03-29 17:18:04 -07:00
Uday Bondhugula a228b7d477 Change getMemoryFootprintBytes emitError to a warning
- this is really not a hard error; emit a warning instead (for inability to compute
  footprint due to the union failing due to unimplemented cases)
- remove a misleading warning from LoopFusion.cpp

PiperOrigin-RevId: 238118711
2019-03-29 17:16:12 -07:00
Uday Bondhugula ce7e59536c Add a basic model to set tile sizes + some cleanup
- compute tile sizes based on a simple model that looks at memory footprints
  (instead of using the hardcoded default value)
- adjust tile sizes to make them factors of trip counts based on an option
- update loop fusion CL options to allow setting maximal fusion at pass creation
- change an emitError to emitWarning (since it's not a hard error unless the client
  treats it that way, in which case, it can emit one)

$ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir

test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ]

  for %i = 0 to 256 {

for %i0 = 0 to 256 step 4 {
    for %i1 = 0 to 256 step 4 {
      for %i2 = 0 to 250 step 5 {
        for %i3 = #map4(%i0) to #map11(%i0) {
          for %i4 = #map4(%i1) to #map11(%i1) {
            for %i5 = #map4(%i2) to #map12(%i2) {
              %0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>>
              %1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>>
              %2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
              %3 = mulf %0, %1 : vector<64xf32>
              %4 = addf %2, %3 : vector<64xf32>
              store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
            }
          }
        }
      }
    }
  }

PiperOrigin-RevId: 237461836
2019-03-29 17:06:51 -07:00
River Riddle 1e55ae19a0 Convert ambiguous bool returns in /Analysis to use Status instead.
PiperOrigin-RevId: 237390240
2019-03-29 17:06:21 -07:00
MLIR Team c1ff9e866e Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union for loop fusion pass (WIP).
Adds utility to convert slice bounds to a FlatAffineConstraints representation.
Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.

PiperOrigin-RevId: 236973261
2019-03-29 16:59:21 -07:00
Uday Bondhugula 02af8c22df Change Pass:getFunction() to return pointer instead of ref - NFC
- change this for consistency - everything else similar takes/returns a
  Function pointer - the FuncBuilder ctor,
  Block/Value/Instruction::getFunction(), etc.
- saves a whole bunch of &s everywhere

PiperOrigin-RevId: 236928761
2019-03-29 16:58:35 -07:00
MLIR Team d42ef78a75 Handle MemRefRegion::compute return value in loop fusion pass (NFC).
PiperOrigin-RevId: 236685849
2019-03-29 16:55:20 -07:00
Uday Bondhugula eee85361bb Remove hidden flag from fusion CL options
PiperOrigin-RevId: 236409185
2019-03-29 16:54:05 -07:00
River Riddle f37651c708 NFC. Move all of the remaining operations left in BuiltinOps to StandardOps. The only thing left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder.
PiperOrigin-RevId: 236403665
2019-03-29 16:53:35 -07:00
Lei Zhang 85d9b6c8f7 Use consistent names for dialect op source files
This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:

  <full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}

Builtin and standard dialects are specially treated, though. Both of them do
not have dialect namespace; the former is still named as BuiltinOps.* and the
latter is named as Ops.*.

Purely mechanical. NFC.

PiperOrigin-RevId: 236371358
2019-03-29 16:53:19 -07:00
MLIR Team d038e34735 Loop fusion for input reuse.
*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utilitiy for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.

PiperOrigin-RevId: 236350987
2019-03-29 16:52:35 -07:00
River Riddle ed5fe2098b Remove PassResult and have the runOnFunction/runOnModule functions return void instead. To signal a pass failure, passes should now invoke the 'signalPassFailure' method. This provides the equivalent functionality when needed, but isn't an intrusive part of the API like PassResult.
PiperOrigin-RevId: 236202029
2019-03-29 16:50:44 -07:00
River Riddle c6c534493d Port all of the existing passes over to the new pass manager infrastructure. This is largely NFC.
PiperOrigin-RevId: 235952357
2019-03-29 16:47:14 -07:00
Uday Bondhugula 7aa60a383f Temp change in FlatAffineConstraints::getSliceBounds() to deal with TODO in
LoopFusion

- getConstDifference in LoopFusion is pending a refactoring to handle bounds
  with min's and max's; it currently asserts on some useful test cases that we
  want to experiment with. This CL changes getSliceBounds to be more
  conservative so as to not trigger the assertion. Filed b/126426796 to track this.

PiperOrigin-RevId: 235826538
2019-03-29 16:45:23 -07:00
Uday Bondhugula d4b3ff1096 Loop fusion comand line options cleanup
- clean up loop fusion CL options for promoting local buffers to fast memory
  space
- add parameters to loop fusion pass instantiation

PiperOrigin-RevId: 235813419
2019-03-29 16:44:38 -07:00
Uday Bondhugula dfe07b7bf6 Refactor AffineExprFlattener and move FlatAffineConstraints out of IR into
Analysis - NFC

- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
  doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
  could be moved out of IR/; the simplification that the IR needs for
  AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
  all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
  class

- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor

PiperOrigin-RevId: 235283610
2019-03-29 16:39:32 -07:00
MLIR Team 8564b274db Internal change
PiperOrigin-RevId: 235191129
2019-03-29 16:38:24 -07:00
River Riddle 3e656599f1 Define a PassID class to use when defining a pass. This allows for the type used for the ID field to be self documenting. It also allows for the compiler to know the set alignment of the ID object, which is useful for storing pointer identifiers within llvm data structures.
PiperOrigin-RevId: 235107957
2019-03-29 16:37:12 -07:00
Uday Bondhugula a1dad3a5d9 Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox
- compute slices precisely where the destination iteration depends on multiple source
  iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case

PiperOrigin-RevId: 234714069
2019-03-29 16:33:11 -07:00
River Riddle 48ccae2476 NFC: Refactor the files related to passes.
* PassRegistry is split into its own source file.
* Pass related files are moved to a new library 'Pass'.

PiperOrigin-RevId: 234705771
2019-03-29 16:32:56 -07:00
MLIR Team 58aa383e60 Support fusing producer loop nests which write to a memref which is live out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region.
PiperOrigin-RevId: 234240958
2019-03-29 16:30:11 -07:00
MLIR Team 8f5f2c765d LoopFusion: perform a series of loop interchanges to increase the loop depth at which slices of producer loop nests can be fused into constumer loop nests.
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them.
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.

PiperOrigin-RevId: 234158370
2019-03-29 16:29:26 -07:00
Uday Bondhugula 4ba8c9147d Automated rollback of changelist 232717775.
PiperOrigin-RevId: 232807986
2019-03-29 16:19:33 -07:00
River Riddle 90d10b4e00 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. The is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
2019-03-29 16:17:59 -07:00
MLIR Team b9dde91ea6 Adds the ability to compute the MemRefRegion of a sliced loop nest. Utilizes this feature during loop fusion cost computation, to compute what the write region of a fusion candidate loop nest slice would be (without having to materialize the slice or change the IR).
*) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice.
*) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes.

PiperOrigin-RevId: 232706165
2019-03-29 16:17:15 -07:00
River Riddle 10237de8eb Refactor the affine analysis by moving some functionality to IR and some to AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring:
* AffineStructures has moved to IR.

* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.

* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.

* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.

PiperOrigin-RevId: 232586468
2019-03-29 16:15:41 -07:00
MLIR Team a78edcda5b Loop fusion improvements:
*) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion.
*) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check.

PiperOrigin-RevId: 232477853
2019-03-29 16:13:50 -07:00
River Riddle bf9c381d1d Remove InstWalker and move all instruction walking to the api facilities on Function/Block/Instruction.
PiperOrigin-RevId: 232388113
2019-03-29 16:12:59 -07:00
Uday Bondhugula 0f50414fa4 Refactor common code getting memref access in getMemRefRegion - NFC
- use getAccessMap() instead of repeating it
- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap
  allocation and unique_ptr where possible)

- change extractForInductionVars - MutableArrayRef -> ArrayRef for the
  arguments. Since the method is just returning copies of 'Value *', the client
  can't mutate the pointers themselves; it's fine to mutate the 'Value''s
  themselves, but that doesn't mutate the pointers to those.

- change the way extractForInductionVars returns (see b/123437690)

PiperOrigin-RevId: 232359277
2019-03-29 16:12:25 -07:00
River Riddle b499277fb6 Remove remaining usages of OperationInst in lib/Transforms.
PiperOrigin-RevId: 232323671
2019-03-29 16:10:53 -07:00
River Riddle a3d9ccaecb Replace the walkOps/visitOperationInst variants from the InstWalkers with the Instruction variants.
PiperOrigin-RevId: 232322030
2019-03-29 16:10:24 -07:00
Uday Bondhugula b26900dce5 Update dma-generate pass to (1) work on blocks of instructions (instead of just
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors

- change dma-generate pass to work on blocks of instructions (start/end
  iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
  straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
  in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
  generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
  capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
  option (besides command line flag)

- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
  replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods

Eg. output

$ mlir-opt  -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
            ^

$ mlir-opt -debug-only=dma-generate  -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {

PiperOrigin-RevId: 232297044
2019-03-29 16:09:52 -07:00
Uday Bondhugula 8be2627436 Promote local buffers created post fusion to higher memory space
- fusion already includes the necessary analysis to create small/local buffers
  post fusion; allocate these buffers in a higher memory space if the necessary
  pass parameters are provided (threshold size, memory space id)

- although there will be a separate utility at some point to directly detect
  and promote small local buffers to higher memory spaces, doing it while fusion
  when possible is much less expensive, comes free with fusion analysis, and covers
  a key common case.

PiperOrigin-RevId: 232063894
2019-03-29 16:07:23 -07:00
River Riddle 5052bd8582 Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
2019-03-29 16:06:49 -07:00
MLIR Team 1e85191d07 Fix ASAN issue: snapshot edge list before loop which can modify this list.
PiperOrigin-RevId: 231686040
2019-03-29 16:03:38 -07:00
MLIR Team d7c824451f LoopFusion: insert the source loop nest slice at a depth in the destination loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out.
PiperOrigin-RevId: 231684492
2019-03-29 16:03:23 -07:00
MLIR Team a0f3db4024 Support fusing loop nests which require insertion into a new instruction Block position while preserving dependences, opening up additional fusion opportunities.
- Adds SSA Value edges to the data dependence graph used in the loop fusion pass.

PiperOrigin-RevId: 231417649
2019-03-29 16:00:04 -07:00
River Riddle 755538328b Recommit: Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231342063
2019-03-29 15:59:30 -07:00
Nicolas Vasilache ae772b7965 Automated rollback of changelist 231318632.
PiperOrigin-RevId: 231327161
2019-03-29 15:42:38 -07:00
River Riddle 5ecef2b3f6 Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231318632
2019-03-29 15:42:08 -07:00
Nicolas Vasilache 0e7a8a9027 Drop AffineMap::Null and IntegerSet::Null
Addresses b/122486036

This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing
the Null method and cleaning up the constructors.

As the ::Null uses were tracked down, opportunities appeared to untangle some
of the Parsing logic and make it explicit where AffineMap/IntegerSet have
ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit
pointer values of AffineMap* and IntegerSet* that were passed as function
parameters. Depending the values of those pointers one of 3 behaviors could
occur.

This parsing logic convolution is one of the rare cases where I would advocate
for code duplication. The more proper fix would be to make the syntax
unambiguous or to allow some lookahead.

PiperOrigin-RevId: 231058512
2019-03-29 15:40:08 -07:00
River Riddle 75c21e1de0 Wrap cl::opt flags within passes in a category with the pass name. This improves the help output of tools like mlir-opt.
Example:

dma-generate options:

  -dma-fast-mem-capacity                 - Set fast memory space  ...
  -dma-fast-mem-space=<uint>             - Set fast memory space  ...

loop-fusion options:

  -fusion-compute-tolerance=<number>     - Fractional increase in  ...
  -fusion-maximal                        - Enables maximal loop fusion

loop-tile options:

  -tile-size=<uint>                      - Use this tile size for  ...

loop-unroll options:

  -unroll-factor=<uint>                  - Use this unroll factor  ...
  -unroll-full                           - Fully unroll loops
  -unroll-full-threshold=<uint>          - Unroll all loops with  ...
  -unroll-num-reps=<uint>                - Unroll innermost loops  ...

loop-unroll-jam options:

  -unroll-jam-factor=<uint>              - Use this unroll jam factor ...

PiperOrigin-RevId: 231019363
2019-03-29 15:39:38 -07:00
Uday Bondhugula b4a1443508 Update replaceAllMemRefUsesWith to generate single result affine_apply's for
index remapping
- generate a sequence of single result affine_apply's for the index remapping
  (instead of one multi result affine_apply)
- update dma-generate and loop-fusion test cases; while on this, change test cases
  to use single result affine apply ops
- some fusion comment fix/cleanup

PiperOrigin-RevId: 230985830
2019-03-29 15:38:23 -07:00
MLIR Team 5c5739d42b Change the dependence check in the loop fusion pass to use the MLIR instruction list ordering (instead of the dependence graph node id ordering). This breaks the overloading of dependence graph node ids as both edge endpoints and instruction list position.
PiperOrigin-RevId: 230849232
2019-03-29 15:35:53 -07:00
Uday Bondhugula 06d21d9f64 loop-fusion: debug info cleanup
PiperOrigin-RevId: 230817383
2019-03-29 15:35:08 -07:00
River Riddle 6859f33292 Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.
PiperOrigin-RevId: 230605756
2019-03-29 15:33:20 -07:00
MLIR Team b28009b681 Fix single producer check in loop fusion pass.
PiperOrigin-RevId: 230565482
2019-03-29 15:32:20 -07:00
Uday Bondhugula 864d9e02a1 Update fusion cost model + some additional infrastructure and debug information for -loop-fusion
- update fusion cost model to fuse while tolerating a certain amount of redundant
  computation; add cl option -fusion-compute-tolerance
  evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC

PiperOrigin-RevId: 230541857
2019-03-29 15:32:06 -07:00
Uday Bondhugula 94a03f864f Allocate private/local buffers for slices accurately during fusion
- the size of the private memref created for the slice should be based on
  the memref region accessed at the depth at which the slice is being
  materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
  to the region accessed based on the entire domain.

- leads to a significant contraction of the temporary / intermediate memref
  whenever the memref isn't reduced to a single scalar (through store fwd'ing).

Other changes

- update to promoteIfSingleIteration - avoid introducing unnecessary identity
  map affine_apply from IV; makes it much easier to write and read test cases
  and pass output for all passes that use promoteIfSingleIteration; loop-fusion
  test cases become much simpler

- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
  'domInstFilter' could be one of the ops erased due to a memref replacement in
  it.

- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
  missing (the latter need not always be 1); add lbFloorDivisors output argument

- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape

PiperOrigin-RevId: 230405218
2019-03-29 15:30:31 -07:00
MLIR Team 71495d58a7 Handle escaping memrefs in loop fusion pass:
*) Do not remove loop nests which write to memrefs which escape the function.
*) Do not remove memrefs which escape the function (e.g. are used in the return instruction).

PiperOrigin-RevId: 230398630
2019-03-29 15:30:14 -07:00
Lei Zhang 1e484b5ef4 Mark (void)indexRemap to please compiler for unused variable check
PiperOrigin-RevId: 229957023
2019-03-29 15:26:59 -07:00
MLIR Team c4237ae990 LoopFusion: Creates private MemRefs which are used only by operations in the fused loop.
*) Enables reduction of private memref size based on MemRef region accessed by fused slice.
*) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence.
*) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice).

PiperOrigin-RevId: 229936698
2019-03-29 15:26:15 -07:00
Uday Bondhugula c1ca23ef6e Some loop fusion code cleanup/simplification post cl/229575126
- enforce the assumptions better / in a simpler way

PiperOrigin-RevId: 229612424
2019-03-29 15:23:43 -07:00
MLIR Team 27d067e164 LoopFusion improvements:
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.

PiperOrigin-RevId: 229575126
2019-03-29 15:21:59 -07:00
Uday Bondhugula f99a44a7cd Address documentation/readability related comments from cl/227252907 on memref
store forwarding - NFC.

PiperOrigin-RevId: 229561933
2019-03-29 15:20:59 -07:00
Uday Bondhugula 03e15e1b9f Minor code cleanup - NFC.
- readability changes

PiperOrigin-RevId: 229443430
2019-03-29 15:19:41 -07:00
MLIR Team 38c2fe3158 LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time).
*) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest, with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations for src/dst loop depths attempting find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality).
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (ones fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.

PiperOrigin-RevId: 229219757
2019-03-29 15:13:53 -07:00
Uday Bondhugula 21baf86a2f Extend loop-fusion's slicing utility + other fixes / updates
- refactor toAffineFromEq and the code surrounding it; refactor code into
  FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
  identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
  instead, don't do anything; change cmdline flags
  src-loop-depth -> fusion-src-loop-depth
- AffineExpr/Map print method update: don't fail on null instances (since we have
  a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
  never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid a unnecessary call to
  IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
  all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
  we didn't have any non-hyperrectangular ones.

PiperOrigin-RevId: 228265152
2019-03-29 15:03:20 -07:00
Uday Bondhugula 56b3640b94 Misc readability and doc / code comment related improvements - NFC
- when SSAValue/MLValue existed, code at several places was forced to create additional
  aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
  rid of such redundant code

- use filling ctors instead of explicit loops

- for smallvectors, change insert(list.end(), ...) -> append(...

- improve comments at various places

- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
  getMemRefAccess. In the next CL, provide getAccess() accessors for load,
  store, DMA op's to return a MemRefAccess.

PiperOrigin-RevId: 228243638
2019-03-29 15:02:41 -07:00
Chris Lattner 7974889f54 Update and generalize various passes to work on both CFG and ML functions,
simplifying them in minor ways.  The only significant cleanup here
is the constant folding pass.  All the other changes are simple and easy,
but this is still enough to shrink the compiler by 45LOC.

The one pass left to merge is the CSE pass, which will be move involved, so I'm
splitting it out to its own patch (which I'll tackle right after this).

This is step 28/n towards merging instructions and statements.

PiperOrigin-RevId: 227328115
2019-03-29 14:49:52 -07:00
Uday Bondhugula b9fe6be6d4 Introduce memref store to load forwarding - a simple memref dataflow analysis
- the load/store forwarding relies on memref dependence routines as well as
  SSA/dominance to identify the memref store instance uniquely supplying a value
  to a memref load, and replaces the result of that load with the value being
  stored. The memref is also deleted when possible if only stores remain.

- add methods for post dominance for MLFunction blocks.

- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
  getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
  earlier static)

- add a helper method in FlatAffineConstraints - isRangeOneToOne.

PiperOrigin-RevId: 227252907
2019-03-29 14:47:28 -07:00
Chris Lattner dffc589ad2 Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops.  Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.

This is step 25/n towards merging instructions and statements.

PiperOrigin-RevId: 227243966
2019-03-29 14:46:58 -07:00
Chris Lattner 456ad6a8e0 Standardize naming of statements -> instructions, revisting the code base to be
consistent and moving the using declarations over.  Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245
2019-03-29 14:44:30 -07:00
Chris Lattner 315a466aed Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed once can be fixed up on demand.
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.

This is step 19/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227163082
2019-03-29 14:43:58 -07:00
Chris Lattner 69d9e990fa Eliminate the using decls for MLFunction and CFGFunction standardizing on
Function.

This is step 18/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227139399
2019-03-29 14:43:13 -07:00
Chris Lattner 5187cfcf03 Merge Operation into OperationInst and standardize nomenclature around
OperationInst.  This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712
2019-03-29 14:42:23 -07:00
Chris Lattner 3f190312f8 Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy.  This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate.  This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624
2019-03-29 14:40:06 -07:00
Chris Lattner d613f5ab65 Refactor MLFunction to contain a StmtBlock for its body instead of inheriting
from it.  This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future.  This also removes IfClause and ForStmtBody.

This is step 5/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226936541
2019-03-29 14:36:35 -07:00
MLIR Team 4eef795a1d Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the compution slice.
Updates LoopFusion pass to take these parameters as command line flags for experimentation.

PiperOrigin-RevId: 226514297
2019-03-29 14:35:03 -07:00
MLIR Team 6892ffb896 Improve loop fusion algorithm by using a memref dependence graph.
Fixed TODO for reduction fusion unit test.

PiperOrigin-RevId: 226277226
2019-03-29 14:33:02 -07:00
MLIR Team 3b69230b3a Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion.
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.

PiperOrigin-RevId: 225842944
2019-03-29 14:30:13 -07:00
MLIR Team b5424dd0cb Adds support for returning the direction of the dependence between memref accesses (distance/direction vectors).
Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths.
Updates old and adds new memref dependence check tests.
Resolves multiple TODOs.

PiperOrigin-RevId: 220816515
2019-03-29 13:53:28 -07:00
Jacques Pienaar cc9a6ed09d Initialize Pass with PassID.
The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo.

PiperOrigin-RevId: 220485662
2019-03-29 13:50:34 -07:00
Jacques Pienaar 6f0fb22723 Add static pass registration
Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage.

Change build targets to alwayslink to enforce registration.

PiperOrigin-RevId: 220390178
2019-03-29 13:49:34 -07:00
MLIR Team f28e4df666 Adds a dependence check to test whether two accesses to the same memref access the same element.
- Builds access functions and iterations domains for each access.
- Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds.
- Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses.
- Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops.
- The LoopFusion pass will be extended in subsequent CLs.

PiperOrigin-RevId: 219630898
2019-03-29 13:47:13 -07:00