64 Commits

Author SHA1 Message Date
Uday Bondhugula
5d22044b5f Fix for getMemRefSizeInBytes: unsigned -> uint64_t
PiperOrigin-RevId: 234829637
2019-03-29 16:35:26 -07:00
Uday Bondhugula
f97c1c5b06 Misc. updates/fixes to analysis utils used for DMA generation; update DMA
generation pass to make it drop certain assumptions, complete TODOs.

- multiple fixes for getMemoryFootprintBytes
  - pass loopDepth correctly from getMemoryFootprintBytes()
  - use union while computing memory footprints

- bug fixes for addAffineForOpDomain
  - take into account loop step
  - add domains of other loop IVs in turn that might have been used in the bounds

- dma-generate: drop assumption of "non-unit stride loops being tile space loops
  and skipping those and recursing to inner depths"; DMA generation is now purely
  based on available fast mem capacity and memory footprint's calculated

- handle memory region compute failures/bailouts correctly from dma-generate

- loop tiling cleanup/NFC

- update some debug and error messages to use emitNote/emitError in
  pipeline-data-transfer pass - NFC

PiperOrigin-RevId: 234245969
2019-03-29 16:30:26 -07:00
Uday Bondhugula
f5eed89df0 Fix + cleanup for getMemRefRegion()
- determine symbols for the memref region correctly

- this wasn't exposed earlier since we didn't have any test cases where the
  portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of
  one IV  depending on other IVs within that part)

PiperOrigin-RevId: 233493872
2019-03-29 16:23:53 -07:00
Uday Bondhugula
c419accea3 Automated rollback of changelist 232728977.
PiperOrigin-RevId: 232944889
2019-03-29 16:21:38 -07:00
Uday Bondhugula
4ba8c9147d Automated rollback of changelist 232717775.
PiperOrigin-RevId: 232807986
2019-03-29 16:19:33 -07:00
River Riddle
fd2d7c857b Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace
the AffineOps dialect with 'affine'.

PiperOrigin-RevId: 232728977
2019-03-29 16:18:59 -07:00
River Riddle
90d10b4e00 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. The is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
2019-03-29 16:17:59 -07:00
River Riddle
905d84851d Address post submit review comments for removing Block::findInstPositionInBlock.
PiperOrigin-RevId: 232713514
2019-03-29 16:17:44 -07:00
MLIR Team
b9dde91ea6 Adds the ability to compute the MemRefRegion of a sliced loop nest. Utilizes this feature during loop fusion cost computation, to compute what the write region of a fusion candidate loop nest slice would be (without having to materialize the slice or change the IR).
*) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice.
*) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes.

PiperOrigin-RevId: 232706165
2019-03-29 16:17:15 -07:00
River Riddle
42a2d7d6e1 Remove findInstPositionInBlock from the Block api.
PiperOrigin-RevId: 232704766
2019-03-29 16:16:43 -07:00
River Riddle
10237de8eb Refactor the affine analysis by moving some functionality to IR and some to AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring:
* AffineStructures has moved to IR.

* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.

* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.

* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.

PiperOrigin-RevId: 232586468
2019-03-29 16:15:41 -07:00
River Riddle
c9ad4621ce NFC: Move AffineApplyOp to the AffineOps dialect. This also moves the isValidDim/isValidSymbol methods from Value to the AffineOps dialect.
PiperOrigin-RevId: 232386632
2019-03-29 16:12:40 -07:00
Uday Bondhugula
0f50414fa4 Refactor common code getting memref access in getMemRefRegion - NFC
- use getAccessMap() instead of repeating it
- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap
  allocation and unique_ptr where possible)

- change extractForInductionVars - MutableArrayRef -> ArrayRef for the
  arguments. Since the method is just returning copies of 'Value *', the client
  can't mutate the pointers themselves; it's fine to mutate the 'Value''s
  themselves, but that doesn't mutate the pointers to those.

- change the way extractForInductionVars returns (see b/123437690)

PiperOrigin-RevId: 232359277
2019-03-29 16:12:25 -07:00
Uday Bondhugula
b26900dce5 Update dma-generate pass to (1) work on blocks of instructions (instead of just
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors

- change dma-generate pass to work on blocks of instructions (start/end
  iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
  straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
  in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
  generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
  capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
  option (besides command line flag)

- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
  replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods

Eg. output

$ mlir-opt  -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
            ^

$ mlir-opt -debug-only=dma-generate  -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {

PiperOrigin-RevId: 232297044
2019-03-29 16:09:52 -07:00
River Riddle
870d778350 Begin the process of fully removing OperationInst. This patch cleans up references to OperationInst in the /include, /AffineOps, and lib/Analysis.
PiperOrigin-RevId: 232199262
2019-03-29 16:09:36 -07:00
River Riddle
5052bd8582 Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
2019-03-29 16:06:49 -07:00
River Riddle
755538328b Recommit: Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231342063
2019-03-29 15:59:30 -07:00
Nicolas Vasilache
ae772b7965 Automated rollback of changelist 231318632.
PiperOrigin-RevId: 231327161
2019-03-29 15:42:38 -07:00
River Riddle
5ecef2b3f6 Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231318632
2019-03-29 15:42:08 -07:00
River Riddle
36babbd781 Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself. This is a necessary step in converting ForInst into an operation.
PiperOrigin-RevId: 231064139
2019-03-29 15:40:23 -07:00
Nicolas Vasilache
0e7a8a9027 Drop AffineMap::Null and IntegerSet::Null
Addresses b/122486036

This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing
the Null method and cleaning up the constructors.

As the ::Null uses were tracked down, opportunities appeared to untangle some
of the Parsing logic and make it explicit where AffineMap/IntegerSet have
ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit
pointer values of AffineMap* and IntegerSet* that were passed as function
parameters. Depending the values of those pointers one of 3 behaviors could
occur.

This parsing logic convolution is one of the rare cases where I would advocate
for code duplication. The more proper fix would be to make the syntax
unambiguous or to allow some lookahead.

PiperOrigin-RevId: 231058512
2019-03-29 15:40:08 -07:00
River Riddle
c3424c3c75 Allow operations to hold a blocklist and add support for parsing/printing a block list for verbose printing.
PiperOrigin-RevId: 230951462
2019-03-29 15:37:37 -07:00
Uday Bondhugula
f94b15c247 Update dma-generate: update for multiple load/store op's per memref
- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
  supplied index remap was identity

PiperOrigin-RevId: 230848185
2019-03-29 15:35:38 -07:00
River Riddle
451869f394 Add cloning functionality to Block and Function, this also adds support for remapping successor block operands of terminator operations. We define a new BlockAndValueMapping class to simplify mapping between cloned values.
PiperOrigin-RevId: 230768759
2019-03-29 15:34:20 -07:00
River Riddle
6859f33292 Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.
PiperOrigin-RevId: 230605756
2019-03-29 15:33:20 -07:00
Uday Bondhugula
864d9e02a1 Update fusion cost model + some additional infrastructure and debug information for -loop-fusion
- update fusion cost model to fuse while tolerating a certain amount of redundant
  computation; add cl option -fusion-compute-tolerance
  evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC

PiperOrigin-RevId: 230541857
2019-03-29 15:32:06 -07:00
Uday Bondhugula
94a03f864f Allocate private/local buffers for slices accurately during fusion
- the size of the private memref created for the slice should be based on
  the memref region accessed at the depth at which the slice is being
  materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
  to the region accessed based on the entire domain.

- leads to a significant contraction of the temporary / intermediate memref
  whenever the memref isn't reduced to a single scalar (through store fwd'ing).

Other changes

- update to promoteIfSingleIteration - avoid introducing unnecessary identity
  map affine_apply from IV; makes it much easier to write and read test cases
  and pass output for all passes that use promoteIfSingleIteration; loop-fusion
  test cases become much simpler

- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
  'domInstFilter' could be one of the ops erased due to a memref replacement in
  it.

- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
  missing (the latter need not always be 1); add lbFloorDivisors output argument

- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape

PiperOrigin-RevId: 230405218
2019-03-29 15:30:31 -07:00
MLIR Team
27d067e164 LoopFusion improvements:
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.

PiperOrigin-RevId: 229575126
2019-03-29 15:21:59 -07:00
Uday Bondhugula
03e15e1b9f Minor code cleanup - NFC.
- readability changes

PiperOrigin-RevId: 229443430
2019-03-29 15:19:41 -07:00
MLIR Team
38c2fe3158 LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time).
*) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest, with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations for src/dst loop depths attempting find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality).
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (ones fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.

PiperOrigin-RevId: 229219757
2019-03-29 15:13:53 -07:00
Nicolas Vasilache
362557e11c Simplify compositions of AffineApply
This CL is the 6th and last on the path to simplifying AffineMap composition.
This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple
calls to `fullyComposeAffineMapAndOperands`.

PiperOrigin-RevId: 228962580
2019-03-29 15:11:56 -07:00
Chris Lattner
2b902f1288 Delete FuncBuilder::createChecked. It is perhaps still a good idea, but has no
clients.  Let's re-add it in the future if there is ever a reason to.  NFC.

Unrelatedly, add a use of a variable to unbreak the non-assert build.

PiperOrigin-RevId: 228284026
2019-03-29 15:05:08 -07:00
Uday Bondhugula
e94ba6815a Fix 0-d memref corner case for getMemRefRegion()
- fix crash on test/Transforms/canonicalize.mlir with
  -memref-bound-check

PiperOrigin-RevId: 228268486
2019-03-29 15:03:50 -07:00
Uday Bondhugula
21baf86a2f Extend loop-fusion's slicing utility + other fixes / updates
- refactor toAffineFromEq and the code surrounding it; refactor code into
  FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
  identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
  instead, don't do anything; change cmdline flags
  src-loop-depth -> fusion-src-loop-depth
- AffineExpr/Map print method update: don't fail on null instances (since we have
  a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
  never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid a unnecessary call to
  IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
  all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
  we didn't have any non-hyperrectangular ones.

PiperOrigin-RevId: 228265152
2019-03-29 15:03:20 -07:00
Uday Bondhugula
56b3640b94 Misc readability and doc / code comment related improvements - NFC
- when SSAValue/MLValue existed, code at several places was forced to create additional
  aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
  rid of such redundant code

- use filling ctors instead of explicit loops

- for smallvectors, change insert(list.end(), ...) -> append(...

- improve comments at various places

- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
  getMemRefAccess. In the next CL, provide getAccess() accessors for load,
  store, DMA op's to return a MemRefAccess.

PiperOrigin-RevId: 228243638
2019-03-29 15:02:41 -07:00
Uday Bondhugula
8496f2c30b Complete TODOs / cleanup for loop-fusion utility
- this is CL 1/2 that does a clean up and gets rid of one limitation in an
  underlying method - as a result, fusion works for more cases.
- fix bugs/incomplete impl. in toAffineMapFromEq
- fusing across rank changing reshapes for example now just works

  For eg. given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this,
  -loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forward
  to get rid of the temporary:

INPUT

  // Rank 1 -> Rank 2 reshape
  for %i0 = 0 to 64 {
     %v = load %A[%i0]
     store %v, %B[%i0 floordiv 8, i0 mod 8]
  }

  for %i1 = 0 to 8
    for %i2 = 0 to 8
      %w = load %B[%i1, i2]
      "foo"(%w) : (f32) -> ()

OUTPUT

$ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir

#map0 = (d0, d1) -> (d0 * 8 + d1)
mlfunc @fuse_reshape(%arg0: memref<64xf32>) {
  for %i0 = 0 to 8 {
    for %i1 = 0 to 8 {
      %0 = affine_apply #map0(%i0, %i1)
      %1 = load %arg0[%0] : memref<64xf32>
      "foo"(%1) : (f32) -> ()
    }
  }
}

AFAIK, there is no polyhedral tool / compiler that can perform such fusion -
because it's not really standard loop fusion, but possible through a
generalized slicing-based approach such as ours.

PiperOrigin-RevId: 227918338
2019-03-29 14:57:22 -07:00
Uday Bondhugula
f12182157e Introduce PostDominanceInfo, fix properlyDominates() for Instructions
- introduce PostDominanceInfo in the right/complete way and use that for post
  dominance check in store-load forwarding
- replace all uses of Analysis/Utils::dominates/properlyDominates with
  DominanceInfo::dominates/properlyDominates
- drop all redundant copies of dominance methods in Analysis/Utils/
- in pipeline-data-transfer, replace dominates call with a much less expensive
  check; similarly, substitute dominates() in checkMemRefAccessDependence with
  a simpler check suitable for that context
- fix a bug in properlyDominates
- improve doc for 'for' instruction 'body'

PiperOrigin-RevId: 227320507
2019-03-29 14:48:44 -07:00
Uday Bondhugula
b9fe6be6d4 Introduce memref store to load forwarding - a simple memref dataflow analysis
- the load/store forwarding relies on memref dependence routines as well as
  SSA/dominance to identify the memref store instance uniquely supplying a value
  to a memref load, and replaces the result of that load with the value being
  stored. The memref is also deleted when possible if only stores remain.

- add methods for post dominance for MLFunction blocks.

- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
  getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
  earlier static)

- add a helper method in FlatAffineConstraints - isRangeOneToOne.

PiperOrigin-RevId: 227252907
2019-03-29 14:47:28 -07:00
Chris Lattner
56e2a6cc3b Merge the verifier logic for all functions into a unified framework, this
requires enhancing DominanceInfo to handle the structure of an ML function,
which is required anyway.  Along the way, this also fixes a const correctness
problem with Instruction::getBlock().

This is step 24/n towards merging instructions and statements.

PiperOrigin-RevId: 227228900
2019-03-29 14:45:43 -07:00
Chris Lattner
456ad6a8e0 Standardize naming of statements -> instructions, revisting the code base to be
consistent and moving the using declarations over.  Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245
2019-03-29 14:44:30 -07:00
Chris Lattner
315a466aed Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed once can be fixed up on demand.
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.

This is step 19/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227163082
2019-03-29 14:43:58 -07:00
Chris Lattner
69d9e990fa Eliminate the using decls for MLFunction and CFGFunction standardizing on
Function.

This is step 18/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227139399
2019-03-29 14:43:13 -07:00
Chris Lattner
d798f9bad5 Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(),
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.

This is step 17/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227121537
2019-03-29 14:42:40 -07:00
Chris Lattner
5187cfcf03 Merge Operation into OperationInst and standardize nomenclature around
OperationInst.  This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712
2019-03-29 14:42:23 -07:00
Chris Lattner
4c05f8cac6 Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new
FuncBuilder class.  Also rename SSAValue.cpp to Value.cpp

This is step 12/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227067644
2019-03-29 14:40:22 -07:00
Chris Lattner
3f190312f8 Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy.  This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate.  This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624
2019-03-29 14:40:06 -07:00
Chris Lattner
abf72a8bb1 Rename findFunction from the ML side of the house to be named getFunction(),
making it more similar to the CFG side of things.  It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.

This is step 9/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226983931
2019-03-29 14:38:49 -07:00
Chris Lattner
1301f907a1 Refactor ForStmt: having it contain a StmtBlock instead of subclassing
StmtBlock.  This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.

This is step 1/N towards merging BasicBlock and StmtBlock.  This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.

This is a mechanical patch, NFC.

PiperOrigin-RevId: 226684158
2019-03-29 14:35:19 -07:00
MLIR Team
4eef795a1d Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the compution slice.
Updates LoopFusion pass to take these parameters as command line flags for experimentation.

PiperOrigin-RevId: 226514297
2019-03-29 14:35:03 -07:00
MLIR Team
4f5ef1619e Pass loop depth 1 to memref dependence check when constructing dependence constraints used to calculate computation slice for loop fusion.
This done so that the dominance check between ancestors of op statements from src/dst memref accesses will be run.

PiperOrigin-RevId: 226350443
2019-03-29 14:33:46 -07:00