* [Pipeline] Add cross-block canonicalization break op
In anticipation of the upcoming removal of the cross-block canonicalization guards in the `comb` canonicalizers, we're introducing a `pipeline.src` operation to the Pipeline dialect.
This operation is intended to prevent unintentional cross-block canonicalizations in scheduled pipelines that have not yet had their registers materialized.
This operation, `pipeline.src`, effectively acts as a canonicalization boundary, creating an in-block reference to a value that is defined in a predecessor block.
This means that we can keep our existing structure for unmaterialized, scheduled pipelines: values defined by _any_ predecessor op can still be referenced through the `pipeline.src` operation, so they remain easy to move across stages in a retiming scenario.
Upon register materialization, these `pipeline.src` operations are trivially removed, effectively becoming the block arguments of the pipeline stages.
* scheduling
* extra test
* some docs
* integration tests
* Review
---------
Co-authored-by: Morten Borup Petersen <mpetersen@microsoft.com>
* [Pipeline] Make `reset` signal optional
In some use cases, having a reset signal to a pipeline may do more harm than good in terms of synthesis/routing overhead. Instead, users can reset pipelines in other ways - e.g. by holding `go` low for N cycles, where N is the number of pipeline stages. This, in turn, will set the pipeline stage enable registers to a deterministic state.
Allow for this by making the top-level `reset` signal optional.
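To see why holding `go` low works as a reset, here is a minimal Python sketch of the stage enable registers. It assumes each register simply captures the enable of the stage feeding it; this is a hypothetical cycle-level model, and `step` and `flush_without_reset` are illustrative names, not CIRCT APIs. `None` stands in for an unknown power-up value.

```python
from typing import List, Optional

# Hypothetical cycle-level model of the stage enable registers of an
# N-stage pipeline. None stands in for an unknown ("X") power-up value.
State = List[Optional[bool]]


def step(regs: State, go: bool) -> State:
    # The first register captures `go`; every other register captures the
    # enable of the preceding stage, i.e. the values shift by one.
    return [go] + regs[:-1]


def flush_without_reset(num_stages: int, cycles: int) -> State:
    regs: State = [None] * num_stages  # no reset: state is unknown
    for _ in range(cycles):
        regs = step(regs, go=False)  # hold `go` low
    return regs


print(flush_without_reset(3, 2))  # [False, False, None]
print(flush_without_reset(3, 3))  # [False, False, False]
```

After N cycles every register has been overwritten with a known `False`, whatever its power-up value was; after fewer cycles, some stages are still unknown.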
* review
---------
Co-authored-by: Morten Borup Petersen <mpetersen@microsoft.com>
Prepares `sblocks` for scheduling by:
1. Defining an operation `ibis.pipelineheader` that provides placeholders for clk/reset/go/stall.
2. Placing all operations inside a `pipeline.unscheduled` operation.
3. Feeding any block arguments that are trivially returned by the sblock through the pipeline.
Having this allows us to run any scheduling we want on the `pipeline.unscheduled` operation, regardless of the surroundings.
I was considering whether to include more of the control logic surrounding the pipeline operation in this PR. However, any performant control logic for the pipeline itself (which eventually needs to interact with the ready/valid network that drives the `sblock`) needs information about the scheduled pipeline. Hence the order: prepare for scheduling, schedule, and then create the control network based on the scheduling results.
This also simplifies the stage enable signals, since the entry enable signal will now always be the last argument of any MLIR block inside the pipeline.
This PR restricts the clock gate op to solely use the clock type.
Uses in other dialects, especially Pipeline, were adjusted.
To maintain canonicalization behaviour, a clock constant op along with an attribute is also introduced to represent constant clocks and to fold ops to them.
Be a little bit more strict about naming here - the `i1` block argument of any given stage represents the stage **enable** signal - stage **valid** is reserved for the **output** signal/register of a stage, that is fed to its successor stage (as the successor stage enable signal).
Instead, remove `IsolatedFromAbove` from the pipeline, and define any value that is used inside the pipeline but defined outside of it as an external input. The motivation for this is to reduce the headache of having to explicitly modify the `ext` input list of a pipeline whenever external inputs are added to a pipeline (or hoisted out of it).
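Without `IsolatedFromAbove`, the external inputs can be derived from uses instead of being maintained by hand. A minimal Python sketch over a hypothetical toy IR (`Op` and `collect_external_inputs` are illustrative, not the actual CIRCT implementation):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    # Toy SSA op: one named result and a list of operand names.
    result: str
    operands: List[str] = field(default_factory=list)


def collect_external_inputs(body: List[Op], pipeline_inputs: List[str]) -> List[str]:
    # Values defined inside the pipeline: its inputs and every op result.
    defined = set(pipeline_inputs) | {op.result for op in body}
    externals: List[str] = []
    for op in body:
        for value in op.operands:
            # Anything used but not defined inside is an external input;
            # keep first-use order and drop duplicates.
            if value not in defined and value not in externals:
                externals.append(value)
    return externals


body = [Op("x", ["a0", "c"]), Op("y", ["x", "c", "d"])]
print(collect_external_inputs(body, ["a0"]))  # ['c', 'd']
```

Adding a new use of an outside value, or hoisting one out, just changes the computed list; nothing else has to be edited.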
The motivation for this refactor is to:
* Remove the entry block - there's a bunch of different entry block arguments that symbolize different things, so we're asking for errors if these are printed as default block arguments.
* Add names to more things (the pipeline itself, as well as its input and output ports).
Changes are:
* In the new format, there is no entry block (the entry block arguments are defined by the pipeline op). Furthermore, there is no `%s0_valid` signal, seeing as this is identical to the entry `%go` signal.
* Pipeline inputs and external inputs are expressed as an initializer list with type information. LHS is the name of the SSA value inside the pipeline, and RHS is the value that is passed into the pipeline.
* Adds the ability to access clock, reset and stall signals from anywhere within the pipeline.
* Adds an (optional) name to the pipeline which can be used during lowering.
* Names the outputs, which likewise can be used during lowering.
Also changes the `clock-gate-regs` option so that it actually implements clock gates instead of a `seq.compreg.ce` operation. To cover all cases, I think three kinds of gating implementations are needed - clock gating, clock enable (`seq.compreg.ce`) and input muxing. The first and last are what we have now.
This commit adds an explicit 'go' signal to the pipeline abstraction.
This signal is used to indicate when the pipeline should start. The
signal will propagate through each stage as the stage valid signal.
As a result of this, each stage now has a `s#_valid : i1` signal as the
last value in its block argument list. This value can be used within
each stage by any operations which require access to the pipeline
control circuitry.
Given that pipeline stage validity is now explicit within the block
arguments of a stage, a `pipeline.stage` operation no longer has an
`enable` signal. As a small 'bugfix' here, a `seq.clock_gate` is emitted
to gate the pipeline stage separating registers on the pipeline stage
valid signal.
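A minimal cycle-level sketch of how `go` propagates as the stage valid signal, and of what gating the stage-separating registers on that valid signal does. This is a hypothetical Python model, not generated hardware: `simulate`, the "add 1" stage logic and the modelling of a clock gate as a conditional register load are all assumptions.

```python
from typing import List, Optional


def simulate(go_trace: List[bool], in_trace: List[int], num_stages: int) -> List[Optional[int]]:
    # valid[i] / data[i]: the stage-separating registers after stage i.
    valid: List[bool] = [False] * num_stages
    data: List[Optional[int]] = [None] * num_stages
    outputs: List[Optional[int]] = []
    for go, x in zip(go_trace, in_trace):
        # Stage 0 is enabled by `go`; stage i by the valid output of stage i-1.
        enables = [go] + valid[:-1]
        stage_inputs = [x] + data[:-1]
        new_data = list(data)
        for i in range(num_stages):
            if enables[i]:  # clock-gated: only load when the stage is enabled
                new_data[i] = stage_inputs[i] + 1  # toy stage logic: add 1
        valid, data = enables, new_data
        # The pipeline output is only meaningful when the last stage is valid.
        outputs.append(data[-1] if valid[-1] else None)
    return outputs


print(simulate([True, True, False, False], [1, 5, 0, 0], 3))  # [None, None, 4, 8]
```

Each input appears at the output after `num_stages` clock edges, and the registers of a stage that is not valid keep their old contents.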
`ext` inputs are inputs which are accessible in any pipeline stage, and which will never be registered during register materialization. If you imagine a pipeline as signals going from left to right with registers in between, `ext` inputs are inputs that hook in from the top or bottom, into any pipeline stage.
In hardware (outlined) lowering, the external inputs are only provided to stages which actually reference them.
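The per-stage filtering in the outlined lowering amounts to the following. This is a minimal Python sketch in which stages and values are plain strings; `ext_inputs_per_stage` is a hypothetical helper, not part of CIRCT.

```python
from typing import Dict, List, Set


def ext_inputs_per_stage(ext_inputs: List[str],
                         stage_uses: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    # Keep the pipeline-level order of the ext inputs, but hand each stage
    # only those it actually references.
    return {stage: [e for e in ext_inputs if e in uses]
            for stage, uses in stage_uses.items()}


uses = {"s0": {"a", "k"}, "s1": {"b"}, "s2": {"m", "k"}}
print(ext_inputs_per_stage(["k", "m"], uses))
# {'s0': ['k'], 's1': [], 's2': ['k', 'm']}
```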
**Note**: The pipeline op signature is getting unruly, given that all inputs are bundled into the single `^bb0` definition. In a future commit, I plan to specialize the printer/parser for the pipeline op to mimic something like the `scf` ops, so that we get a saner format:
```mlir
// Current format:
%out:2 = pipeline.scheduled(%arg0) ext(%arg1 : i32) clock %clk reset %rst : (i32, i32) -> (i32, i32) {
^bb0(%a0 : i32, %ext0: i32):
  // ...
}
// vs. planned scf-like format:
%out:2 = pipeline.scheduled(%a0 : i32 = %arg0) ext(%ext0 = %arg1 : i32) clock %clk reset %rst -> (i32, i32) {
  // ...
}
```
Adds a bunch more verification to the `pipeline.scheduled` and `pipeline.latency` ops, as well as fixing the register materialization pass.
The register materialization pass has now been modified to accommodate arbitrarily nested operations within each stage. This is a superset of the support required for `pipeline.latency`, where operations in the inner body may reference values defined outside of their stage.
Removes the notion that `pipeline` pipelines can be either latency insensitive or sensitive.
This is done because:
1. only latency sensitive lowerings are supported at the moment
2. there seems to be no clear way to infer backpressure through the pipeline without this being explicitly defined in the IR.
The latter is obviously the graver concern. If and when such a way is found, we can add back latency insensitivity. But until then, we should not advertise a feature that has no clear path to support.
The `pipeline.latency` operation represents an operation for wrapping
multi-cycle operations. The operation declares a single block
in which any operation may be placed. The operation is not
`IsolatedFromAbove` meaning that the operation can reference values
defined outside of the operation (subject to the materialization
phase of the parent pipeline).
This commit includes changes to the register materialization pass
(differentiating between `regs` and `pass` inputs to stages) and
pipeline HW lowering. Currently, `pipeline.latency` operations are
just inlined into the current insertion point.
The `stall` signal is intended to connect to all stages within the pipeline, and is used to stall the entirety of the pipeline. How stages use this signal is defined by the lowering, although in the common case, a `stall` signal would typically connect to the clock-enable input of the stage-separating registers.
Future PRs will implement hardware lowerings.
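A minimal sketch of the common-case lowering described above, assuming `stall` is wired (inverted) to the clock enable of every stage-separating register. This is a hypothetical Python model; `step` and `Regs` are illustrative only.

```python
from typing import List, Optional

Regs = List[Optional[int]]


def step(regs: Regs, new_input: Optional[int], stall: bool) -> Regs:
    # `stall` drives the clock enable of every stage-separating register
    # (enable = not stall), so a stalled cycle leaves all of them unchanged.
    if stall:
        return list(regs)
    # Otherwise every value advances by one stage.
    return [new_input] + regs[:-1]


regs: Regs = [None, None, None]
for x, stall in [(1, False), (2, True), (3, False)]:
    regs = step(regs, x, stall)
print(regs)  # [3, 1, None]
```

Note that the input offered during the stalled cycle (`2`) is never captured, so whatever drives the pipeline has to hold its input until `stall` deasserts.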
This PR adds a new operation, `pipeline.stage` to the `Pipeline` dialect.
This operation is used to make stages explicit - that is, a region-defining
operation with inputs and outputs (much like the `loopschedule.pipeline.stage`).
The motivation for this is that once a `pipeline.pipeline` has had registers
materialized, a representational change to make the stages explicit allows
for much cleaner RTL lowering whenever stages are _not_ to be lowered to
hardware directly inline in the parent `hw.module`.
To (hopefully) avoid confusion, this commit also renames some existing
operations in the dialect:
* `pipeline.stage` becomes `pipeline.ss` - a *stage separating* operation.
... which is really what it is - it doesn't define a stage, it merely separates
them.
* `pipeline.stage.regs` becomes `pipeline.ss.reg` - as above, it separates
stages, but also defines the registers that are formed across the
boundary.
* LLVM.h: drop mlir::Optional.
It's an alias for std::optional now, so just drop it
instead of adding the templated using alias here as well
(which we'd have to remember to drop).
* Convert Optional -> std::optional.
Drop some unnecessary initializations, but keep them for fields for now.
Particularly:
StandardToHandshake.h: keep init for field, for omission in {} initializers.
FSMToSV: similarly keep the current initialization.
[Pipeline] Add pipeline stage register materialization pass
This commit adds an intermediate transformation to the Pipeline dialect
which is responsible for converting `pipeline.stage` to `pipeline.stage.register`
operations. The purpose of this transformation is to 'fix' where
registers need to be placed in the pipeline, after all stages have been
defined and placed.
In short, the transformation will scan through the pipeline (in order,
top to bottom) and insert `pipeline.stage.register` operations in place
of `pipeline.stage` operations. Any operand used in any operation will
be analyzed to determine if it originates in between the last seen stage
and the operation itself. If not, this means that the operand crossed
a pipeline stage, and as such, the value will be routed through the
predecessor stage (`routeThroughStage`).
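The scan can be sketched in a few lines of Python over a hypothetical toy IR, where `Op` and `StageSep` stand in for ordinary operations and `pipeline.stage` separators. `materialize_registers` only computes which values each separator must register (what routing via `routeThroughStage` achieves); it does not rewrite IR the way the actual pass does.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Op:
    result: str
    operands: List[str]


@dataclass
class StageSep:  # stands in for a `pipeline.stage` separator
    pass


def materialize_registers(body: List[Union[Op, StageSep]],
                          inputs: List[str]) -> List[List[str]]:
    """Return, for each stage separator, the values it must register."""
    def_stage: Dict[str, int] = {v: 0 for v in inputs}
    regs: List[List[str]] = []
    stage = 0
    for item in body:  # in order, top to bottom
        if isinstance(item, StageSep):
            regs.append([])
            stage += 1
            continue
        for v in item.operands:
            # Unknown values are treated as defined in the current stage,
            # i.e. never registered.
            # Otherwise route the value through every separator it crosses.
            for s in range(def_stage.get(v, stage), stage):
                if v not in regs[s]:
                    regs[s].append(v)
        def_stage[item.result] = stage
    return regs


body = [Op("x", ["a", "b"]), StageSep(), Op("y", ["x", "a"]),
        StageSep(), Op("z", ["y", "b"])]
print(materialize_registers(body, ["a", "b"]))
# [['x', 'a', 'b'], ['y', 'b']]
```

Note how `b`, used two stages after it is defined, ends up registered at both separators.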
The generated code would contain a `getOperands` accessor for the op's
arguments, but that name conflicts with the `getOperands` method that MLIR
always generates for an operation. Rename the arguments so that the
accessor gets a unique name, which squashes the error.
This is a fairly straight-forward transformation since the brunt of the
work of detecting which values will be registered in a given pipeline
stage has already been performed by a prior pass.
This pass simply elaborates the `pipeline.stage.register` operations
into `seq.compreg` operations and stitches up the circuit.
It is probably fair to conclude that naming this dialect `StaticLogic` has been a pain point for a while. This commit proposes renaming the dialect to `Pipeline`, for a couple of reasons:
1. So far, we've only been working with pipeline abstractions within this dialect.
2. Pipeline representations aren't necessarily statically scheduled - we plan on adding switches to select between latency sensitive and latency insensitive lowerings of pipelines.
This name change does not preclude renamings in the future if we want to fit more stuff into this dialect. Personally, I think it is prudent to maintain a dialect name which reflects what's actually being done within the dialect, as well as the (near/mid/"someone actually intends to work on this"-term) future plans for the dialect.