Add Python bindings guide.
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes Tags: #mlir Differential Revision: https://reviews.llvm.org/D83527
This commit is contained in:
parent
553dbb6d7b
commit
c20c1960c1
|
|
@ -0,0 +1,328 @@
|
|||
# MLIR Python Bindings
|
||||
|
||||
Current status: Under development and not enabled by default
|
||||
|
||||
|
||||
## Building
|
||||
|
||||
### Pre-requisites
|
||||
|
||||
* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
|
||||
be located by CMake.
|
||||
* A relatively recent Python3 installation
|
||||
|
||||
### CMake variables
|
||||
|
||||
* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`
|
||||
|
||||
Enables building the Python bindings. Defaults to `OFF`.
|
||||
|
||||
* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`
|
||||
|
||||
Links the native extension against the Python runtime library, which is
|
||||
optional on some platforms. While setting this to `OFF` can yield some greater
|
||||
deployment flexibility, linking in this way allows the linker to report
|
||||
compile time errors for unresolved symbols on all platforms, which makes for a
|
||||
smoother development workflow. Defaults to `ON`.
|
||||
|
||||
* **`PYTHON_EXECUTABLE`**:`STRING`
|
||||
|
||||
Specifies the `python` executable used for the LLVM build, including for
|
||||
determining header/link flags for the Python bindings. On systems with
|
||||
multiple Python implementations, setting this explicitly to the preferred
|
||||
`python3` executable is strongly recommended.
|
||||
|
||||
|
||||
## Design
|
||||
|
||||
### Use cases
|
||||
|
||||
There are likely two primary use cases for the MLIR python bindings:
|
||||
|
||||
1. Support users who expect that an installed version of LLVM/MLIR will yield
|
||||
the ability to `import mlir` and use the API in a pure way out of the box.
|
||||
|
||||
2. Downstream integrations will likely want to include parts of the API in their
|
||||
private namespace or specially built libraries, probably mixing it with other
|
||||
python native bits.
|
||||
|
||||
|
||||
### Composable modules
|
||||
|
||||
In order to support use case #2, the Python bindings are organized into
|
||||
composable modules that downstream integrators can include and re-export into
|
||||
their own namespace if desired. This forces several design points:
|
||||
|
||||
* Separate the construction/populating of a `py::module` from `PYBIND11_MODULE`
|
||||
global constructor.
|
||||
|
||||
* Introduce headers for C++-only wrapper classes as other related C++ modules
|
||||
will need to interop with it.
|
||||
|
||||
* Separate any initialization routines that depend on optional components into
|
||||
its own module/dependency (currently, things like `registerAllDialects` fall
|
||||
into this category).
|
||||
|
||||
There are a lot of co-related issues of shared library linkage, distribution
|
||||
concerns, etc that affect such things. Organizing the code into composable
|
||||
modules (versus a monolithic `cpp` file) allows the flexibility to address many
|
||||
of these as needed over time. Also, compilation time for all of the template
|
||||
meta-programming in pybind scales with the number of things you define in a
|
||||
translation unit. Breaking into multiple translation units can significantly aid
|
||||
compile times for APIs with a large surface area.
|
||||
|
||||
### Submodules
|
||||
|
||||
Generally, the C++ codebase namespaces most things into the `mlir` namespace.
|
||||
However, in order to modularize and make the Python bindings easier to
|
||||
understand, sub-packages are defined that map roughly to the directory structure
|
||||
of functional units in MLIR.
|
||||
|
||||
Examples:
|
||||
|
||||
* `mlir.ir`
|
||||
* `mlir.passes` (`pass` is a reserved word :( )
|
||||
* `mlir.dialect`
|
||||
* `mlir.execution_engine` (aside from namespacing, it is important that
|
||||
"bulky"/optional parts like this are isolated)
|
||||
|
||||
In addition, initialization functions that imply optional dependencies should
|
||||
be in underscored (notionally private) modules such as `_init` and linked
|
||||
separately. This allows downstream integrators to completely customize what is
|
||||
included "in the box" and covers things like dialect registration,
|
||||
pass registration, etc.
|
||||
|
||||
### Loader
|
||||
|
||||
LLVM/MLIR is a non-trivial python-native project that is likely to co-exist with
|
||||
other non-trivial native extensions. As such, the native extension (i.e. the
|
||||
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
|
||||
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
|
||||
and siblings which loads and re-exports it. This split provides a place to stage
|
||||
code that needs to prepare the environment *before* the shared library is loaded
|
||||
into the Python runtime, and also provides a place that one-time initialization
|
||||
code can be invoked apart from module constructors.
|
||||
|
||||
To start with the `mlir/__init__.py` loader shim can be very simple and scale to
|
||||
future need:
|
||||
|
||||
```python
|
||||
from _mlir import *
|
||||
```
|
||||
|
||||
### Limited use of globals
|
||||
|
||||
For normal operations, parent-child constructor relationships are realized with
|
||||
constructor methods on a parent class as opposed to requiring
|
||||
invocation/creation from a global symbol.
|
||||
|
||||
For example, consider two code fragments:
|
||||
|
||||
```python
|
||||
|
||||
op = build_my_op()
|
||||
|
||||
region = mlir.Region(op)
|
||||
|
||||
```
|
||||
|
||||
vs
|
||||
|
||||
```python
|
||||
|
||||
op = build_my_op()
|
||||
|
||||
region = op.new_region()
|
||||
|
||||
```
|
||||
|
||||
For tightly coupled data structures like `Operation`, the latter is generally
|
||||
preferred because:
|
||||
|
||||
* It is syntactically less possible to create something that is going to access
|
||||
illegal memory (less error handling in the bindings, less testing, etc).
|
||||
|
||||
* It reduces the global-API surface area for creating related entities. This
|
||||
makes it more likely that if constructing IR based on an Operation instance of
|
||||
unknown providence, receiving code can just call methods on it to do what they
|
||||
want versus needing to reach back into the global namespace and find the right
|
||||
`Region` class.
|
||||
|
||||
* It leaks fewer things that are in place for C++ convenience (i.e. default
|
||||
constructors to invalid instances).
|
||||
|
||||
### Use the C-API
|
||||
|
||||
The Python APIs should seek to layer on top of the C-API to the degree possible.
|
||||
Especially for the core, dialect-independent parts, such a binding enables
|
||||
packaging decisions that would be difficult or impossible if spanning a C++ ABI
|
||||
boundary. In addition, factoring in this way side-steps some very difficult
|
||||
issues that arise when combining RTTI-based modules (which pybind derived things
|
||||
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).
|
||||
|
||||
|
||||
## Style
|
||||
|
||||
In general, for the core parts of MLIR, the Python bindings should be largely
|
||||
isomorphic with the underlying C++ structures. However, concessions are made
|
||||
either for practicality or to give the resulting library an appropriately
|
||||
"Pythonic" flavor.
|
||||
|
||||
### Properties vs get*() methods
|
||||
|
||||
Generally favor converting trivial methods like `getContext()`, `getName()`,
|
||||
`isEntryBlock()`, etc to read-only Python properties (i.e. `context`). It is
|
||||
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
|
||||
and makes things feel much nicer to the Python side.
|
||||
|
||||
For example, prefer:
|
||||
|
||||
```c++
|
||||
m.def_property_readonly("context", ...)
|
||||
```
|
||||
|
||||
Over:
|
||||
|
||||
```c++
|
||||
m.def("getContext", ...)
|
||||
```
|
||||
|
||||
### __repr__ methods
|
||||
|
||||
Things that have nice printed representations are really great :) If there is a
|
||||
reasonable printed form, it can be a significant productivity boost to wire that
|
||||
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).
|
||||
|
||||
### CamelCase vs snake_case
|
||||
|
||||
Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
|
||||
a mechanical concession to Python style, this can go a long way to making the
|
||||
API feel like it fits in with its peers in the Python landscape.
|
||||
|
||||
If in doubt, choose names that will flow properly with other
|
||||
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).
|
||||
|
||||
### Prefer pseudo-containers
|
||||
|
||||
Many core IR constructs provide methods directly on the instance to query count
|
||||
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.
|
||||
|
||||
For example, a direct mapping of blocks within regions could be done this way:
|
||||
|
||||
```python
|
||||
region = ...
|
||||
|
||||
for block in region:
|
||||
|
||||
pass
|
||||
```
|
||||
|
||||
However, this way is preferred:
|
||||
|
||||
```python
|
||||
region = ...
|
||||
|
||||
for block in region.blocks:
|
||||
|
||||
pass
|
||||
|
||||
print(len(region.blocks))
|
||||
print(region.blocks[0])
|
||||
print(region.blocks[-1])
|
||||
```
|
||||
|
||||
Instead of leaking STL-derived identifiers (`front`, `back`, etc), translate
|
||||
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.
|
||||
|
||||
Note that this can be taken too far, so use good judgment. For example, block
|
||||
arguments may appear container-like but have defined methods for lookup and
|
||||
mutation that would be hard to model properly without making semantics
|
||||
complicated. If running into these, just mirror the C/C++ API.
|
||||
|
||||
### Provide one stop helpers for common things
|
||||
|
||||
One stop helpers that aggregate over multiple low level entities can be
|
||||
incredibly helpful and are encouraged within reason. For example, making
|
||||
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
|
||||
construct a SourceMgr can be quite nice. One stop helpers do not have to be
|
||||
mutually exclusive with a more complete mapping of the backing constructs.
|
||||
|
||||
## Testing
|
||||
|
||||
Tests should be added in the `test/Bindings/Python` directory and should
|
||||
typically be `.py` files that have a lit run line.
|
||||
|
||||
While lit can run any python module, prefer to lay tests out according to these
|
||||
rules:
|
||||
|
||||
* For tests of the API surface area, prefer
|
||||
[`doctest`](https://docs.python.org/3/library/doctest.html).
|
||||
* For generative tests (those that produce IR), define a Python module that
|
||||
constructs/prints the IR and pipe it through `FileCheck`.
|
||||
* Parsing should be kept self-contained within the module under test by use of
|
||||
raw constants and an appropriate `parse_asm` call.
|
||||
* Any file I/O code should be staged through a tempfile vs relying on file
|
||||
artifacts/paths outside of the test module.
|
||||
|
||||
### Sample Doctest
|
||||
|
||||
```python
|
||||
# RUN: %PYTHON %s
|
||||
|
||||
"""
|
||||
>>> m = load_test_module()
|
||||
Test basics:
|
||||
>>> m.operation.name
|
||||
"module"
|
||||
>>> m.operation.is_registered
|
||||
True
|
||||
>>> ... etc ...
|
||||
|
||||
Verify that repr prints:
|
||||
>>> m.operation
|
||||
<operation 'module'>
|
||||
"""
|
||||
|
||||
import mlir
|
||||
|
||||
TEST_MLIR_ASM = r"""
|
||||
func @test_operation_correct_regions() {
|
||||
// ...
|
||||
}
|
||||
"""
|
||||
|
||||
# TODO: Move to a test utility class once any of this actually exists.
|
||||
def load_test_module():
|
||||
ctx = mlir.ir.Context()
|
||||
ctx.allow_unregistered_dialects = True
|
||||
module = ctx.parse_asm(TEST_MLIR_ASM)
|
||||
return module
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import doctest
|
||||
doctest.testmod()
|
||||
```
|
||||
|
||||
### Sample FileCheck test
|
||||
|
||||
```python
|
||||
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck
|
||||
|
||||
# TODO: Move to a test utility class once any of this actually exists.
|
||||
def print_module(f):
|
||||
m = f()
|
||||
print("// -----")
|
||||
print("// TEST_FUNCTION:", f.__name__)
|
||||
print(m.to_asm())
|
||||
return f
|
||||
|
||||
# CHECK-LABEL: TEST_FUNCTION: create_my_op
|
||||
@print_module
|
||||
def create_my_op():
|
||||
m = mlir.ir.Module()
|
||||
builder = m.new_op_builder()
|
||||
# CHECK: mydialect.my_operation ...
|
||||
builder.my_op()
|
||||
return m
|
||||
```
|
||||
Loading…
Reference in New Issue