Dun Liang 2021-08-17 13:35:18 +08:00
commit 4e38190483
160 changed files with 11104 additions and 751 deletions

.github/ISSUE_TEMPLATE/bug_report.md

@ -0,0 +1,25 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
## Describe the bug
A clear and concise description of what the bug is. Writing in Chinese is also acceptable.
## Full Log
Provide a full log of the Jittor execution; Jittor logs environment information which helps us locate your bug. Providing a screenshot is also acceptable.
## Minimal Reproduce
Reproduce this error with a file or several lines of code.
If it is not possible, leave it blank.
## Expected behavior
A clear and concise description of what you expected to happen.
If you are submitting an issue for the first time, please refer to [our guideline](https://github.com/Jittor/jittor/issues/238)

.gitignore

@ -1,5 +1,6 @@
my
.refresh
.DS_Store
__pycache__
.ipynb_checkpoints/
.vscode/


@ -1,3 +1,5 @@
exclude __data__
exclude __pycache__
prune **/__data__/
prune **/__pycache__
prune *.pyc


@ -89,7 +89,7 @@ for i,(x,y) in enumerate(get_data(n)):
The Jittor framework has the following environment requirements:
* OS: **Linux** (e.g. Ubuntu/CentOS/Arch) or **Windows Subsystem of Linux (WSL)**
* OS: **Linux** (e.g. Ubuntu/CentOS/Arch), **macOS** (x86_64), or **Windows Subsystem of Linux (WSL)**
* Python version >= 3.7
* C++ compiler (at least one of the following is required)
- g++ >=5.4.0
@ -100,7 +100,9 @@ The Jittor framework has the following environment requirements:
If you do not want to configure the environment by hand, we recommend installing with Docker.
Apart from that, you can also install via pip or manually.
Note: Jittor currently runs on Windows through WSL; for how to install WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). The WSL version does not support CUDA yet.
Note #1: Jittor currently runs on Windows through WSL; for how to install WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). The WSL version does not support CUDA yet.
Note #2: macOS users need to install additional dependencies; see [macOS install](#macOS-install).
Jittor provides three installation methods: docker, pip, and manual installation.
@ -112,6 +114,7 @@ Jittor provides three installation methods: docker, pip, and manual installation.
## Docker install
We provide a Docker setup so you do not need to configure the environment yourself. The Docker installation method is as follows:
@ -145,6 +148,27 @@ python3.7 -m jittor.test.test_example
If the test passes, congratulations, Jittor has been installed successfully.
jittor automatically searches the path for a suitable compiler; if you want to specify the compiler manually, use the environment variables `cc_path` and `nvcc_path` (optional), as in the sketch below.
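For example, to pin the compiler before installing (a minimal sketch; the paths below are illustrative assumptions, adjust them to your system):
```bash
export cc_path=/usr/bin/g++
export nvcc_path=/usr/local/cuda/bin/nvcc  # optional, only needed for CUDA builds
python3.7 -m pip install jittor
```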
## macOS install
On macOS, please use [homebrew](https://brew.sh) to install the additional dependencies (python>=3.7, onednn).
```bash
brew install python@3.7 onednn libomp
```
Then you can install jittor through pip and test whether it runs successfully.
```bash
python3.7 -m pip install jittor
python3.7 -m jittor.test.test_example
```
Currently, jittor only supports CPU computation on macOS.
## manual install
@ -313,11 +337,11 @@ help(jt.ops)
[1]: notebook/example.src.md "example"
[2]: notebook/basics.src.md "basics"
[3]: notebook/meta_op.src.md "meta_op"
[4]: notebook/custom_op.src.md "custom_op"
[5]: notebook/profiler.src.md "profiler"
[1]: python/jittor/notebook/example.src.md "example"
[2]: python/jittor/notebook/basics.src.md "basics"
[3]: python/jittor/notebook/meta_op.src.md "meta_op"
[4]: python/jittor/notebook/custom_op.src.md "custom_op"
[5]: python/jittor/notebook/profiler.src.md "profiler"
These notebooks can be run on your own computer with `python3.7 -m jittor.notebook`.


@ -92,9 +92,10 @@ We provide some jupyter notebooks to help you quickly get started with Jittor.
Jittor environment requirements:
* System: **Linux** (e.g. Ubuntu/CentOS/Arch) (or **Windows** Subsystem of Linux)
* System: **Linux** (e.g. Ubuntu/CentOS/Arch), **macOS**, or **Windows Subsystem of Linux (WSL)**
* Python version >= 3.7
* CPU compiler (requires at least one of the following)
* g++ (>=5.4.0)
@ -105,7 +106,9 @@ Jittor environment requirements:
Note: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
Note #1: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
Note #2: macOS users have to install additional dependencies; see [macOS install](#macOS-install).
Jittor offers three ways to install: docker, pip, or manual.
@ -139,6 +142,27 @@ python3.7 -m jittor.test.test_example
```
## macOS install
Please first install additional dependencies with [homebrew](https://brew.sh).
```bash
brew install python@3.7 onednn libomp
```
Then you can install jittor through pip and run the example.
```bash
python3.7 -m pip install jittor
python3.7 -m jittor.test.test_example
```
Currently jittor only supports CPU on macOS.
## manual install
We will show how to install Jittor on Ubuntu 16.04 step by step; other Linux distributions may use similar commands.
@ -307,11 +331,11 @@ If you want to know more about Jittor, please check out the notebooks below:
[1]: notebook/example.src.md "example"
[2]: notebook/basics.src.md "basics"
[3]: notebook/meta_op.src.md "meta_op"
[4]: notebook/custom_op.src.md "custom_op"
[5]: notebook/profiler.src.md "profiler"
[1]: python/jittor/notebook/example.src.md "example"
[2]: python/jittor/notebook/basics.src.md "basics"
[3]: python/jittor/notebook/meta_op.src.md "meta_op"
[4]: python/jittor/notebook/custom_op.src.md "custom_op"
[5]: python/jittor/notebook/profiler.src.md "profiler"
Those notebooks can be started on your own computer with `python3.7 -m jittor.notebook`.
@ -366,7 +390,7 @@ Jittor is currently maintained by the [Tsinghua CSCG Group](https://cg.cs.tsingh
@article{hu2020jittor,
title={Jittor: a novel deep learning framework with meta-operators and unified graph execution},
author={Hu, Shi-Min and Liang, Dun and Yang, Guo-Ye and Yang, Guo-Wei and Zhou, Wen-Yang},
journal={Information Sciences},
journal={Science China Information Sciences},
volume={63},
number={222103},
pages={1--21},


@ -113,7 +113,7 @@ We provide some jupyter notebooks to help you quickly get started with Jittor.
The Jittor framework has the following environment requirements:
* OS: **Linux** (e.g. Ubuntu/CentOS/Arch) or **Windows Subsystem of Linux (WSL)**
* OS: **Linux** (e.g. Ubuntu/CentOS/Arch), **macOS** (x86_64), or **Windows Subsystem of Linux (WSL)**
* Python version >= 3.7
* C++ compiler (at least one of the following is required)
- g++ >=5.4.0
@ -124,13 +124,15 @@ The Jittor framework has the following environment requirements:
If you do not want to configure the environment by hand, we recommend installing with Docker.
Apart from that, you can also install via pip or manually.
Note: Jittor currently runs on Windows through WSL; for how to install WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). The WSL version does not support CUDA yet.
Note #1: Jittor currently runs on Windows through WSL; for how to install WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). The WSL version does not support CUDA yet.
Note #2: macOS users need to install additional dependencies; see [macOS install](#macOS-install).
Jittor provides three installation methods: docker, pip, and manual installation.
Jittor environment requirements:
* System: **Linux** (e.g. Ubuntu/CentOS/Arch) (or **Windows** Subsystem of Linux)
* System: **Linux** (e.g. Ubuntu/CentOS/Arch), **macOS**, or **Windows Subsystem of Linux (WSL)**
* Python version >= 3.7
* CPU compiler (requires at least one of the following)
* g++ (>=5.4.0)
@ -141,7 +143,9 @@ Jittor environment requirements:
Note: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
Note #1: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to the [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
Note #2: macOS users have to install additional dependencies; see [macOS install](#macOS-install).
Jittor offers three ways to install: docker, pip, or manual.
@ -183,6 +187,31 @@ python3.7 -m jittor.test.test_example
If the test passes, congratulations, Jittor has been installed successfully.
jittor automatically searches the path for a suitable compiler; if you want to specify the compiler manually, use the environment variables `cc_path` and `nvcc_path` (optional).
## macOS install
Please first install the additional dependencies (python>=3.7, onednn) with [homebrew](https://brew.sh).
```bash
brew install python@3.7 onednn libomp
```
Then you can install jittor through pip and run the example to test it.
```bash
python3.7 -m pip install jittor
python3.7 -m jittor.test.test_example
```
Currently jittor only supports CPU computation on macOS.
## manual install
@ -387,16 +416,16 @@ If you want to know more about Jittor, please check out the notebooks below:
[1]: notebook/example.src.md "example"
[2]: notebook/basics.src.md "basics"
[3]: notebook/meta_op.src.md "meta_op"
[4]: notebook/custom_op.src.md "custom_op"
[5]: notebook/profiler.src.md "profiler"
[1]: python/jittor/notebook/example.src.md "example"
[2]: python/jittor/notebook/basics.src.md "basics"
[3]: python/jittor/notebook/meta_op.src.md "meta_op"
[4]: python/jittor/notebook/custom_op.src.md "custom_op"
[5]: python/jittor/notebook/profiler.src.md "profiler"
Those notebooks can be started on your own computer with `python3.7 -m jittor.notebook`.


@ -27,6 +27,9 @@
jittor.mpi
jittor.linalg
jittor.console
jittor.distributions
jittor.attention
jittor.loss3d
.. toctree::
@ -34,7 +37,6 @@
:caption: Others:
Tutorials <https://cg.cs.tsinghua.edu.cn/jittor/tutorial/>
todo
Indices and tables
==================


@ -1,7 +1,7 @@
jittor.attention
=====================
Here is the API documentation for Jittor's data transform module; you can access the module via `from jittor import attention`.
Here is the API documentation for Jittor's attention module; you can access the module via `from jittor import attention`.
```eval_rst
.. automodule:: jittor.attention


@ -0,0 +1,10 @@
jittor.distributions
=====================
Here is the API documentation for Jittor's random distribution module; you can access the module via `from jittor import distributions`.
```eval_rst
.. automodule:: jittor.distributions
:members:
:undoc-members:
```


@ -0,0 +1,10 @@
jittor.loss3d
=====================
Here is the API documentation for Jittor's 3D loss function module; you can access the module via `from jittor import loss3d`.
```eval_rst
.. automodule:: jittor.loss3d
:members: chamfer_loss, ChamferLoss, earth_mover_distance, EarthMoverDistance
:undoc-members:
```


@ -95,6 +95,9 @@ def val(epoch):
Below is the jittor MPI API reference (a usage sketch follows the list).
* `jt.world_rank`: the total number of processes; 1 if MPI is not in use.
* `jt.rank`: the rank of the current process, in the range `[0, jt.world_rank-1]`; 0 if MPI is not in use.
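For example, a minimal sketch (launch under MPI, e.g. `mpirun -np 2 python3.7 demo.py`; the script name is illustrative):
```python
import jittor as jt

# jt.rank is 0 and jt.world_size is 1 when running without mpirun
print(f"process {jt.rank} of {jt.world_size}")
```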
```eval_rst
.. automodule:: jittor_mpi_core
:members:


@ -10,7 +10,7 @@ jittor.nn
.. automodule:: jittor.nn
:imported-members:
:members: Pool, pool, AdaptiveAvgPool2d
:members: Pool, pool, AdaptiveAvgPool2d, Pool3d, AdaptiveMaxPool2d, AdaptiveAvgPool3d, AdaptiveMaxPool3d, pool3d, AvgPool2d, AvgPool3d, avg_pool2d, MaxPool2d, MaxPool3d, max_pool2d, max_pool3d, MaxUnpool2d, MaxUnpool3d
:undoc-members:
.. autoclass:: jittor.nn.ReLU


@ -9,7 +9,7 @@
# file 'LICENSE.txt', which is part of this source code package.
# ***************************************************************
__version__ = '1.2.3.12'
__version__ = '1.2.3.92'
from jittor_utils import lock
with lock.lock_scope():
ori_int = int
@ -23,7 +23,7 @@ with lock.lock_scope():
from jittor_core import *
from jittor_core.ops import *
from . import compile_extern
from .compile_extern import mkl_ops, mpi, mpi_ops, in_mpi, rank
from .compile_extern import mkl_ops, mpi, mpi_ops, in_mpi, rank, world_size
if core.get_device_count() == 0:
has_cuda = compile_extern.has_cuda = compiler.has_cuda = False
if has_cuda:
@ -437,11 +437,11 @@ def pow(x, y):
Var.pow = Var.__pow__ = pow
def argmax(x, dim, keepdims:bool=False):
return x.arg_reduce("max", dim, keepdims)
return jt.arg_reduce(x, "max", dim, keepdims)
Var.argmax = argmax
def argmin(x, dim, keepdims:bool=False):
return x.arg_reduce("min", dim, keepdims)
return jt.arg_reduce(x, "min", dim, keepdims)
Var.argmin = argmin
def randn(*size, dtype="float32", requires_grad=True) -> Var:
@ -562,7 +562,7 @@ def randint(low, high=None, shape=(1,), dtype="int32") -> Var:
'''
if high is None: low, high = 0, low
v = (jt.random(shape) * (high - low) + low).clamp(low, high-0.5)
v = jt.floor(v)
v = jt.floor_int(v)
return v.astype(dtype)
def randint_like(x, low, high=None) -> Var:
@ -780,11 +780,16 @@ class Module:
stack = []
def callback(parents, k, v, n):
stack.append(str(k))
for k2, p in v.__dict__.items():
if k2.startswith("_"): continue
dc = v.__dict__
if isinstance(v, nn.ParameterList):
dc = v.params
for k2, p in dc.items():
if isinstance(k2, str) and k2.startswith("_"): continue
if isinstance(p, Var):
ps.append(p)
p.name(".".join(stack[1:]+[str(k2)]))
pname = ".".join(stack[1:]+[str(k2)])
if len(pname) > len(p.name()):
p.name(pname)
def callback_leave(parents, k, v, n):
stack.pop()
self.dfs([], None, callback, callback_leave)
@ -822,53 +827,101 @@ class Module:
self.dfs([], "", callback, callback_leave)
return ms
def register_forward_hook(self, func):
def requires_grad_(self, requires_grad=True):
self._requires_grad = requires_grad
self._place_hooker()
return self
def __hooked_call__(self, *args, **kw):
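# NOTE: _place_hooker swaps cls.__call__ and cls.__hooked_call__, so within this
# method self.__hooked_call__ actually invokes the module's original __call__.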
if hasattr(self, "__fhook2__"):
if len(kw):
self.__fhook2__(self, args, kw)
else:
self.__fhook2__(self, args)
if hasattr(self, "__bihook__"):
if len(kw):
LOG.w("backward hook not support kw")
args = grad_hooker(args, self.__bihook__)
if hasattr(self, "_requires_grad") and not self._requires_grad:
with jt.no_grad():
ret = self.__hooked_call__(*args, **kw)
else:
ret = self.__hooked_call__(*args, **kw)
if hasattr(self, "__bohook__"):
if len(kw):
LOG.w("backward hook not support kw")
if isinstance(ret, Var):
ret = grad_hooker((ret,), self.__bohook__)[0]
else:
ret = grad_hooker(ret, self.__bohook__)
if hasattr(self, "__fhook__"):
if len(kw):
self.__fhook__(self, args, ret, kw)
else:
self.__fhook__(self, args, ret)
return ret
def _place_hooker(self):
cls = self.__class__
self.__fhook__ = func
if hasattr(cls, "__hooked__"):
return
cls.__hooked__ = True
origin_call = cls.__call__
def new_call(self, *args, **kw):
ret = origin_call(self, *args, **kw)
if hasattr(self, "__fhook__"):
if len(kw):
self.__fhook__(self, args, ret, kw)
else:
self.__fhook__(self, args, ret)
return ret
self.__class__.__call__ = new_call
cls.__call__, cls.__hooked_call__ = \
cls.__hooked_call__, cls.__call__
def register_forward_hook(self, func):
self.__fhook__ = func
self._place_hooker()
def remove_forward_hook(self):
cls = self.__class__
if hasattr(cls,"__hooked__"):
delattr(cls,"__hooked__")
if hasattr(self,"__fhook__"):
delattr(self,"__fhook__")
def register_pre_forward_hook(self, func):
cls = self.__class__
self.__fhook2__ = func
if hasattr(cls, "__hooked2__"):
return
cls.__hooked2__ = True
origin_call = cls.__call__
def new_call(self, *args, **kw):
if hasattr(self, "__fhook2__"):
if len(kw):
self.__fhook2__(self, args, kw)
else:
self.__fhook2__(self, args)
return origin_call(self, *args, **kw)
self.__class__.__call__ = new_call
self._place_hooker()
def remove_pre_forward_hook(self):
cls = self.__class__
if hasattr(cls,"__hooked2__"):
delattr(cls,"__hooked2__")
if hasattr(self,"__fhook2__"):
delattr(self,"__fhook2__")
def register_input_backward_hook(self, func):
self.__bihook__ = func
self._place_hooker()
def remove_input_backward_hook(self):
if hasattr(self,"__bihook__"):
delattr(self,"__bihook__")
def register_output_backward_hook(self, func):
self.__bohook__ = func
self._place_hooker()
def remove_output_backward_hook(self):
if hasattr(self,"__bohook__"):
delattr(self,"__bohook__")
def register_backward_hook(self, func):
''' hook both input and output on backpropagation of this module.
Arguments of hook are defined as::
hook(module, grad_input:tuple(jt.Var), grad_output:tuple(jt.Var)) -> tuple(jt.Var) or None
`grad_input` is the original gradient of this module's input, `grad_output` is the gradient of this module's output, and the return value is used to replace the gradient of the input.
'''
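# illustrative usage (the `layer` module and the 0.5 scaling are hypothetical,
# not part of this commit):
#   def scale_grad(mod, grad_input, grad_output):
#       return tuple(g * 0.5 for g in grad_input)
#   layer.register_backward_hook(scale_grad)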
_grad_output = None
def bohook(grad_output):
nonlocal _grad_output
_grad_output = grad_output
def bihook(grad_input):
return func(self, grad_input, _grad_output)
self.register_input_backward_hook(bihook)
self.register_output_backward_hook(bohook)
def remove_backward_hook(self):
self.remove_input_backward_hook()
self.remove_output_backward_hook()
def children(self):
cd = []
def callback(parents, k, v, n):
@ -1051,7 +1104,7 @@ can store value for backward computation)::
return grad0 * self.y, grad1 * self.x
a = jt.array(3.0)
b = jt.array(4.0)
func = MyFunc()
func = MyFunc.apply
c,d = func(a, b)
da, db = jt.grad(c+d*3, [a, b])
assert da.data == 4
@ -1074,7 +1127,7 @@ can also be None)::
return grad0 * self.y, None
a = jt.array(3.0)
b = jt.array(4.0)
func = MyFunc()
func = MyFunc.apply
c,d = func(a, b)
d.stop_grad()
da, db = jt.grad(c+d*3, [a, b])
@ -1144,6 +1197,44 @@ can also be None)::
func = cls()
return func(*args, **kw)
class GradHooker(Function):
def __init__(self, hook):
self.hook = hook
def execute(self, *args):
return args
def grad(self, *grad_input):
ret = self.hook(grad_input)
if ret: grad_input = ret
return grad_input
def grad_hooker(args, hook):
hooker = GradHooker(hook)
return hooker(*args)
def register_hook(v, hook):
""" register hook of any jittor Variables, if hook return not None,
the gradient of this variable will be alter, Example::
x = jt.array([0.0, 0.0])
y = x * [1,2]
y.register_hook(lambda g: g*2)
dx = jt.grad(y, x)
print(dx)
# will be [2, 4]
"""
def _hook(grads):
g = hook(grads[0])
if g is not None:
return (g,)
return None
hooker = GradHooker(_hook)
v.swap(hooker(v)[0])
return v
Var.register_hook = register_hook
def make_module(func, exec_n_args=1):
class MakeModule(Module):
@ -1169,9 +1260,11 @@ def dirty_fix_pytorch_runtime_error():
jt.dirty_fix_pytorch_runtime_error()
import torch
'''
import os
os.RTLD_GLOBAL = os.RTLD_GLOBAL | os.RTLD_DEEPBIND
import os, platform
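# os.RTLD_DEEPBIND only exists on Linux (glibc), so guard the flag tweak by platform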
if platform.system() == 'Linux':
os.RTLD_GLOBAL = os.RTLD_GLOBAL | os.RTLD_DEEPBIND
import atexit
@ -1249,11 +1342,15 @@ def get_len(var):
Var.__len__ = get_len
int = int32
Var.int = Var.int32
Var.long = Var.int32
float = float32
Var.float = Var.float32
double = float64
Var.double = Var.float64
def is_var(v):
return isinstance(v, Var)
# __array__ interface is used for np.array(jt_var)
Var.__array__ = Var.numpy
Var.__array_priority__ = 2000
@ -1271,3 +1368,5 @@ from . import numpy2cupy
from .contrib import concat
from .misc import *
from . import sparse
from . import optim
from . import dataset

View File

@ -5,6 +5,7 @@
# file 'LICENSE.txt', which is part of this source code package.
# ***************************************************************
import os, sys, shutil
import platform
from .compiler import *
from jittor_utils import run_cmd, get_version, get_int_version
from jittor_utils.misc import download_url_to_local
@ -23,71 +24,99 @@ def search_file(dirs, name, prefer_version=()):
def install_mkl(root_folder):
# origin url is
# url = "https://github.com/intel/mkl-dnn/releases/download/v1.0.2/mkldnn_lnx_1.0.2_cpu_gomp.tgz"
url = "https://cloud.tsinghua.edu.cn/f/da02bf62b55b4aa3b8ee/?dl=1"
filename = "mkldnn_lnx_1.0.2_cpu_gomp.tgz"
# newest version for oneDNN
# url = "https://github.com/oneapi-src/oneDNN/releases/download/v2.2/dnnl_lnx_2.2.0_cpu_gomp.tgz"
# filename = "dnnl_lnx_2.2.0_cpu_gomp.tgz"
import platform
if platform.system()=="Linux":
if platform.machine()=='x86_64':
filename = "dnnl_lnx_2.2.0_cpu_gomp.tgz"
md5 = "35bbbdf550a9d8ad54db798e372000f6"
elif platform.machine()=='aarch64':
filename = "dnnl_lnx_2.2.0_cpu_gomp_aarch64.tgz"
md5 = "72cf9b0b8fd6c3c786d35a9daaee22b8"
else:
raise RuntimeError(f"platform.machine()=={platform.machine()} not support yet,"
" Please contact us on https://github.com/jittor/jittor ")
else:
raise RuntimeError(f"platform.machine()=={platform.machine()} not support yet,"
" Please contact us on https://github.com/jittor/jittor ")
url = "https://cg.cs.tsinghua.edu.cn/jittor/assets/" + filename
fullname = os.path.join(root_folder, filename)
dirname = os.path.join(root_folder, filename.replace(".tgz",""))
if not os.path.isfile(os.path.join(dirname, "examples", "test")):
if not os.path.isfile(os.path.join(dirname, "lib", "libmkldnn.so")):
LOG.i("Downloading mkl...")
download_url_to_local(url, filename, root_folder, "47187284ede27ad3bd64b5f0e7d5e730")
# newest version for oneDNN
# download_url_to_local(url, filename, root_folder, "35bbbdf550a9d8ad54db798e372000f6")
download_url_to_local(url, filename, root_folder, md5)
import tarfile
with tarfile.open(fullname, "r") as tar:
tar.extractall(root_folder)
assert 0 == os.system(f"cd {dirname}/examples && "
f"{cc_path} -std=c++14 cpu_cnn_inference_f32.cpp -Ofast -lmkldnn -I ../include -L ../lib -o test && LD_LIBRARY_PATH=../lib/ ./test")
# newest version for oneDNN
# assert 0 == os.system(f"cd {dirname}/examples && "
# f"{cc_path} -std=c++14 cnn_inference_f32.cpp -Ofast -lmkldnn -I ../include -L ../lib -o test && LD_LIBRARY_PATH=../lib/ ./test")
f"{cc_path} -std=c++14 cnn_inference_f32.cpp -Ofast -lmkldnn -I ../include -L ../lib -o test && LD_LIBRARY_PATH=../lib/ ./test")
def setup_mkl():
global mkl_ops, use_mkl
use_mkl = os.environ.get("use_mkl", "1")=="1"
mkl_ops = None
if not use_mkl: return
# pytorch mkl conflicts with jittor mkl,
# yielding errors such as "free(): invalid size" or
# "mmap error";
# importing pytorch (>1.8) first can fix this problem
# try:
# # jt.dirty_fix_pytorch_runtime_error()
# import torch
# from torch import nn
# except:
# torch = None
mkl_include_path = os.environ.get("mkl_include_path")
mkl_lib_path = os.environ.get("mkl_lib_path")
if mkl_lib_path is None or mkl_include_path is None:
mkl_install_sh = os.path.join(jittor_path, "script", "install_mkl.sh")
LOG.v("setup mkl...")
# mkl_path = os.path.join(cache_path, "mkl")
# mkl_path decouple with cc_path
from pathlib import Path
mkl_path = os.path.join(str(Path.home()), ".cache", "jittor", "mkl")
make_cache_dir(mkl_path)
install_mkl(mkl_path)
mkl_home = ""
for name in os.listdir(mkl_path):
if name.startswith("mkldnn_lnx") and os.path.isdir(os.path.join(mkl_path, name)):
mkl_home = os.path.join(mkl_path, name)
break
assert mkl_home!=""
if platform.system() == 'Linux':
if mkl_lib_path is None or mkl_include_path is None:
mkl_install_sh = os.path.join(jittor_path, "script", "install_mkl.sh")
LOG.v("setup mkl...")
# mkl_path = os.path.join(cache_path, "mkl")
# mkl_path decouple with cc_path
from pathlib import Path
mkl_path = os.path.join(str(Path.home()), ".cache", "jittor", "mkl")
make_cache_dir(mkl_path)
install_mkl(mkl_path)
mkl_home = ""
for name in os.listdir(mkl_path):
if name.startswith("dnnl") and os.path.isdir(os.path.join(mkl_path, name)):
mkl_home = os.path.join(mkl_path, name)
break
assert mkl_home!=""
mkl_include_path = os.path.join(mkl_home, "include")
mkl_lib_path = os.path.join(mkl_home, "lib")
mkl_lib_name = os.path.join(mkl_lib_path, "libmkldnn.so")
assert os.path.isdir(mkl_include_path)
assert os.path.isdir(mkl_lib_path)
assert os.path.isfile(mkl_lib_name)
LOG.v(f"mkl_include_path: {mkl_include_path}")
LOG.v(f"mkl_lib_path: {mkl_lib_path}")
LOG.v(f"mkl_lib_name: {mkl_lib_name}")
# We do not link manually; linking happens in custom ops
# ctypes.CDLL(mkl_lib_name, dlopen_flags)
mkl_lib_name = os.path.join(mkl_lib_path, "libmkldnn.so")
assert os.path.isdir(mkl_include_path)
assert os.path.isdir(mkl_lib_path)
assert os.path.isfile(mkl_lib_name)
LOG.v(f"mkl_include_path: {mkl_include_path}")
LOG.v(f"mkl_lib_path: {mkl_lib_path}")
LOG.v(f"mkl_lib_name: {mkl_lib_name}")
# We do not link manually; linking happens in custom ops
# ctypes.CDLL(mkl_lib_name, dlopen_flags)
extra_flags = f" -I'{mkl_include_path}' -L'{mkl_lib_path}' -lmkldnn -Wl,-rpath='{mkl_lib_path}' "
elif platform.system() == 'Darwin':
mkl_lib_paths = [
"/usr/local/lib/libmkldnn.dylib", # x86_64
"/opt/homebrew/lib/libmkldnn.dylib", # arm64
]
if not any([os.path.exists(lib) for lib in mkl_lib_paths]):
raise RuntimeError("Not found onednn, please install it by the command 'brew install onednn'")
extra_flags = f" -lmkldnn "
mkl_op_dir = os.path.join(jittor_path, "extern", "mkl", "ops")
mkl_op_files = [os.path.join(mkl_op_dir, name) for name in os.listdir(mkl_op_dir)]
mkl_ops = compile_custom_ops(mkl_op_files,
extra_flags=f" -I'{mkl_include_path}' -L'{mkl_lib_path}' -lmkldnn -Wl,-rpath='{mkl_lib_path}' ")
mkl_ops = compile_custom_ops(mkl_op_files, extra_flags=extra_flags)
LOG.vv("Get mkl_ops: "+str(dir(mkl_ops)))
@ -154,13 +183,20 @@ def setup_cuda_extern():
line = traceback.format_exc()
LOG.w(f"CUDA found but {lib_name} is not loaded:\n{line}")
if lib_name == "cudnn":
LOG.w(f"""Develop version of CUDNN not found,
msg = """Develop version of CUDNN not found,
please refer to CUDA offical tar file installation:
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar"""
if platform.machine() == "x86_64":
msg += """
or you can let jittor install cuda and cudnn for you:
>>> python3.{sys.version_info.minor} -m jittor_utils.install_cuda""")
>>> python3.{sys.version_info.minor} -m jittor_utils.install_cuda
"""
LOG.w(msg)
def setup_cuda_lib(lib_name, link=True, extra_flags=""):
arch_key = "x86_64"
if platform.machine() != "x86_64":
arch_key = "aarch64"
globals()[lib_name+"_ops"] = None
globals()[lib_name] = None
if not has_cuda: return
@ -172,20 +208,26 @@ def setup_cuda_lib(lib_name, link=True, extra_flags=""):
link_flags = ""
if link:
extra_include_path = os.path.abspath(os.path.join(cuda_include, "..", "targets/x86_64-linux/include"))
extra_lib_path = os.path.abspath(os.path.join(cuda_lib, "..", "targets/x86_64-linux/lib"))
extra_include_path = os.path.abspath(os.path.join(cuda_include, "..", f"targets/{arch_key}-linux/include"))
extra_lib_path = os.path.abspath(os.path.join(cuda_lib, "..", f"targets/{arch_key}-linux/lib"))
cuda_include_name = search_file([cuda_include, extra_include_path, "/usr/include"], lib_name+".h")
# cuda11 prefer cudnn 8
nvcc_version = get_int_version(nvcc_path)
prefer_version = ()
if nvcc_version[0] == 11:
prefer_version = ("8",)
culib_path = search_file([cuda_lib, extra_lib_path, "/usr/lib/x86_64-linux-gnu"], f"lib{lib_name}.so", prefer_version)
culib_path = search_file([cuda_lib, extra_lib_path, f"/usr/lib/{arch_key}-linux-gnu", "/usr/lib"], f"lib{lib_name}.so", prefer_version)
if lib_name == "cublas":
if lib_name == "cublas" and nvcc_version[0] >= 10:
# manual link libcublasLt.so
cublas_lt_lib_path = search_file([cuda_lib, extra_lib_path, "/usr/lib/x86_64-linux-gnu"], f"libcublasLt.so", nvcc_version)
ctypes.CDLL(cublas_lt_lib_path, dlopen_flags)
try:
cublas_lt_lib_path = search_file([cuda_lib, extra_lib_path, f"/usr/lib/{arch_key}-linux-gnu", "/usr/lib"], f"libcublasLt.so", nvcc_version)
ctypes.CDLL(cublas_lt_lib_path, dlopen_flags)
except:
# some aarch64 OSes, such as UOS with the FT2000 CPU,
# ship CUDA 10 without libcublasLt.so
pass
if lib_name == "cudnn":
@ -193,7 +235,7 @@ def setup_cuda_lib(lib_name, link=True, extra_flags=""):
if nvcc_version >= (11,0,0):
libs = ["libcudnn_ops_infer.so", "libcudnn_ops_train.so", "libcudnn_cnn_infer.so", "libcudnn_cnn_train.so"]
for l in libs:
ex_cudnn_path = search_file([cuda_lib, extra_lib_path, "/usr/lib/x86_64-linux-gnu"], l, prefer_version)
ex_cudnn_path = search_file([cuda_lib, extra_lib_path, f"/usr/lib/{arch_key}-linux-gnu", "/usr/lib"], l, prefer_version)
ctypes.CDLL(ex_cudnn_path, dlopen_flags)
# dynamic link cuda library
@ -326,7 +368,7 @@ def install_nccl(root_folder):
if len(flags.cuda_archs):
arch_flag = f" -arch=compute_{min(flags.cuda_archs)} "
arch_flag += ''.join(map(lambda x:f' -code=sm_{x} ', flags.cuda_archs))
run_cmd(f"make -j8 src.build CUDA_HOME='{cuda_home}' NVCC_GENCODE='{arch_flag} --cudart=shared -ccbin=\"{cc_path}\" ' ", cwd=dirname)
run_cmd(f"CC=\"{cc_path}\" CXX=\"{cc_path}\" make -j8 src.build CUDA_HOME='{cuda_home}' NVCC_GENCODE='{arch_flag} --cudart=shared ' ", cwd=dirname)
return dirname
def setup_nccl():
@ -453,12 +495,22 @@ def setup_mpi():
if k == "mpi_test": continue
setattr(core.Var, k, warper(mpi_ops.__dict__[k]))
if os.environ.get("FIX_TORCH_ERROR", "0") == "1":
try:
import torch
except:
pass
setup_mpi()
in_mpi = inside_mpi()
rank = mpi.world_rank() if in_mpi else 0
world_size = mpi.world_size() if in_mpi else 1
setup_nccl()
setup_cutt()
setup_mkl()
try:
setup_mkl()
except Exception as e:
LOG.w("MKL install failed, msg:", e)
setup_cuda_extern()


@ -12,7 +12,9 @@ import glob
import inspect
import datetime
import threading
import platform
import ctypes
from ctypes import cdll
from ctypes.util import find_library
@ -22,6 +24,7 @@ from . import pyjt_compiler
from jittor_utils import lock
from jittor_utils import install_cuda
from jittor import __version__
import hashlib
def find_jittor_path():
return os.path.dirname(__file__)
@ -71,10 +74,11 @@ def compile(compiler, flags, inputs, output, combind_build=False):
output = os.path.join(cache_path, output)
# don't recompile object file in inputs
obj_files = []
ex_obj_files = []
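# object/archive files supplied by the caller; appended after our own objects are compiled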
new_inputs = []
for name in inputs:
if name[-1] in 'oa':
obj_files.append(name)
ex_obj_files.append(name)
else:
new_inputs.append(os.path.join(jittor_path, name))
obj_files.append(os.path.join(
@ -104,12 +108,11 @@ def compile(compiler, flags, inputs, output, combind_build=False):
cmd = cmd.replace("-Ofast", "-O2")
cmds.append(cmd)
jit_utils.run_cmds(cmds, cache_path, jittor_path, "Compiling "+base_output)
obj_files += ex_obj_files
cmd = f"\"{compiler}\" {' '.join(obj_files)} {flags} {lto_flags} {link} -o {output}"
return do_compile(cmd)
def gen_jit_tests():
# all_src = run_cmd('find -L src/ | grep "cc$"', jittor_path).splitlines()
# all_src = glob.glob(os.path.join(jittor_path,"src","**","*.cc"), recursive=True)
all_src = glob.glob(jittor_path+"/src/**/*.cc", recursive=True)
jit_declares = []
re_def = re.compile("JIT_TEST\\((.*?)\\)")
@ -159,7 +162,6 @@ def gen_jit_tests():
f.write(jit_src)
def gen_jit_flags():
# all_src = run_cmd('find -L src/ | grep "cc$"', jittor_path).splitlines()
all_src = glob.glob(jittor_path+"/src/**/*.cc", recursive=True)
jit_declares = []
re_def = re.compile("DEFINE_FLAG(_WITH_SETTER)?\\((.*?)\\);", re.DOTALL)
@ -608,7 +610,7 @@ def compile_custom_ops(
filenames,
extra_flags="",
return_module=False,
dlopen_flags=os.RTLD_GLOBAL | os.RTLD_NOW | os.RTLD_DEEPBIND,
dlopen_flags=None,
gen_name_ = ""):
"""Compile custom ops
filenames: path of op source files, filenames must be
@ -618,6 +620,11 @@ def compile_custom_ops(
return_module: return module rather than ops(default: False)
return: compiled ops
"""
if dlopen_flags is None:
dlopen_flags = os.RTLD_GLOBAL | os.RTLD_NOW
if platform.system() == 'Linux':
dlopen_flags |= os.RTLD_DEEPBIND
srcs = {}
headers = {}
builds = []
@ -650,7 +657,7 @@ def compile_custom_ops(
if gen_name_ != "":
gen_name = gen_name_
if len(gen_name) > 100:
gen_name = gen_name[:80] + "___hash" + str(hash(gen_name))
gen_name = gen_name[:80] + "___hash" + hashlib.md5(gen_name.encode()).hexdigest()
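# hashlib.md5, unlike the salted built-in hash(), is stable across processes, keeping generated names reproducible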
includes = sorted(list(set(includes)))
includes = "".join(map(lambda x: f" -I'{x}' ", includes))
@ -716,7 +723,7 @@ def get_full_path_of_executable(name):
def compile_extern():
# compile llvm passes
if cc_type != "clang":
if cc_type != "clang" or platform.system() != 'Linux':
return
global kernel_opt_flags
cache_path_llvm = os.path.join(cache_path, "llvm")
@ -777,7 +784,7 @@ def compile_extern():
LOG.vv(f"Compile extern llvm passes: {str(files)}")
def check_cuda():
if nvcc_path == "":
if not nvcc_path:
return
global cc_flags, has_cuda, core_link_flags, cuda_dir, cuda_lib, cuda_include, cuda_home
cuda_dir = os.path.dirname(get_full_path_of_executable(nvcc_path))
@ -859,11 +866,15 @@ def check_debug_flags():
cc_flags = " "
# os.RTLD_NOW | os.RTLD_GLOBAL cause segfault when import torch first
import_flags = os.RTLD_NOW | os.RTLD_GLOBAL | os.RTLD_DEEPBIND
import_flags = os.RTLD_NOW | os.RTLD_GLOBAL
if platform.system() == 'Linux':
import_flags |= os.RTLD_DEEPBIND
# if cc_type=="icc":
# # weird link problem, icc omp library may conflict and cause segfault
# import_flags = os.RTLD_NOW | os.RTLD_GLOBAL
dlopen_flags = os.RTLD_NOW | os.RTLD_GLOBAL | os.RTLD_DEEPBIND
dlopen_flags = os.RTLD_NOW | os.RTLD_GLOBAL
if platform.system() == 'Linux':
dlopen_flags |= os.RTLD_DEEPBIND
with jit_utils.import_scope(import_flags):
jit_utils.try_import_jit_utils_core()
@ -885,9 +896,21 @@ python_path = sys.executable
ex_python_path = python_path + '.' + str(sys.version_info.minor)
if os.path.isfile(ex_python_path):
python_path = ex_python_path
nvcc_path = env_or_try_find('nvcc_path', 'nvcc') or try_find_exe('/usr/local/cuda/bin/nvcc') or try_find_exe('/usr/bin/nvcc')
# if jtcuda is already installed
nvcc_path = None
if install_cuda.has_installation():
nvcc_path = install_cuda.install_cuda()
if nvcc_path:
nvcc_path = try_find_exe(nvcc_path)
# check system installed cuda
if not nvcc_path:
nvcc_path = env_or_try_find('nvcc_path', 'nvcc') or \
try_find_exe('/usr/local/cuda/bin/nvcc') or \
try_find_exe('/usr/bin/nvcc') or \
try_find_exe('/opt/cuda/bin/nvcc')
# if system has no cuda, install jtcuda
if not nvcc_path:
cuda_driver = install_cuda.get_cuda_driver()
nvcc_path = install_cuda.install_cuda()
if nvcc_path:
nvcc_path = try_find_exe(nvcc_path)
@ -897,14 +920,44 @@ gdb_path = try_find_exe('gdb')
addr2line_path = try_find_exe('addr2line')
has_pybt = check_pybt(gdb_path, python_path)
cc_flags += " -Wall -Werror -Wno-unknown-pragmas -std=c++14 -fPIC -march=native "
def check_clang_latest_supported_cpu():
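# parse `clang --print-supported-cpus` and pick the newest apple-aN target (e.g. apple-a14)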
output = run_cmd('clang --print-supported-cpus')
apple_cpus = [l.strip() for l in output.split('\n') if 'apple-a' in l]
apple_cpus_id = max([int(cpu[7:]) for cpu in apple_cpus])
return f'apple-a{apple_cpus_id}'
# cc_flags += " -Wall -Werror -Wno-unknown-pragmas -std=c++14 -fPIC "
cc_flags += " -Wall -Wno-unknown-pragmas -std=c++14 -fPIC "
# 1. Arch/CPU specific optimization
if platform.machine() == "x86_64":
cc_flags += " -march=native "
elif platform.machine() == 'arm64' and platform.system() == "Darwin":
cc_flags += f" -mcpu={check_clang_latest_supported_cpu()} "
cc_flags += " -fdiagnostics-color=always "
# 2. Non standard include path
if platform.system() == 'Darwin' and platform.machine() == 'arm64':
cc_flags += " -I/opt/homebrew/include "
# 3. User specified flags
if "cc_flags" in os.environ:
cc_flags += os.environ["cc_flags"] + ' '
link_flags = " -lstdc++ -ldl -shared "
if platform.system() == 'Darwin':
# TODO: if not using apple clang, there is no need to add -lomp
link_flags += "-undefined dynamic_lookup -lomp "
if platform.machine() == "arm64":
link_flags += " -L/opt/homebrew/lib "
core_link_flags = ""
opt_flags = ""
kernel_opt_flags = os.environ.get("kernel_flags", "") + opt_flags + " -fopenmp "
kernel_opt_flags = os.environ.get("kernel_flags", "") + opt_flags
if platform.system() == 'Darwin':
# TODO: if not using apple clang, cannot add -Xpreprocessor
kernel_opt_flags = kernel_opt_flags + " -Xpreprocessor -fopenmp "
else:
kernel_opt_flags = kernel_opt_flags + " -fopenmp "
if os.name == 'nt':
link_flags = link_flags.replace('-ldl', '')
py3_link_path = '-L"' + os.path.join(
@ -978,7 +1031,6 @@ if has_cuda:
gen_jit_flags()
gen_jit_tests()
op_headers = glob.glob(jittor_path+"/src/ops/**/*op.h", recursive=True)
# op_headers = run_cmd('find -L src/ops/ | grep "op.h$"', jittor_path).splitlines()
jit_src = gen_jit_op_maker(op_headers)
LOG.vvvv(jit_src)
with open(os.path.join(cache_path, "gen", "jit_op_maker.h"), 'w') as f:
@ -1023,54 +1075,49 @@ for file in jit_utils_core_files:
files.remove(file)
LOG.vv("compile order:", files)
if os.name != 'nt':
# manual Link omp using flags(os.RTLD_NOW | os.RTLD_GLOBAL)
# if cc_type=="icc":
# os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
if platform.system() == 'Linux':
libname = {"clang":"omp", "icc":"iomp5", "g++":"gomp"}[cc_type]
libname = ctypes.util.find_library(libname)
assert libname is not None, "openmp library not found"
ctypes.CDLL(libname, os.RTLD_NOW | os.RTLD_GLOBAL)
# get os release
if platform.machine()=='sw_64':
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
with open("/etc/os-release", "r", encoding='utf8') as f:
s = f.read().splitlines()
os_release = {}
for line in s:
a = line.split('=')
if len(a) != 2: continue
os_release[a[0]] = a[1].replace("\"", "")
os_type = {
"ubuntu": "ubuntu",
"debian": "ubuntu",
"centos": "centos",
"rhel": "ubuntu",
"fedora": "ubuntu",
}
version_file = os.path.join(jittor_path, "version")
if os.path.isfile(version_file) and not os.path.isdir(os.path.join(jittor_path, "src", "__data__")):
with open(version_file, 'r') as f:
version = f.read().strip()
# key = f"{version}-{cc_type}-{'cuda' if has_cuda else 'cpu'}.o"
key = f"{version}-g++-cpu"
os_id = os_release["ID"]
os_key = os_type.get(os_id, "ubuntu")
if "os_key" in os.environ:
os_key = os.environ['os_key']
LOG.i("OS type:", os_id, " OS key:", os_key)
key += '-' + os_key + '.o'
# TODO: open the website
extra_obj = os.path.join(cache_path, key)
url = os.path.join("https://cg.cs.tsinghua.edu.cn/jittor/assets/build/"+key)
jit_utils.download(url, extra_obj)
files.append(extra_obj)
data_gz_path = os.path.join(jittor_path, "utils", "data.gz")
use_data_gz = os.path.isfile(data_gz_path)
if os.environ.get("use_data_gz", "1") == "0":
use_data_gz = False
if use_data_gz:
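# compile the bundled data.gz source into data.o, cached by md5 so it is rebuilt only when the archive changes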
import gzip
with gzip.open(data_gz_path, 'rb') as f:
data = f.read()
md5 = hashlib.md5(data).hexdigest()
target_md5 = None
data_gz_md5_path = os.path.join(cache_path, "data.md5")
if os.path.isfile(data_gz_md5_path):
with open(data_gz_md5_path, 'r') as f:
target_md5 = f.read()
data_o_path = os.path.join(cache_path, "data.o")
if target_md5 != md5:
data_s_path = os.path.join(cache_path, "data.cc")
with open(data_s_path, "w") as f:
f.write(data.decode("utf8"))
dflags = (cc_flags+opt_flags)\
.replace("-Wall", "") \
.replace("-Werror", "")
run_cmd(f"{cc_path} {dflags} \"-D_P(...)=\" {data_s_path} -c -o {data_o_path}")
os.remove(data_s_path)
with open(data_gz_md5_path, 'w') as f:
f.write(md5)
files.append(data_o_path)
files = [f for f in files if "__data__" not in f]
compile(cc_path, cc_flags+opt_flags, files, 'jittor_core'+extension_suffix)
# TODO: move to compile_extern.py
compile_extern()
# compile_extern()
with jit_utils.import_scope(import_flags):
import jittor_core as core


@ -206,6 +206,23 @@ def setitem(x, slices, value):
jt.Var.__getitem__ = jt.Var.slice_var = getitem
jt.Var.__setitem__ = setitem
def _merge_dtypes(dtypes):
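# promote a list of dtype strings to one common dtype: take the widest category
# (bool < uint < int < float) and the widest bit width, e.g. ["int32", "float64"] -> "float64"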
s = -1
e = -1
names = ["bool","uint","int","float"]
dbytes = ["8","16","32","64"]
for d in dtypes:
for name in names:
if d.startswith(name):
s = max(s,names.index(name))
for db in dbytes:
if d.endswith(db):
e = max(e,dbytes.index(db))
assert s>=0 and s<4 and e<4
dtype = names[s]+("" if e ==-1 else dbytes[e])
return dtype
def concat(arr, dim=0):
'''Concat Operator can concat a list of jt Vars at a specific dimension.
@ -226,12 +243,14 @@ Example::
raise ValueError("need at least one array to concat")
total_dim = 0
if dim < 0: dim += len(arr[0].shape)
dtypes = []
for a in arr:
total_dim += a.shape[dim]
dtypes.append(str(a.dtype))
cdim = 0
shape = list(a.shape)
shape[dim] = total_dim
s = jt.empty(shape, a.dtype)
s = jt.empty(shape, dtype = _merge_dtypes(dtypes))
slices = [slice(None)]*len(a.shape)
for a in arr:
if a.shape[dim] == 0:


@ -1,5 +1,6 @@
from .dataset import Dataset, ImageFolder
from .dataset import Dataset, ImageFolder, dataset_root, TensorDataset, VarDataset
from .mnist import MNIST
from .cifar import CIFAR10, CIFAR100
from .voc import VOC
from .sampler import *


@ -0,0 +1,189 @@
import os
from jittor_utils.misc import download_and_extract_archive, check_integrity
from PIL import Image
import sys, pickle
import numpy as np
from jittor.dataset import Dataset, dataset_root
class CIFAR10(Dataset):
"""`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
Args:
root (string): Root directory of dataset where directory
``cifar-10-batches-py`` exists or will be saved to if download is set to True.
train (bool, optional): If True, creates dataset from training set, otherwise
creates from test set.
transform (callable, optional): A function/transform that takes in a PIL image
and returns a transformed version. E.g, ``transforms.RandomCrop``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
download (bool, optional): If true, downloads the dataset from the internet and
puts it in root directory. If dataset is already downloaded, it is not
downloaded again.
Example::
from jittor.dataset.cifar import CIFAR10
a = CIFAR10()
a.set_attrs(batch_size=16)
for imgs, labels in a:
print(imgs.shape, labels.shape)
break
"""
base_folder = 'cifar-10-batches-py'
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
filename = "cifar-10-python.tar.gz"
tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
train_list = [
['data_batch_1', 'c99cafc152244af753f735de768cd75f'],
['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'],
['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'],
['data_batch_4', '634d18415352ddfa80567beed471001a'],
['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb'],
]
test_list = [
['test_batch', '40351d587109b95175f43aff81a1287e'],
]
meta = {
'filename': 'batches.meta',
'key': 'label_names',
'md5': '5ff9c542aee3614f3951f8cda6e48888',
}
def __init__(self, root=dataset_root+"/cifar_data/", train=True, transform=None, target_transform=None,
download=True):
super(CIFAR10, self).__init__()
self.root = root
self.transform=transform
self.target_transform=target_transform
self.train = train # training set or test set
if download:
self.download()
if not self._check_integrity():
raise RuntimeError('Dataset not found or corrupted.' +
' You can use download=True to download it')
if self.train:
downloaded_list = self.train_list
else:
downloaded_list = self.test_list
self.data = []
self.targets = []
# now load the pickled numpy arrays
for file_name, checksum in downloaded_list:
file_path = os.path.join(self.root, self.base_folder, file_name)
with open(file_path, 'rb') as f:
if sys.version_info[0] == 2:
entry = pickle.load(f)
else:
entry = pickle.load(f, encoding='latin1')
self.data.append(entry['data'])
if 'labels' in entry:
self.targets.extend(entry['labels'])
else:
self.targets.extend(entry['fine_labels'])
self.data = np.vstack(self.data).reshape(-1, 3, 32, 32)
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
self._load_meta()
def _load_meta(self):
path = os.path.join(self.root, self.base_folder, self.meta['filename'])
if not check_integrity(path, self.meta['md5']):
raise RuntimeError('Dataset metadata file not found or corrupted.' +
' You can use download=True to download it')
with open(path, 'rb') as infile:
if sys.version_info[0] == 2:
data = pickle.load(infile)
else:
data = pickle.load(infile, encoding='latin1')
self.classes = data[self.meta['key']]
self.class_to_idx = {_class: i for i, _class in enumerate(self.classes)}
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: (image, target) where target is index of the target class.
"""
img, target = self.data[index], self.targets[index]
# doing this so that it is consistent with all other datasets
# to return a PIL Image
img = Image.fromarray(img)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
target = self.target_transform(target)
return img, target
def __len__(self):
return len(self.data)
def _check_integrity(self):
root = self.root
for fentry in (self.train_list + self.test_list):
filename, md5 = fentry[0], fentry[1]
fpath = os.path.join(root, self.base_folder, filename)
if not check_integrity(fpath, md5):
return False
return True
def download(self):
if self._check_integrity():
print('Files already downloaded and verified')
return
download_and_extract_archive(self.url, self.root, filename=self.filename, md5=self.tgz_md5)
def extra_repr(self):
return "Split: {}".format("Train" if self.train is True else "Test")
class CIFAR100(CIFAR10):
"""`CIFAR100 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
This is a subclass of the `CIFAR10` Dataset.
Example::
from jittor.dataset.cifar import CIFAR100
a = CIFAR100()
a.set_attrs(batch_size=16)
for imgs, labels in a:
print(imgs.shape, labels.shape)
break
"""
base_folder = 'cifar-100-python'
url = "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz"
filename = "cifar-100-python.tar.gz"
tgz_md5 = 'eb9058c3a382ffc7106e4002c42a8d85'
train_list = [
['train', '16019d7e3df5f24257cddd939b257f8d'],
]
test_list = [
['test', 'f0ef6b0ae62326f3e7ffdfab6717acfc'],
]
meta = {
'filename': 'meta',
'key': 'fine_label_names',
'md5': '7973b15100ade9c7d40fb424638fde48',
}


@ -26,6 +26,7 @@ dataset_root = os.path.join(pathlib.Path.home(), ".cache", "jittor", "dataset")
mp_log_v = os.environ.get("mp_log_v", 0)
mpi = jt.mpi
img_open_hook = HookTimer(Image, "open")
CHECK_MEMORY = int(os.environ.get("CHECK_MEMORY", "0"))
class Worker:
def __init__(self, target, args, buffer_size, keep_numpy_array=False):
@ -48,6 +49,8 @@ class Dataset(object):
[in] drop_last(bool): if true, the last incomplete batch is dropped; otherwise it may be smaller than batch_size. Default: True.
[in] num_workers(int): number of workers for loading data.
[in] buffer_size(int): buffer size for each worker in bytes, default(512MB).
[in] keep_numpy_array(bool): return numpy array rather than jittor array, default(False).
[in] endless(bool): will this dataset yield data forever, default(False).
Example::
@ -70,8 +73,11 @@ class Dataset(object):
num_workers = 0,
buffer_size = 512*1024*1024,
stop_grad = True,
keep_numpy_array = False):
keep_numpy_array = False,
endless = False):
super().__init__()
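# allow forcing single-process loading via the DISABLE_MULTIPROCESSING env var (useful for debugging workers)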
if os.environ.get("DISABLE_MULTIPROCESSING", '0') == '1':
num_workers = 0
self.total_len = None
self.batch_size = batch_size
self.shuffle = shuffle
@ -80,7 +86,10 @@ class Dataset(object):
self.buffer_size = buffer_size
self.stop_grad = stop_grad
self.keep_numpy_array = keep_numpy_array
self.endless = endless
self.epoch_id = 0
self.sampler = None
self._disable_workers = False
def __getitem__(self, index):
raise NotImplementedError
@ -129,13 +138,16 @@ class Dataset(object):
if self.stop_grad else jt.array(x)
if isinstance(batch, np.ndarray):
return to_jt(batch)
if isinstance(batch, dict):
new_batch = {}
for k,v in batch.items():
new_batch[k] = self.to_jittor(v)
return new_batch
if not isinstance(batch, (list, tuple)):
return batch
new_batch = []
for a in batch:
if isinstance(a, np.ndarray) or \
isinstance(a, int) or \
isinstance(a, float):
if isinstance(a, np.ndarray):
new_batch.append(to_jt(a))
else:
new_batch.append(self.to_jittor(a))
@ -180,15 +192,20 @@ class Dataset(object):
while True:
# get id
with gid_lock:
while gid_obj.value >= self.batch_len or buffer.is_stop():
while buffer.is_stop() or self.idqueue.is_stop() or \
gid_obj.value >= self.batch_len:
self.num_idle.value += 1
self.num_idle_c.notify()
self.gidc.wait()
self.num_idle.value -= 1
cid = gid_obj.value
self.idmap[cid] = worker_id
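# snapshot this batch's indices while still holding the lock, since index_list may be rewritten when an endless epoch restarts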
batch_index_list = self.index_list_numpy[
cid*self.real_batch_size:
min(self.real_len, (cid+1)*self.real_batch_size)
].copy()
gid_obj.value += 1
self.gidc.notify()
with self.idqueue_lock:
self.idqueue.push(worker_id)
now = time.time()
other_time = now - start
start = now
@ -197,8 +214,8 @@ class Dataset(object):
batch = []
if mp_log_v:
print(f"#{worker_id} {os.getpid()} load batch", cid*self.real_batch_size, min(self.real_len, (cid+1)*self.real_batch_size))
for i in range(cid*self.real_batch_size, min(self.real_len, (cid+1)*self.real_batch_size)):
batch.append(self[self.index_list[i]])
for i in batch_index_list:
batch.append(self[i])
batch = self.collate_batch(batch)
now = time.time()
data_time = now - start
@ -276,10 +293,10 @@ Example::
if not hasattr(self, "workers"):
return
msg = [""]
msg.append(f"progress:{self.last_id}/{self.batch_len}")
msg.append(f"progress:{self.batch_id}/{self.batch_len}")
msg.append(f"batch(s): {self.batch_time:.3f}\twait(s):{self.wait_time:.3f}")
msg.append(f"recv(s): {self.recv_time:.3f}\tto_jittor(s):{self.to_jittor_time:.3f}")
msg.append(f"last 10 workers: {self.idmap[max(0, self.last_id-9):self.last_id+1]}")
msg.append(f"last 10 workers: {self.last_ids}")
msg.append(f"ID\twait(s)\topen(s)\tload(s)\tsend(s)\ttotal(s)")
for i in range(self.num_workers):
w = self.workers[i]
@ -291,6 +308,7 @@ Example::
# stop workers
for w in self.workers:
w.buffer.stop()
self.idqueue.stop()
# wait until all workers idle
if self.num_idle.value < self.num_workers:
with self.gid.get_lock():
@ -304,29 +322,34 @@ Example::
# clean workers' buffer
for w in self.workers:
w.buffer.clear()
self.idqueue.clear()
self.gid_obj.value = 0
def _init_workers(self):
def _init_workers(self, index_list):
jt.clean()
jt.gc()
self.index_list = mp.Array('i', self.real_len, lock=False)
workers = []
# batch id to worker id
self.idmap = mp.Array('i', self.batch_len, lock=False)
# get worker id
self.idqueue = jt.RingBuffer(2048)
self.idqueue_lock = mp.Lock()
# global token index
self.gid = mp.Value('i', self.batch_len)
self.gid.value = 0
# global token index condition
self.gidc = mp.Condition(self.gid.get_lock())
# number of idle workers
self.num_idle = mp.Value('i', 0, lock=False)
# number of idle workers condition
self.num_idle_c = mp.Condition(self.gid.get_lock())
self.index_list_numpy = np.ndarray(dtype='int32', shape=self.real_len, buffer=self.index_list)
self.index_list_numpy[:] = index_list
for i in range(self.num_workers):
w = Worker(target=self._worker_main, args=(i,),
buffer_size=self.buffer_size,
keep_numpy_array=self.keep_numpy_array)
workers.append(w)
self.workers = workers
self.index_list_numpy = np.ndarray(dtype='int32', shape=self.real_len, buffer=self.index_list)
def reset(self):
if not hasattr(self, "workers"):
@ -351,8 +374,8 @@ Example::
if self.total_len is None:
self.total_len = len(self)
return self.total_len
def __iter__(self):
def _get_index_list(self):
if self.total_len is None:
self.total_len = len(self)
# maybe rewrite by sampler
@ -381,7 +404,8 @@ Example::
world_size = mpi.world_size()
world_rank = mpi.world_rank()
index_list = np.int32(index_list)
mpi.broadcast(index_list, 0)
# TODO: mpi broadcast in subprocess has bug, fix it
# mpi.broadcast(index_list, 0)
assert self.batch_size >= world_size, \
f"Batch size({self.batch_size}) is smaller than MPI world_size({world_size})"
@ -416,71 +440,104 @@ Example::
else:
self.real_len = self.total_len
self.real_batch_size = self.batch_size
self.batch_len = self.__batch_len__()
return index_list
def _epochs(self):
if self.endless:
while True:
yield
self.epoch_id += 1
else:
yield
def __iter__(self):
if self._disable_workers:
self.num_workers = 0
index_list = self._get_index_list()
if not hasattr(self, "workers") and self.num_workers:
self._init_workers()
self._init_workers(index_list)
self.last_ids = [-1] * 10
if self.num_workers:
self._stop_all_workers()
self.index_list_numpy[:] = index_list
gid_obj = self.gid.get_obj()
gid_lock = self.gid.get_lock()
with gid_lock:
gid_obj.value = 0
self.gidc.notify_all()
start = time.time()
self.batch_time = 0
for i in range(self.batch_len):
# try not get lock first
if gid_obj.value <= i:
with gid_lock:
if gid_obj.value <= i:
if mp_log_v:
print("wait")
self.gidc.wait()
now = time.time()
self.wait_time = now - start
start = now
gid_obj = self.gid.get_obj()
gid_lock = self.gid.get_lock()
self.last_id = i
worker_id = self.idmap[i]
w = self.workers[worker_id]
if mp_log_v:
print(f"#{worker_id} {os.getpid()} recv buffer", w.buffer)
batch = w.buffer.recv()
now = time.time()
self.recv_time = now - start
start = now
for _ in self._epochs():
with gid_lock:
if self.num_idle.value:
self.gidc.notify_all()
if mp_log_v:
print(f"#{worker_id} {os.getpid()} recv", type(batch).__name__, [ type(b).__name__ for b in batch ])
batch = self.to_jittor(batch)
now = time.time()
self.to_jittor_time = now - start
start = now
for i in range(self.batch_len):
if self.num_idle.value:
with gid_lock:
if self.num_idle.value and \
gid_obj.value >= self.batch_len:
index_list = self._get_index_list()
self.index_list_numpy[:] = index_list
gid_obj.value = 0
self.gidc.notify_all()
yield batch
# get which worker has this batch
worker_id = self.idqueue.pop()
now = time.time()
self.batch_time = now - start
start = now
now = time.time()
self.wait_time = now - start
start = now
self.last_ids[i%10] = worker_id
self.batch_id = i
w = self.workers[worker_id]
if mp_log_v:
print(f"#{worker_id} {os.getpid()} recv buffer", w.buffer)
batch = w.buffer.recv()
now = time.time()
self.recv_time = now - start
start = now
if mp_log_v:
print(f"#{worker_id} {os.getpid()} recv", type(batch).__name__, [ type(b).__name__ for b in batch ])
batch = self.to_jittor(batch)
now = time.time()
self.to_jittor_time = now - start
start = now
yield batch
now = time.time()
self.batch_time = now - start
start = now
if CHECK_MEMORY and self.batch_id % CHECK_MEMORY == 0:
jt.display_memory_info()
else:
batch_data = []
for idx in index_list:
batch_data.append(self[int(idx)])
if len(batch_data) == self.real_batch_size:
for _ in self._epochs():
self.batch_id = 0
batch_data = []
for idx in index_list:
batch_data.append(self[int(idx)])
if len(batch_data) == self.real_batch_size:
batch_data = self.collate_batch(batch_data)
batch_data = self.to_jittor(batch_data)
yield batch_data
self.batch_id += 1
if CHECK_MEMORY and self.batch_id % CHECK_MEMORY == 0:
jt.display_memory_info()
batch_data = []
# depend on drop_last
if not self.drop_last and len(batch_data) > 0:
batch_data = self.collate_batch(batch_data)
batch_data = self.to_jittor(batch_data)
self.batch_id += 1
yield batch_data
batch_data = []
# depend on drop_last
if not self.drop_last and len(batch_data) > 0:
batch_data = self.collate_batch(batch_data)
batch_data = self.to_jittor(batch_data)
yield batch_data
class ImageFolder(Dataset):
@ -537,3 +594,47 @@ class ImageFolder(Dataset):
if self.transform:
img = self.transform(img)
return img, self.imgs[k][1]
class VarDataset(Dataset):
""" Dataset using Var directly, TensorDataset is alias of VarDataset, Example::
import jittor as jt
from jittor.dataset import VarDataset
x = jt.array([1,2,3])
y = jt.array([4,5,6])
z = jt.array([7,8,9])
dataset = VarDataset(x, y, z)
dataset.set_attrs(batch_size=1)
for a,b,c in dataset:
print(a,b,c)
# will print
# 1,4,7
# 2,5,8
# 3,6,9
"""
def __init__(self, *args):
super().__init__()
self.args = args
self._disable_workers = True
assert len(args), "at least one argument is required"
l = len(args[0])
for a in args:
assert l == len(a), "all arguments must have the same length"
self.set_attrs(total_len=l)
def __getitem__(self, idx):
return [ a[idx] for a in self.args ]
def collate_batch(self, batch):
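# stacking per-sample scalars taken from a 1-D Var yields shape (batch, 1); squeeze back to (batch,)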
b = collate_batch(batch)
for i in range(len(self.args)):
x = b[i]
if jt.is_var(self.args[i]) and self.args[i].ndim == 1:
x.assign(x.squeeze(-1))
return b
TensorDataset = VarDataset


@ -27,8 +27,7 @@ def collate_batch(batch):
elem = batch[0]
elem_type = type(elem)
if isinstance(elem, jt.Var):
# TODO: use jittor
temp_data = np.stack([data.data for data in batch], 0)
temp_data = jt.stack([data for data in batch], 0)
return temp_data
if elem_type is np.ndarray:
temp_data = np.stack([data for data in batch], 0)


@ -29,18 +29,7 @@ kernel(in0->num/in0->shape[in0->shape.size()-1], 0, in0_p, out0_p, in0->shape[in
class OneHotCategorical:
def __init__(self, probs=None, logits=None):
assert not (probs is None and logits is None)
if probs is None:
# cannot align to pytorch
probs = jt.sigmoid(logits)
elif logits is None:
logits = jt.log(probs)
with jt.no_grad():
self.probs = probs / probs.sum(-1, True)
self.cum_probs = simple_presum(self.probs)
self.cum_probs_l = self.cum_probs[..., :-1]
self.cum_probs_r = self.cum_probs[..., 1:]
self.logits = logits
Categorical.__init__(self, probs, logits)
def sample(self, sample_shape=()):
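# inverse-CDF sampling: draw u ~ U(0,1) and mark the cumulative-probability interval containing it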
shape = sample_shape + self.probs.shape[:-1] + (1,)
@ -48,17 +37,12 @@ class OneHotCategorical:
one_hot = jt.logical_and(self.cum_probs_l < rand, rand <= self.cum_probs_r).float()
return one_hot
def log_prob(self,x):
if len(x.shape) == 1:
x = x.unsqueeze(0)
logits = self.logits.broadcast(x.shape)
indices = jt.argmax(x, dim=-1)[0]
return logits.gather(1, indices.unsqueeze(-1)).reshape(-1)
def log_prob(self, x):
x = jt.argmax(x, dim=-1)[0]
return Categorical.log_prob(self, x)
def entropy(self):
min_real = -(math.pow(2,23)-1) / math.pow(2,22) * math.pow(2,127)
logits = jt.clamp(self.logits,min_v=min_real)
p_log_p = logits * self.probs
p_log_p = self.logits * self.probs
return -p_log_p.sum(-1)
@ -68,29 +52,32 @@ class Categorical:
if probs is None:
# cannot align to pytorch
probs = jt.sigmoid(logits)
elif logits is None:
logits = jt.log(probs)
probs = probs / probs.sum(-1, True)
if logits is None:
logits = jt.safe_log(probs)
with jt.no_grad():
self.probs = probs / probs.sum(-1, True)
self.probs = probs
self.logits = logits
self.cum_probs = simple_presum(probs)
self.cum_probs = simple_presum(self.probs)
self.cum_probs_l = self.cum_probs[..., :-1]
self.cum_probs_r = self.cum_probs[..., 1:]
def sample(self, sample_shape=[]):
def sample(self, sample_shape=()):
shape = sample_shape + self.probs.shape[:-1] + (1,)
rand = jt.rand(shape)
one_hot = jt.logical_and(self.cum_probs_l < rand, rand <= self.cum_probs_r)
index = one_hot.index(one_hot.ndim-1)
index = one_hot.index(one_hot.ndim - 1)
return (one_hot * index).sum(-1)
def log_prob(self, x):
return jt.log(self.probs)[0,x]
a = self.probs.ndim
b = x.ndim
indexes = tuple( f'i{i}' for i in range(b-a+1, b) )
indexes = indexes + (x,)
return jt.safe_log(self.probs).getitem(indexes)
def entropy(self):
min_real = -(math.pow(2,23)-1) / math.pow(2,22) * math.pow(2,127)
logits = jt.clamp(self.logits,min_v=min_real)
p_log_p = logits * self.probs
p_log_p = self.logits * self.probs
return -p_log_p.sum(-1)
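A hedged usage sketch of the refactored Categorical above (assuming the jittor.distributions module path): batched probabilities are normalized in __init__, and log_prob now indexes into the batch with a broadcast-aware getitem.
import jittor as jt
from jittor.distributions import Categorical
probs = jt.array([[0.1, 0.2, 0.7],
                  [0.5, 0.25, 0.25]])  # two distributions over three classes
dist = Categorical(probs=probs)
s = dist.sample()                      # shape [2], one class id per row
lp = dist.log_prob(s)                  # shape [2], log-probability of each sample
print(s.shape, lp.shape)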
@ -104,11 +91,11 @@ class Normal:
def log_prob(self, x):
var = self.sigma**2
log_scale = jt.log(self.sigma)
log_scale = jt.safe_log(self.sigma)
return -((x-self.mu)**2) / (2*var) - log_scale-np.log(np.sqrt(2*np.pi))
def entropy(self):
return 0.5+0.5*np.log(2*np.pi)+jt.log(self.sigma)
return 0.5+0.5*np.log(2*np.pi)+jt.safe_log(self.sigma)
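For reference, log_prob and entropy above are the Gaussian closed forms:
\log p(x) = -\frac{(x-\mu)^2}{2\sigma^2} - \log\sigma - \tfrac{1}{2}\log(2\pi),
\qquad
H = \tfrac{1}{2} + \tfrac{1}{2}\log(2\pi) + \log\sigma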
class Uniform:
@ -123,10 +110,10 @@ class Uniform:
def log_prob(self,x):
if x < self.low or x >= self.high:
return math.inf
return -jt.log(self.high - self.low)
return -jt.safe_log(self.high - self.low)
def entropy(self):
return jt.log(self.high - self.low)
return jt.safe_log(self.high - self.low)
class Geometric:
@ -138,15 +125,14 @@ class Geometric:
self.logits = logits
elif logits is None:
self.prob = p
self.logits = -jt.log(1. / p - 1)
self.logits = -jt.safe_log(1. / p - 1)
def sample(self, sample_shape):
tiny = jt.info(self.probs.dtype).tiny
u = jt.clamp(jt.rand(sample_shape),min_v=tiny)
return (jt.log(u) / (jt.log(-self.probs+1))).floor()
u = jt.rand(sample_shape)
return (jt.safe_log(u) / (jt.safe_log(-self.probs+1))).floor_int()
def log_prob(self, x):
return x*jt.log(-self.prob+1)+jt.log(self.prob)
return x*jt.safe_log(-self.prob+1)+jt.safe_log(self.prob)
def entropy(self):
return binary_cross_entropy_with_logits(jt.array(self.logits),jt.array(self.prob)) / self.prob
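The sampler above is inverse-CDF sampling: with U \sim \mathrm{Uniform}(0,1),
X = \left\lfloor \frac{\ln U}{\ln(1-p)} \right\rfloor \sim \mathrm{Geometric}(p),
\qquad
\log P(X=k) = k\log(1-p) + \log p,
counting failures before the first success, which is exactly what sample and log_prob compute via safe_log.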
@ -157,16 +143,14 @@ def kl_divergence(cur_dist, old_dist):
if isinstance(cur_dist, Normal):
vr = (cur_dist.sigma / old_dist.sigma)**2
t1 = ((cur_dist.mu - old_dist.mu) / old_dist.sigma)**2
return 0.5*(vr+t1-1-jt.log(vr))
return 0.5*(vr+t1-1-jt.safe_log(vr))
if isinstance(cur_dist, Categorical) or isinstance(cur_dist,OneHotCategorical):
t = cur_dist.probs * (cur_dist.logits-old_dist.logits)
t[jt.array((old_dist.probs == 0))] = math.inf
t[jt.array((cur_dist.probs == 0))] = 0
return t.sum(-1)
if isinstance(cur_dist, Uniform):
res = jt.log((old_dist.high - old_dist.low) / (cur_dist.high - cur_dist.low))
res = jt.safe_log((old_dist.high - old_dist.low) / (cur_dist.high - cur_dist.low))
if old_dist.low > cur_dist.low or old_dist.high < cur_dist.high:
res = math.inf
return res
if isinstance(cur_dist, Geometric):
return -cur_dist.entropy() - jt.log(-old_dist.prob+1) / cur_dist.prob - old_dist.logits
return -cur_dist.entropy() - jt.safe_log(-old_dist.prob+1) / cur_dist.prob - old_dist.logits
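The Normal branch above uses the standard closed form: with v = (\sigma_1/\sigma_2)^2 and t = ((\mu_1-\mu_2)/\sigma_2)^2,
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1,\sigma_1^2)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2)\big) = \tfrac{1}{2}\,(v + t - 1 - \ln v)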

View File

@ -11,6 +11,7 @@
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include "utils/log.h"
#include "helper_cuda.h"
#include "fp16_emu.h"
#include "common.h"

View File

@ -58,6 +58,7 @@
// CUDA and CUBLAS functions
#include <helper_functions.h>
#include "utils/log.h"
#include "helper_cuda.h"
#ifndef min

View File

@ -8,6 +8,7 @@
#include <cuda_runtime.h>
#include <cudnn.h>
#include "utils/log.h"
#include "helper_cuda.h"
#include "fp16_emu.h"
#include "common.h"

View File

@ -0,0 +1,288 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers:
// Dun Liang <randonlang@gmail.com>
// Guowei Yang <471184555@qq.com>
//
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#include "mem/allocator.h"
#include "var.h"
#include "cudnn_conv3d_backward_w_op.h"
#include "cudnn_warper.h"
#include "executor.h"
#include "ops/op_register.h"
using namespace std;
namespace jittor {
#pragma GCC diagnostic ignored "-Wunused-variable"
#ifndef JIT
CudnnConv3dBackwardWOp::CudnnConv3dBackwardWOp(Var* x, Var* dy, int kd, int kh, int kw, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups, string xformat)
: x(x), dy(dy), kd(kd), kh(kh), kw(kw), strided(strided), strideh(strideh), stridew(stridew), paddingd(paddingd), paddingh(paddingh), paddingw(paddingw), dilationd(dilationd), dilationh(dilationh), dilationw(dilationw), groups(groups),
xformat(move(xformat)) {
flags.set(NodeFlags::_cuda, 1);
flags.set(NodeFlags::_cpu, 0);
dw = create_output(nullptr, dtype_infer(dy->ns, x->ns));
}
void CudnnConv3dBackwardWOp::infer_shape() {
ASSERTop(x->shape.size(),==,5);
ASSERTop(dy->shape.size(),==,5);
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
if (xformat == "ncdhw") {
x->shape.unpack(xn, xc, xd, xh, xw);
dy->shape.unpack(yn, yc, yd, yh, yw);
} else {
x->shape.unpack(xn, xd, xh, xw, xc);
dy->shape.unpack(yn, yd, yh, yw, yc);
}
wco = yc, wci = xc / groups;
wh = kh;
ww = kw;
wd = kd;
dw->set_shape(NanoVector(wco, wci, wd, wh, ww));
}
void CudnnConv3dBackwardWOp::jit_prepare(JK& jk) {
jk << _CS("[Tx:") << x->dtype();
jk << _CS("][Ty:") << dy->dtype();
jk << _CS("][Tw:") << dw->dtype();
jk << ']';
}
static auto make_conv3d = get_op_info("cudnn_conv3d")
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, string>();
static auto make_backwardx = get_op_info("cudnn_conv3d_backward_x")
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
VarPtr CudnnConv3dBackwardWOp::grad(Var* out, Var* dout, Var* v, int v_index) {
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
if (xformat == "ncdhw") {
x->shape.unpack(xn, xc, xd, xh, xw);
dy->shape.unpack(yn, yc, yd, yh, yw);
} else {
x->shape.unpack(xn, xd, xh, xw, xc);
dy->shape.unpack(yn, yd, yh, yw, yc);
}
if (v_index == 0) {
return make_backwardx(dout, dy, xd, xh, xw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
} else {
return make_conv3d(x, dout, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
}
}
// unordered_map<string, cudnnConvolutionBwdFilterAlgo_t> bwdw_algo_cache;
#else // JIT
#ifdef JIT_cuda
#pragma clang diagnostic ignored "-Wtautological-compare"
extern unordered_map<string, cudnnConvolutionBwdFilterAlgo_t> bwdw_algo_cache;
template <typename T_ELEM> __inline__ cudnnDataType_t getDataType();
template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
void CudnnConv3dBackwardWOp::jit_run() {
auto w = dw;
auto y = dy;
cudnnHandle_t& handle_ = cudnn_handle;
cudnnTensorDescriptor_t cudnnIdesc;
cudnnFilterDescriptor_t cudnnFdesc;
cudnnTensorDescriptor_t cudnnOdesc;
cudnnConvolutionDescriptor_t cudnnConvDesc;
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnIdesc ));
checkCudaErrors(cudnnCreateFilterDescriptor( &cudnnFdesc ));
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnOdesc ));
checkCudaErrors(cudnnCreateConvolutionDescriptor( &cudnnConvDesc ));
checkCudaErrors(cudnnSetConvolutionGroupCount( cudnnConvDesc, groups ));
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
int sx[] = {0,0,0,0,1};
for (int i=3; i>=0; i--) sx[i] = sx[i+1] * x->shape[i+1];
int strideX[5];
if (xformat == "ncdhw") {
x->shape.unpack(xn, xc, xd, xh, xw);
int tmp[5] = {sx[0],sx[1],sx[2],sx[3],sx[4]};
memcpy(strideX, tmp, sizeof(tmp));
} else {
x->shape.unpack(xn, xd, xh, xw, xc);
int tmp[5] = {sx[0],sx[2],sx[3],sx[4],sx[1]};
memcpy(strideX, tmp, sizeof(tmp));
}
int dimX[] = {xn, xc, xd, xh, xw};
// dimX: ncdhw
checkCudaErrors(cudnnSetTensorNdDescriptor(
cudnnIdesc, getDataType<Tx>(),
5, dimX, strideX
));
auto ws = w->shape;
int dimW[] = {(int)ws[0],(int)ws[1],(int)ws[2],(int)ws[3],(int)ws[4]};
// cudnn only supports these two formats
// https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnSetFilterNdDescriptor
#define filterFormat_oihw CUDNN_TENSOR_NCHW
#define filterFormat_ohwi CUDNN_TENSOR_NHWC
// dimW: KCRS(oihw)
checkCudaErrors(cudnnSetFilterNdDescriptor(
cudnnFdesc, getDataType<Tw>(),
// filterFormat_@WFORMAT, 5, dimW
filterFormat_oihw, 5, dimW
));
int padA[] = {paddingd, paddingh, paddingw};
int convstrideA[] = {strided, strideh, stridew};
int dilationA[] = {dilationd, dilationh, dilationw};
// the difference between CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION
// is the kernel rc order; currently no perf difference is observed
// between these two modes
checkCudaErrors(cudnnSetConvolutionNdDescriptor(
cudnnConvDesc, 3,
padA, convstrideA, dilationA,
CUDNN_CROSS_CORRELATION, getDataType<Ty>()
));
// using tensor core
// checkCudaErrors( cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
int sy[] = {0,0,0,0,1};
for (int i=3; i>=0; i--) sy[i] = sy[i+1] * y->shape[i+1];
int strideY[5];
if (xformat == "ncdhw") {
y->shape.unpack(yn, yc, yd, yh, yw);
int tmp[5] = {sy[0],sy[1],sy[2],sy[3],sy[4]};
memcpy(strideY, tmp, sizeof(tmp));
} else {
y->shape.unpack(yn, yd, yh, yw, yc);
int tmp[5] = {sy[0],sy[2],sy[3],sy[4],sy[1]};
memcpy(strideY, tmp, sizeof(tmp));
}
int dimY[] = {yn, yc, yd, yh, yw};
checkCudaErrors(cudnnSetTensorNdDescriptor(
cudnnOdesc, getDataType<Ty>(),
5, dimY, strideY
));
cudnnConvolutionBwdFilterAlgo_t algos[] = {
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0,
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1,
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT,
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3,
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED,
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING,
};
int num_algos = CUDNN_CONVOLUTION_BWD_FILTER_ALGO_COUNT;
int perf_count;
cudnnConvolutionBwdFilterAlgoPerf_t perf_results[num_algos];
cudnnConvolutionBwdFilterAlgo_t algo;
bool benchmark=true;
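// build a textual cache key from the input/filter shapes and conv hyper-parameters,
// so the benchmarked algorithm is reused for identical configurations instead of
// re-running cudnnFind* on every launch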
jk.clear();
jk << dimX[0] << "," << dimX[1] << "," << dimX[2] << "," << dimX[3] << "," << dimX[4] << ",";
jk << dimW[0] << "," << dimW[1] << "," << dimW[2] << "," << dimW[3] << "," << dimW[4] << ",";
jk << paddingd << paddingh << paddingw << "," << strided << strideh <<stridew << "," << dilationd << dilationh << dilationw << "," << groups << ".";
auto iter = bwdw_algo_cache.find(jk.to_string());
if (iter!=bwdw_algo_cache.end()) algo = iter->second;
else {
if (bwdw_algo_cache.size()>=max_cache_size) benchmark = false;
if (benchmark) {
size_t max_ws_size = 0;
for (int i = 0; i < num_algos; i++) {
size_t sz;
cudnnStatus_t ret = cudnnGetConvolutionBackwardFilterWorkspaceSize(handle_, cudnnIdesc, cudnnOdesc, cudnnConvDesc, cudnnFdesc, algos[i], &sz);
// skip algorithms that would use too much workspace
if (sz > mem_info.total_cuda_ram * max_workspace_ratio) continue;
if (CUDNN_STATUS_SUCCESS == ret && sz > max_ws_size) max_ws_size = sz;
}
size_t allocation;
void* ws = exe.temp_allocator->alloc(max_ws_size, allocation);
checkCudaErrors(cudnnFindConvolutionBackwardFilterAlgorithmEx(
handle_,
cudnnIdesc, x->ptr<Tx>(),
cudnnOdesc, y->ptr<Ty>(),
cudnnConvDesc,
cudnnFdesc, w->ptr<Tw>(),
num_algos,
&perf_count,
perf_results,
ws,
max_ws_size));
exe.temp_allocator->free(ws, max_ws_size, allocation);
} else {
checkCudaErrors(cudnnGetConvolutionBackwardFilterAlgorithm_v7(
handle_,
cudnnIdesc,
cudnnOdesc,
cudnnConvDesc,
cudnnFdesc,
num_algos,
&perf_count,
perf_results));
}
int best_algo_idx=-1;
for (int i = 0; i < perf_count; i++)
if (perf_results[i].status == CUDNN_STATUS_SUCCESS){
best_algo_idx=i;
break;
}
ASSERT(best_algo_idx!=-1);
algo=perf_results[best_algo_idx].algo;
if (benchmark) {
bwdw_algo_cache[jk.to_string()] = algo;
if (bwdw_algo_cache.size()==max_cache_size)
LOGw << "backward w algorithm cache is full";
}
}
// TODO: wrap workspace allocation
void *workSpace = 0;
size_t workSpaceSize;
checkCudaErrors (cudnnGetConvolutionBackwardFilterWorkspaceSize(
handle_, cudnnIdesc, cudnnOdesc, cudnnConvDesc,
cudnnFdesc, algo, &workSpaceSize));
size_t allocation;
if (workSpaceSize > 0) {
workSpace = exe.temp_allocator->alloc(workSpaceSize, allocation);
}
float alpha=1, beta=0;
checkCudaErrors(cudnnConvolutionBackwardFilter(
handle_,
(void*)(&alpha),
cudnnIdesc, x->ptr<Tx>(),
cudnnOdesc, y->ptr<Ty>(),
cudnnConvDesc,
algo,
workSpace, workSpaceSize,
(void*)(&beta),
cudnnFdesc, w->ptr<Tw>())
);
if (workSpace)
exe.temp_allocator->free(workSpace, workSpaceSize, allocation);
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnIdesc ));
checkCudaErrors(cudnnDestroyFilterDescriptor( cudnnFdesc ));
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnOdesc ));
checkCudaErrors(cudnnDestroyConvolutionDescriptor( cudnnConvDesc ));
}
#endif
#endif // JIT
} // jittor

View File

@ -0,0 +1,28 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers:
// Dun Liang <randonlang@gmail.com>
// Guowei Yang <471184555@qq.com>
//
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#pragma once
#include "op.h"
namespace jittor {
struct CudnnConv3dBackwardWOp : Op {
Var* x, * dy, * dw;
int kd, kh, kw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups;
string xformat;
CudnnConv3dBackwardWOp(Var* x, Var* y, int kd, int kh, int kw, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups=1, string xformat="ncdhw");
const char* name() const override { return "cudnn_conv3d_backward_w"; }
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
void infer_shape() override;
DECLARE_jit_run;
};
} // jittor

View File

@ -0,0 +1,279 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers:
// Dun Liang <randonlang@gmail.com>
// Guowei Yang <471184555@qq.com>
//
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#include "mem/allocator.h"
#include "var.h"
#include "cudnn_conv3d_backward_x_op.h"
#include "cudnn_warper.h"
#include "executor.h"
#include "ops/op_register.h"
using namespace std;
namespace jittor {
#pragma GCC diagnostic ignored "-Wunused-variable"
#ifndef JIT
CudnnConv3dBackwardXOp::CudnnConv3dBackwardXOp(Var* w, Var* dy, int depth, int height, int width, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups, string xformat)
: w(w), dy(dy), xd(depth), xh(height), xw(width), strided(strided), strideh(strideh), stridew(stridew), paddingd(paddingd), paddingh(paddingh), paddingw(paddingw), dilationd(dilationd), dilationh(dilationh), dilationw(dilationw), groups(groups),
xformat(move(xformat)) {
flags.set(NodeFlags::_cuda, 1);
flags.set(NodeFlags::_cpu, 0);
dx = create_output(nullptr, dtype_infer(dy->ns, w->ns));
}
void CudnnConv3dBackwardXOp::infer_shape() {
ASSERTop(w->shape.size(),==,5);
ASSERTop(dy->shape.size(),==,5);
int xn, xc, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
w->shape.unpack(wco, wci, wd, wh, ww);
if (xformat == "ncdhw")
dy->shape.unpack(yn, yc, yd, yh, yw);
else
dy->shape.unpack(yn, yd, yh, yw, yc);
xn = yn, xc = wci * groups;
if (xformat == "ncdhw")
dx->set_shape(NanoVector(xn, xc, xd, xh, xw));
else
dx->set_shape(NanoVector(xn, xd, xh, xw, xc));
}
void CudnnConv3dBackwardXOp::jit_prepare(JK& jk) {
jk << _CS("[Tx:") << dx->dtype();
jk << _CS("][Ty:") << dy->dtype();
jk << _CS("][Tw:") << w->dtype();
jk << ']';
}
static auto make_conv3d = get_op_info("cudnn_conv3d")
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, string>();
static auto make_backwardw = get_op_info("cudnn_conv3d_backward_w")
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
VarPtr CudnnConv3dBackwardXOp::grad(Var* out, Var* dout, Var* v, int v_index) {
int xn, xc, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
w->shape.unpack(wco, wci, wd, wh, ww);
if (v_index == 0) {
return make_backwardw(dout, dy, wd, wh, ww, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
} else {
return make_conv3d(dout, w, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
}
}
// unordered_map<string, cudnnConvolutionBwdDataAlgo_t> bwdx_algo_cache;
#else // JIT
#ifdef JIT_cuda
#pragma clang diagnostic ignored "-Wtautological-compare"
extern unordered_map<string, cudnnConvolutionBwdDataAlgo_t> bwdx_algo_cache;
template <typename T_ELEM> __inline__ cudnnDataType_t getDataType();
template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
void CudnnConv3dBackwardXOp::jit_run() {
auto x = dx;
auto y = dy;
cudnnHandle_t& handle_ = cudnn_handle;
cudnnTensorDescriptor_t cudnnIdesc;
cudnnFilterDescriptor_t cudnnFdesc;
cudnnTensorDescriptor_t cudnnOdesc;
cudnnConvolutionDescriptor_t cudnnConvDesc;
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnIdesc ));
checkCudaErrors(cudnnCreateFilterDescriptor( &cudnnFdesc ));
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnOdesc ));
checkCudaErrors(cudnnCreateConvolutionDescriptor( &cudnnConvDesc ));
checkCudaErrors(cudnnSetConvolutionGroupCount( cudnnConvDesc, groups ));
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
int sx[] = {0,0,0,0,1};
for (int i=3; i>=0; i--) sx[i] = sx[i+1] * x->shape[i+1];
int strideX[5];
if (xformat == "ncdhw") {
x->shape.unpack(xn, xc, xd, xh, xw);
int tmp[5] = {sx[0],sx[1],sx[2],sx[3],sx[4]};
memcpy(strideX, tmp, sizeof(tmp));
} else {
x->shape.unpack(xn, xd, xh, xw, xc);
int tmp[5] = {sx[0],sx[2],sx[3],sx[4],sx[1]};
memcpy(strideX, tmp, sizeof(tmp));
}
int dimX[] = {xn, xc, xd, xh, xw};
// dimX: ncdhw
checkCudaErrors(cudnnSetTensorNdDescriptor(
cudnnIdesc, getDataType<Tx>(),
5, dimX, strideX
));
auto ws = w->shape;
int dimW[] = {(int)ws[0],(int)ws[1],(int)ws[2],(int)ws[3],(int)ws[4]};
// cudnn only supports these two formats
// https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnSetFilterNdDescriptor
#define filterFormat_oihw CUDNN_TENSOR_NCHW
#define filterFormat_ohwi CUDNN_TENSOR_NHWC
// dimW: KCRS(oihw)
checkCudaErrors(cudnnSetFilterNdDescriptor(
cudnnFdesc, getDataType<Tw>(),
// filterFormat_@WFORMAT, 5, dimW
filterFormat_oihw, 5, dimW
));
int padA[] = {paddingd, paddingh, paddingw};
int convstrideA[] = {strided, strideh, stridew};
int dilationA[] = {dilationd, dilationh, dilationw};
// the difference between CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION
// is the kernel rc order; currently no perf difference is observed
// between these two modes
checkCudaErrors(cudnnSetConvolutionNdDescriptor(
cudnnConvDesc, 3,
padA, convstrideA, dilationA,
CUDNN_CROSS_CORRELATION, getDataType<Ty>()
));
// using tensor core
// checkCudaErrors( cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
int sy[] = {0,0,0,0,1};
for (int i=3; i>=0; i--) sy[i] = sy[i+1] * y->shape[i+1];
int strideY[5];
if (xformat == "ncdhw") {
y->shape.unpack(yn, yc, yd, yh, yw);
int tmp[5] = {sy[0],sy[1],sy[2],sy[3],sy[4]};
memcpy(strideY, tmp, sizeof(tmp));
} else {
y->shape.unpack(yn, yd, yh, yw, yc);
int tmp[5] = {sy[0],sy[2],sy[3],sy[4],sy[1]};
memcpy(strideY, tmp, sizeof(tmp));
}
int dimY[] = {yn, yc, yd, yh, yw};
checkCudaErrors(cudnnSetTensorNdDescriptor(
cudnnOdesc, getDataType<Ty>(),
5, dimY, strideY
));
cudnnConvolutionBwdDataAlgo_t algos[] = {
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0,
CUDNN_CONVOLUTION_BWD_DATA_ALGO_1,
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT,
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING,
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD,
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED
};
int num_algos = CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT;
int perf_count;
cudnnConvolutionBwdDataAlgoPerf_t perf_results[num_algos];
cudnnConvolutionBwdDataAlgo_t algo;
bool benchmark=true;
jk.clear();
jk << dimX[0] << "," << dimX[1] << "," << dimX[2] << "," << dimX[3] << "," << dimX[4] << ",";
jk << dimW[0] << "," << dimW[1] << "," << dimW[2] << "," << dimW[3] << "," << dimW[4] << ",";
jk << paddingd << paddingh << paddingw << "," << strided << strideh <<stridew << "," << dilationd << dilationh << dilationw << "," << groups << ".";
auto iter = bwdx_algo_cache.find(jk.to_string());
if (iter!=bwdx_algo_cache.end()) algo = iter->second;
else {
if (bwdx_algo_cache.size()>=max_cache_size) benchmark = false;
if (benchmark) {
size_t max_ws_size = 0;
for (int i = 0; i < num_algos; i++) {
size_t sz;
cudnnStatus_t ret = cudnnGetConvolutionBackwardDataWorkspaceSize(handle_, cudnnFdesc, cudnnOdesc, cudnnConvDesc, cudnnIdesc, algos[i], &sz);
// skip algorithms that would use too much workspace
if (sz > mem_info.total_cuda_ram * max_workspace_ratio) continue;
if (CUDNN_STATUS_SUCCESS == ret && sz > max_ws_size) max_ws_size = sz;
}
size_t allocation;
void* ws = exe.temp_allocator->alloc(max_ws_size, allocation);
checkCudaErrors(cudnnFindConvolutionBackwardDataAlgorithmEx(
handle_,
cudnnFdesc, w->ptr<Tw>(),
cudnnOdesc, y->ptr<Ty>(),
cudnnConvDesc,
cudnnIdesc, x->ptr<Tx>(),
num_algos,
&perf_count,
perf_results,
ws,
max_ws_size));
exe.temp_allocator->free(ws, max_ws_size, allocation);
} else {
checkCudaErrors(cudnnGetConvolutionBackwardDataAlgorithm_v7(
handle_,
cudnnFdesc,
cudnnOdesc,
cudnnConvDesc,
cudnnIdesc,
num_algos,
&perf_count,
perf_results));
}
int best_algo_idx=-1;
for (int i = 0; i < perf_count; i++)
if (perf_results[i].status == CUDNN_STATUS_SUCCESS){
best_algo_idx=i;
break;
}
ASSERT(best_algo_idx!=-1);
algo=perf_results[best_algo_idx].algo;
if (benchmark) {
bwdx_algo_cache[jk.to_string()] = algo;
if (bwdx_algo_cache.size()==max_cache_size)
LOGw << "backward x algorithm cache is full";
}
}
// TODO: wrap workspace allocation
void *workSpace = 0;
size_t workSpaceSize;
checkCudaErrors (cudnnGetConvolutionBackwardDataWorkspaceSize(
handle_, cudnnFdesc, cudnnOdesc, cudnnConvDesc,
cudnnIdesc, algo, &workSpaceSize));
size_t allocation;
if (workSpaceSize > 0) {
workSpace = exe.temp_allocator->alloc(workSpaceSize, allocation);
}
float alpha=1, beta=0;
checkCudaErrors(cudnnConvolutionBackwardData(
handle_,
(void*)(&alpha),
cudnnFdesc, w->ptr<Tw>(),
cudnnOdesc, y->ptr<Ty>(),
cudnnConvDesc,
algo,
workSpace, workSpaceSize,
(void*)(&beta),
cudnnIdesc, x->ptr<Tx>())
);
if (workSpace)
exe.temp_allocator->free(workSpace, workSpaceSize, allocation);
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnIdesc ));
checkCudaErrors(cudnnDestroyFilterDescriptor( cudnnFdesc ));
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnOdesc ));
checkCudaErrors(cudnnDestroyConvolutionDescriptor( cudnnConvDesc ));
}
#endif
#endif // JIT
} // jittor

View File

@ -0,0 +1,28 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers:
// Dun Liang <randonlang@gmail.com>
// Guowei Yang <471184555@qq.com>
//
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#pragma once
#include "op.h"
namespace jittor {
struct CudnnConv3dBackwardXOp : Op {
Var* w, * dy, * dx;
int xd, xh, xw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups;
string xformat;
CudnnConv3dBackwardXOp(Var* w, Var* y, int depth, int height, int width, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups=1, string xformat="ncdhw");
const char* name() const override { return "cudnn_conv3d_backward_x"; }
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
void infer_shape() override;
DECLARE_jit_run;
};
} // jittor

View File

@ -0,0 +1,284 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers: Dun Liang <randonlang@gmail.com>.
//
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#include "var.h"
#include "cudnn_conv3d_op.h"
#include "cudnn_warper.h"
#include "executor.h"
#include "ops/op_register.h"
using namespace std;
namespace jittor {
#pragma GCC diagnostic ignored "-Wunused-variable"
#ifndef JIT
CudnnConv3dOp::CudnnConv3dOp(Var* x, Var* w, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups, string xformat)
: x(x), w(w), strided(strided), strideh(strideh), stridew(stridew), paddingd(paddingd), paddingh(paddingh), paddingw(paddingw), dilationd(dilationd), dilationh(dilationh), dilationw(dilationw), groups(groups),
xformat(move(xformat)) {
flags.set(NodeFlags::_cuda, 1);
flags.set(NodeFlags::_cpu, 0);
y = create_output(nullptr, dtype_infer(x->ns, w->ns));
}
void CudnnConv3dOp::infer_shape() {
ASSERTop(x->shape.size(),==,5);
ASSERTop(w->shape.size(),==,5);
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
if (xformat == "ncdhw")
x->shape.unpack(xn, xc, xd, xh, xw);
else
x->shape.unpack(xn, xd, xh, xw, xc);
w->shape.unpack(wco, wci, wd, wh, ww);
ASSERTop(wci * groups,==,xc);
yn = xn, yc = wco;
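// output size per spatial dim: o = (i + 2p - d*(k-1) - 1) / s + 1 (floor division),
// expanded below as (i + 2p - d*k + d - 1) / s + 1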
yd = (xd+paddingd*2-wd*dilationd+dilationd-1)/strided+1;
yh = (xh+paddingh*2-wh*dilationh+dilationh-1)/strideh+1;
yw = (xw+paddingw*2-ww*dilationw+dilationw-1)/stridew+1;
if (xformat == "ncdhw")
y->set_shape(NanoVector(yn, yc, yd, yh, yw));
else
y->set_shape(NanoVector(yn, yd, yh, yw, yc));
}
void CudnnConv3dOp::jit_prepare(JK& jk) {
jk << _CS("[Tx:") << x->dtype();
jk << _CS("][Ty:") << y->dtype();
jk << _CS("][Tw:") << w->dtype();
jk << ']';
}
static auto make_backwardx = get_op_info("cudnn_conv3d_backward_x")
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
static auto make_backwardw = get_op_info("cudnn_conv3d_backward_w")
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
VarPtr CudnnConv3dOp::grad(Var* out, Var* dout, Var* v, int v_index) {
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
if (xformat == "ncdhw")
x->shape.unpack(xn, xc, xd, xh, xw);
else
x->shape.unpack(xn, xd, xh, xw, xc);
w->shape.unpack(wco, wci, wd, wh, ww);
if (v_index == 0) {
return make_backwardx(w, dout, xd, xh, xw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
} else {
return make_backwardw(x, dout, wd, wh, ww, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
}
}
// unordered_map<string, cudnnConvolutionFwdAlgo_t> fwd_algo_cache;
#else // JIT
#ifdef JIT_cuda
#pragma clang diagnostic ignored "-Wtautological-compare"
extern unordered_map<string, cudnnConvolutionFwdAlgo_t> fwd_algo_cache;
template <typename T_ELEM> __inline__ cudnnDataType_t getDataType();
template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
void CudnnConv3dOp::jit_run() {
cudnnHandle_t& handle_ = cudnn_handle;
cudnnTensorDescriptor_t cudnnIdesc;
cudnnFilterDescriptor_t cudnnFdesc;
cudnnTensorDescriptor_t cudnnOdesc;
cudnnConvolutionDescriptor_t cudnnConvDesc;
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnIdesc ));
checkCudaErrors(cudnnCreateFilterDescriptor( &cudnnFdesc ));
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnOdesc ));
checkCudaErrors(cudnnCreateConvolutionDescriptor( &cudnnConvDesc ));
checkCudaErrors(cudnnSetConvolutionGroupCount( cudnnConvDesc, groups ));
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
int sx[] = {0,0,0,0,1};
for (int i=3; i>=0; i--) sx[i] = sx[i+1] * x->shape[i+1];
int strideX[5];
if (xformat == "ncdhw") {
x->shape.unpack(xn, xc, xd, xh, xw);
int tmp[5] = {sx[0],sx[1],sx[2],sx[3],sx[4]};
memcpy(strideX, tmp, sizeof(tmp));
} else {
x->shape.unpack(xn, xd, xh, xw, xc);
int tmp[5] = {sx[0],sx[2],sx[3],sx[4],sx[1]};
memcpy(strideX, tmp, sizeof(tmp));
}
int dimX[] = {xn, xc, xd, xh, xw};
// dimX: ncdhw
checkCudaErrors(cudnnSetTensorNdDescriptor(
cudnnIdesc, getDataType<Tx>(),
5, dimX, strideX
));
auto ws = w->shape;
int dimW[] = {(int)ws[0],(int)ws[1],(int)ws[2],(int)ws[3],(int)ws[4]};
// cudnn only supports these two formats
// https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnSetFilterNdDescriptor
#define filterFormat_oihw CUDNN_TENSOR_NCHW
#define filterFormat_ohwi CUDNN_TENSOR_NHWC
// dimW: KCRS(oihw)
checkCudaErrors(cudnnSetFilterNdDescriptor(
cudnnFdesc, getDataType<Tw>(),
// filterFormat_@WFORMAT, 5, dimW
filterFormat_oihw, 5, dimW
));
int padA[] = {paddingd, paddingh, paddingw};
int convstrideA[] = {strided, strideh, stridew};
int dilationA[] = {dilationd, dilationh, dilationw};
// the difference between CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION
// is the kernel rc order; currently no perf difference is observed
// between these two modes
checkCudaErrors(cudnnSetConvolutionNdDescriptor(
cudnnConvDesc, 3,
padA, convstrideA, dilationA,
CUDNN_CROSS_CORRELATION, getDataType<Ty>()
));
// using tensor core
// checkCudaErrors( cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
int sy[] = {0,0,0,0,1};
for (int i=3; i>=0; i--) sy[i] = sy[i+1] * y->shape[i+1];
int strideY[5];
if (xformat == "ncdhw") {
y->shape.unpack(yn, yc, yd, yh, yw);
int tmp[5] = {sy[0],sy[1],sy[2],sy[3],sy[4]};
memcpy(strideY, tmp, sizeof(tmp));
} else {
y->shape.unpack(yn, yd, yh, yw, yc);
int tmp[5] = {sy[0],sy[2],sy[3],sy[4],sy[1]};
memcpy(strideY, tmp, sizeof(tmp));
}
int dimY[] = {yn, yc, yd, yh, yw};
checkCudaErrors(cudnnSetTensorNdDescriptor(
cudnnOdesc, getDataType<Ty>(),
5, dimY, strideY
));
cudnnConvolutionFwdAlgo_t algos[] = {
CUDNN_CONVOLUTION_FWD_ALGO_GEMM,
CUDNN_CONVOLUTION_FWD_ALGO_FFT,
CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING,
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
CUDNN_CONVOLUTION_FWD_ALGO_DIRECT,
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD,
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED,
};
int num_algos = CUDNN_CONVOLUTION_FWD_ALGO_COUNT;
int perf_count;
cudnnConvolutionFwdAlgoPerf_t perf_results[num_algos];
cudnnConvolutionFwdAlgo_t algo;
bool benchmark=true;
jk.clear();
jk << dimX[0] << "," << dimX[1] << "," << dimX[2] << "," << dimX[3] << "," << dimX[4] << ",";
jk << dimW[0] << "," << dimW[1] << "," << dimW[2] << "," << dimW[3] << "," << dimW[4] << ",";
jk << paddingd << paddingh << paddingw << "," << strided << strideh <<stridew << "," << dilationd << dilationh << dilationw << "," << groups << ".";
auto iter = fwd_algo_cache.find(jk.to_string());
if (iter!=fwd_algo_cache.end()) algo = iter->second;
else {
if (fwd_algo_cache.size()>=max_cache_size) benchmark = false;
if (benchmark) {
size_t max_ws_size = 0;
for (int i = 0; i < num_algos; i++) {
size_t sz;
cudnnStatus_t ret = cudnnGetConvolutionForwardWorkspaceSize(
handle_, cudnnIdesc, cudnnFdesc, cudnnConvDesc,
cudnnOdesc, algos[i], &sz);
// skip algorithms that would use too much workspace
if (sz > mem_info.total_cuda_ram * max_workspace_ratio) continue;
if (CUDNN_STATUS_SUCCESS == ret && sz > max_ws_size) max_ws_size = sz;
}
size_t allocation;
void* ws = exe.temp_allocator->alloc(max_ws_size, allocation);
checkCudaErrors(cudnnFindConvolutionForwardAlgorithmEx(
handle_,
cudnnIdesc, x->ptr<Tx>(),
cudnnFdesc, w->ptr<Tw>(),
cudnnConvDesc,
cudnnOdesc, y->ptr<Ty>(),
num_algos,
&perf_count,
perf_results,
ws,
max_ws_size));
exe.temp_allocator->free(ws, max_ws_size, allocation);
} else {
checkCudaErrors(cudnnGetConvolutionForwardAlgorithm_v7(
handle_,
cudnnIdesc,
cudnnFdesc,
cudnnConvDesc,
cudnnOdesc,
num_algos,
&perf_count,
perf_results));
}
int best_algo_idx=-1;
for (int i = 0; i < perf_count; i++)
if (perf_results[i].status == CUDNN_STATUS_SUCCESS){
best_algo_idx=i;
break;
}
ASSERT(best_algo_idx!=-1);
algo=perf_results[best_algo_idx].algo;
if (benchmark) {
fwd_algo_cache[jk.to_string()] = algo;
if (fwd_algo_cache.size()==max_cache_size)
LOGw << "forward_ algorithm cache is full";
}
}
// TODO: wrap workspace allocation
void *workSpace = 0;
size_t workSpaceSize;
checkCudaErrors (cudnnGetConvolutionForwardWorkspaceSize(
handle_, cudnnIdesc, cudnnFdesc, cudnnConvDesc,
cudnnOdesc, algo, &workSpaceSize) );
size_t allocation;
if (workSpaceSize > 0) {
workSpace = exe.temp_allocator->alloc(workSpaceSize, allocation);
}
float alpha=1, beta=0;
checkCudaErrors(cudnnConvolutionForward(
handle_,
(void*)(&alpha),
cudnnIdesc, x->ptr<Tx>(),
cudnnFdesc, w->ptr<Tw>(),
cudnnConvDesc,
algo,
workSpace, workSpaceSize,
(void*)(&beta),
cudnnOdesc, y->ptr<Ty>())
);
if (workSpace)
exe.temp_allocator->free(workSpace, workSpaceSize, allocation);
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnIdesc ));
checkCudaErrors(cudnnDestroyFilterDescriptor( cudnnFdesc ));
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnOdesc ));
checkCudaErrors(cudnnDestroyConvolutionDescriptor( cudnnConvDesc ));
}
#endif
#endif // JIT
} // jittor

View File

@ -0,0 +1,24 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers: Dun Liang <randonlang@gmail.com>.
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#pragma once
#include "op.h"
namespace jittor {
struct CudnnConv3dOp : Op {
Var* x, * w, * y;
int strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups;
string xformat;
CudnnConv3dOp(Var* x, Var* w, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd=1, int dilationh=1, int dilationw=1, int groups=1, string xformat="ncdhw");
const char* name() const override { return "cudnn_conv3d"; }
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
void infer_shape() override;
DECLARE_jit_run;
};
} // jittor

View File

@ -65,6 +65,7 @@
#include <assert.h>
#include <cudnn.h>
#include "utils/log.h"
#include "helper_cuda.h"
#include "fp16_dev.h"
#include "fp16_emu.h"

View File

@ -1,4 +1,5 @@
#include <cudnn.h>
#include "utils/log.h"
#include "helper_cuda.h"
const char *_cudaGetErrorEnum(cudnnStatus_t error) {

View File

@ -11,6 +11,7 @@
#include <cuda_runtime.h>
#include <cublas.h>
#include "utils/log.h"
#include "helper_cuda.h"
#include <curand.h>

View File

@ -89,7 +89,7 @@ void CuttTransposeOp::jit_run() {
reverse[i] = dim-1-new_axes[dim-1-i];
for (int i=0; i<dim; i++)
x_shape[i] = new_shape[dim-1-i];
if (dim == 1) {
if (dim == 1 || x->num==1) {
checkCudaErrors(cudaMemcpyAsync(yp, xp, x->size, cudaMemcpyDefault, 0));
return;
}
@ -105,7 +105,9 @@ void CuttTransposeOp::jit_run() {
cuttExecute(iter->second, xp, yp);
} else {
cuttHandle plan;
CHECK(0==cuttPlan(&plan, dim, x_shape.data(), reverse.data(), x->dtype().dsize(), 0));
checkCudaErrors(cudaDeviceSynchronize());
auto ret = cuttPlan(&plan, dim, x_shape.data(), reverse.data(), x->dtype().dsize(), 0);
CHECK(0==ret) << ret << jk.to_string() << x << y;
cutt_plan_cache[jk.to_string()] = plan;
cuttExecute(plan, xp, yp);
}

View File

@ -12,6 +12,7 @@
#include <cuda_runtime.h>
#include <nccl.h>
#include "utils/log.h"
#include "helper_cuda.h"
namespace jittor {

View File

@ -9,6 +9,7 @@
*
*/
#include "utils/log.h"
#include "helper_cuda.h"
#include "fp16_dev.h"

View File

@ -13,6 +13,7 @@
// These are CUDA Helper functions for initialization and error checking
#include <cuda_runtime.h>
#include "utils/log.h"
#include "helper_cuda.h"
#ifdef _CUFFT_H_

View File

@ -778,7 +778,7 @@ void simple_net(int times = 100) {
s.wait();
}
// extern "C" int mkl_test_entry();
// extern int mkl_test_entry();
int mkl_test_entry() {
try {

View File

@ -15,25 +15,25 @@ def eye(shape, dtype):
return jt.array(np.identity(shape[0])).unary(dtype)
def eye_(var):
var.assign(eye(var.shape, var.dtype))
return var.assign(eye(var.shape, var.dtype))
def constant(shape, dtype, value=0.0):
return jt.array(value).unary(dtype).broadcast(shape)
def constant_(var, value=0.0):
var.assign(constant(var.shape, var.dtype, value))
return var.assign(constant(var.shape, var.dtype, value))
def uniform(shape, dtype, low, high):
return jt.random(shape, dtype) * (low - high) + high
def uniform_(var, low, high):
var.assign(uniform(var.shape, var.dtype, low, high))
return var.assign(uniform(var.shape, var.dtype, low, high))
def gauss(shape, dtype, mean=0.0, std=1.0):
return jt.random(shape, dtype, "normal") * std + mean
def gauss_(var, mean=0.0, std=1.0):
var.assign(gauss(var.shape, var.dtype, mean, std))
return var.assign(gauss(var.shape, var.dtype, mean, std))
def invariant_uniform(shape, dtype, mode="fan_in"):
assert len(shape)>1
@ -61,7 +61,7 @@ def relu_invariant_gauss(shape, dtype, mode="fan_in"):
return gauss(shape, dtype, 0, std)
def relu_invariant_gauss_(var, mode="fan_in"):
var.assign(relu_invariant_gauss(tuple(var.shape), var.dtype, mode))
return var.assign(relu_invariant_gauss(tuple(var.shape), var.dtype, mode))
def calculate_std(var,mode,nonlinearity,param=0.01):
mode = mode.lower()
@ -100,7 +100,6 @@ def kaiming_normal_(var, a=0, mode='fan_in', nonlinearity='leaky_relu'):
return gauss_(var,0, std)
#TODO: bound = gain * math.sqrt(6.0/fan) ??
def xavier_uniform(shape, dtype, gain=1.0):
assert len(shape)>1
@ -108,11 +107,11 @@ def xavier_uniform(shape, dtype, gain=1.0):
for i in shape[2:]:
matsize *= i
fan = (shape[1] * matsize) + (shape[0] * matsize)
bound = gain * math.sqrt(1.0/fan)
bound = gain * math.sqrt(6.0/fan)
return uniform(shape, dtype, -bound, bound)
def xavier_uniform_(var, gain=1.0):
var.assign(xavier_uniform(tuple(var.shape), var.dtype, gain))
return var.assign(xavier_uniform(tuple(var.shape), var.dtype, gain))
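The bound change above restores the standard Glorot/Xavier uniform initialization; since fan is computed here as fan_in + fan_out, the weights are drawn from
W \sim \mathcal{U}\!\left(-g\sqrt{\tfrac{6}{\mathrm{fan\_in}+\mathrm{fan\_out}}},\; g\sqrt{\tfrac{6}{\mathrm{fan\_in}+\mathrm{fan\_out}}}\right)
with gain g; the previous sqrt(1.0/fan) bound was off by a factor of sqrt(6).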
def xavier_gauss(shape, dtype, gain=1.0):
assert len(shape)>1
@ -125,4 +124,4 @@ def xavier_gauss(shape, dtype, gain=1.0):
return gauss(shape, dtype, 0, std)
def xavier_gauss_(var, gain=1.0):
var.assign(xavier_gauss(tuple(var.shape), var.dtype, gain))
return var.assign(xavier_gauss(tuple(var.shape), var.dtype, gain))

View File

@ -0,0 +1,2 @@
from .chamfer import chamfer_loss, ChamferLoss
from .emd import earth_mover_distance, EarthMoverDistance

View File

@ -0,0 +1,153 @@
# Author: Zheng-Ning Liu
#
# This file implements chamfer loss on both CPU and GPU.
# The implementation does not use an extra NxM matrix to store distances, and
# thus supports large point clouds.
import jittor as jt
import jittor.nn as nn
cpu_src = '''
for (int bs = 0; bs < in0_shape0; ++bs)
for (int i = 0; i < in0_shape1; ++i) {
float min_dis = (@in0(bs, i, 0) - @in1(bs, 0, 0)) * (@in0(bs, i, 0) - @in1(bs, 0, 0)) +
(@in0(bs, i, 1) - @in1(bs, 0, 1)) * (@in0(bs, i, 1) - @in1(bs, 0, 1)) +
(@in0(bs, i, 2) - @in1(bs, 0, 2)) * (@in0(bs, i, 2) - @in1(bs, 0, 2));
@out(bs, i) = 0;
for (int j = 1; j < in1_shape1; ++j) {
float dis = (@in0(bs, i, 0) - @in1(bs, j, 0)) * (@in0(bs, i, 0) - @in1(bs, j, 0)) +
(@in0(bs, i, 1) - @in1(bs, j, 1)) * (@in0(bs, i, 1) - @in1(bs, j, 1)) +
(@in0(bs, i, 2) - @in1(bs, j, 2)) * (@in0(bs, i, 2) - @in1(bs, j, 2));
if (dis < min_dis) {
min_dis = dis;
@out(bs, i) = j;
}
}
}
'''
cuda_src = '''
__global__ void chamfer_loss_min_idx_kernel(@ARGS_DEF) {
@PRECALC
int bs = blockIdx.x;
int n = in0_shape1;
int m = in1_shape1;
for (int i = threadIdx.x; i < n; i += blockDim.x) {
float min_dis = (@in0(bs, i, 0) - @in1(bs, 0, 0)) * (@in0(bs, i, 0) - @in1(bs, 0, 0)) +
(@in0(bs, i, 1) - @in1(bs, 0, 1)) * (@in0(bs, i, 1) - @in1(bs, 0, 1)) +
(@in0(bs, i, 2) - @in1(bs, 0, 2)) * (@in0(bs, i, 2) - @in1(bs, 0, 2));
@out(bs, i) = 0;
for (int j = 1; j < m; ++j) {
float dis = (@in0(bs, i, 0) - @in1(bs, j, 0)) * (@in0(bs, i, 0) - @in1(bs, j, 0)) +
(@in0(bs, i, 1) - @in1(bs, j, 1)) * (@in0(bs, i, 1) - @in1(bs, j, 1)) +
(@in0(bs, i, 2) - @in1(bs, j, 2)) * (@in0(bs, i, 2) - @in1(bs, j, 2));
if (dis < min_dis) {
min_dis = dis;
@out(bs, i) = j;
}
}
}
}
chamfer_loss_min_idx_kernel<<<in0_shape0, 512>>>(@ARGS);
'''
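Both kernels use Jittor's code-op macro syntax: @in0(...)/@out(...) index the inputs and output, in0_shape1 expands to a shape component, and @ARGS_DEF/@PRECALC/@ARGS wire up the kernel signature. A minimal CPU-only sketch of the same pattern (a hypothetical element-wise doubling op, not part of this file):
import jittor as jt
x = jt.array([1.0, 2.0, 3.0])
y = jt.code(x.shape, x.dtype, [x],
    cpu_src='''
        for (int i = 0; i < in0_shape0; ++i)
            @out(i) = 2 * @in0(i);
    ''')
print(y.numpy())  # [2. 4. 6.]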
def chamfer_loss(pc1, pc2, reduction='mean', dims='BNC', bidirectional=False):
''' Returns the chamfer loss from pc1 to pc2.
:param pc1: input point cloud
:type pc1: jittor array
:param pc2: input point cloud
:type pc2: jittor array
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
:type reduction: str, optional
:param dims: a string that represents each dimension, can be
'[BNC]' ([batch, number of points, xyz]), or
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
:type dims: str, optional
:param bidirectional: if True, also computes the loss from pc2 to pc1 and returns the sum. Default: False.
:type bidirectional: bool, optional
Example:
>>> import jittor as jt
>>> from jittor.loss3d import chamfer_loss
>>> jt.flags.use_cuda = True
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> cf = chamfer_loss(pc1, pc2, dims='BNC', bidirectional=True)
>>> print('chamfer loss =', cf.item())
'''
if bidirectional:
return chamfer_loss(pc1, pc2, reduction, dims) + chamfer_loss(pc2, pc1, reduction, dims)
assert dims in ['BNC', 'BCN']
if dims == 'BCN':
pc1, pc2 = pc1.permute(0, 2, 1), pc2.permute(0, 2, 1)
batch_size_1, N, _ = pc1.shape
batch_size_2, M, _ = pc2.shape
assert batch_size_1 == batch_size_2
batch_size = batch_size_1
idx = jt.code([batch_size, N], 'int32', [pc1, pc2],
cpu_src=cpu_src,
cuda_src=cuda_src)
nearest_pts = pc2.reindex([batch_size, idx.shape[1], 3], [
'i0',
'@e0(i0, i1)',
'i2'
], extras=[idx])
chamfer_distance = (((pc1 - nearest_pts) ** 2).sum(dim=-1)).sqrt()
if reduction is None:
return chamfer_distance
elif reduction == 'sum':
return jt.sum(chamfer_distance)
elif reduction == 'mean':
return jt.mean(chamfer_distance)
class ChamferLoss(nn.Module):
''' A loss layer that computes the chamfer loss from pc1 to pc2.
:param pc1: input point cloud
:type pc1: jittor array
:param pc2: input point cloud
:type pc2: jittor array
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
:type reduction: str, optional
:param dims: a string that represents each dimension, can be
'[BNC]' ([batch, number of points, xyz]), or
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
:type dims: str, optional
:param bidirectional: if True, also computes the loss from pc2 to pc1 and returns the sum. Default: False.
:type bidirectional: bool, optional
Example:
>>> import jittor as jt
>>> from jittor.loss3d import ChamferLoss
>>> jt.flags.use_cuda = True
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> CF = ChamferLoss(dims='BNC', bidirectional=True)
>>> cf = CF(pc1, pc2)
>>> print('chamfer loss =', cf.item())
'''
def __init__(self, reduction='mean', dims='BNC', bidirectional=False):
''' see function @chamfer_loss
'''
super().__init__()
self.reduction = reduction
self.dims = dims
self.bidirectional = bidirectional
def execute(self, pc1, pc2):
return chamfer_loss(pc1, pc2, self.reduction, self.dims, self.bidirectional)

440
python/jittor/loss3d/emd.py Normal file
View File

@ -0,0 +1,440 @@
# Author: Zheng-Ning Liu
#
# The GPU implementation was originally provided by Haoqiang Fan and Kaichun Mo,
# <https://github.com/daerduoCarey/PyTorchEMD>.
import jittor as jt
from jittor import Function
EMD_gpu_header = '''
namespace jittor {
__device__ inline out_type dist2(out_type x1, out_type y1, out_type z1,
out_type x2, out_type y2, out_type z2) {
return (x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1) + (z2 - z1) * (z2 - z1);
}
}
'''
approxmatch_gpu_src = '''
__global__ void approxmatch_gpu_kernel(@ARGS_DEF) {
@PRECALC
@alias(xyz1, in0)
@alias(xyz2, in1)
@alias(match, out)
int b = in0_shape0;
int n = in0_shape1;
int m = in1_shape1;
out_type *remainL = in2_p + blockIdx.x * (n + m) * 2;
out_type *remainR = remainL + n;
out_type *ratioL = remainR + m;
out_type *ratioR = ratioL + n;
const int Block = 1024;
__shared__ out_type buf[Block * 4];
for (int i = blockIdx.x; i < b; i += gridDim.x) {
for (int j = threadIdx.x; j < n * m; j += blockDim.x)
match_p[i * n * m + j] = 0;
for (int j = threadIdx.x; j < n; j += blockDim.x)
remainL[j] = n >= m ? 1 : m / n;
for (int j = threadIdx.x; j < m; j += blockDim.x)
remainR[j] = n >= m ? n / m : 1;
__syncthreads();
for (int j = 7; j >= -2; j--) {
out_type level = j > -2 ? -powf(4.0f, j) : 0;
for (int k0 = 0; k0 < n; k0 += blockDim.x) {
int k = k0 + threadIdx.x;
out_type x1 = 0, y1 = 0, z1 = 0;
if (k < n) {
x1 = @xyz1(i, k, 0);
y1 = @xyz1(i, k, 1);
z1 = @xyz1(i, k, 2);
}
out_type suml = 1e-9f;
for (int l0 = 0; l0 < m; l0 += Block){
int lend = min(m, l0 + Block) - l0;
for (int l = threadIdx.x; l < lend; l += blockDim.x) {
buf[l * 4 + 0] = @xyz2(i, l0 + l, 0);
buf[l * 4 + 1] = @xyz2(i, l0 + l, 1);
buf[l * 4 + 2] = @xyz2(i, l0 + l, 2);
buf[l * 4 + 3] = remainR[l0 + l];
}
__syncthreads();
for (int l = 0; l < lend; l++){
out_type x2 = buf[l * 4 + 0];
out_type y2 = buf[l * 4 + 1];
out_type z2 = buf[l * 4 + 2];
out_type d = level * dist2(x1, y1, z1, x2, y2, z2);
out_type w = __expf(d) * buf[l * 4 + 3];
suml += w;
}
__syncthreads();
}
if (k < n)
ratioL[k] = remainL[k] / suml;
}
__syncthreads();
for (int l0 = 0; l0 < m; l0 += blockDim.x){
int l = l0 + threadIdx.x;
out_type x2 = 0, y2 = 0, z2 = 0;
if (l < m){
x2 = @xyz2(i, l, 0);
y2 = @xyz2(i, l, 1);
z2 = @xyz2(i, l, 2);
}
out_type sumr = 0;
for (int k0 = 0; k0 < n; k0 += Block){
int kend = min(n, k0 + Block) - k0;
for (int k = threadIdx.x; k < kend; k += blockDim.x){
buf[k * 4 + 0] = @xyz1(i, k0 + k, 0);
buf[k * 4 + 1] = @xyz1(i, k0 + k, 1);
buf[k * 4 + 2] = @xyz1(i, k0 + k, 2);
buf[k * 4 + 3] = ratioL[k0 + k];
}
__syncthreads();
for (int k = 0; k < kend; k++){
out_type x1 = buf[k * 4 + 0];
out_type y1 = buf[k * 4 + 1];
out_type z1 = buf[k * 4 + 2];
out_type d = level * dist2(x1, y1, z1, x2, y2, z2);
out_type w = __expf(d) * buf[k * 4 + 3];
sumr += w;
}
__syncthreads();
}
if (l < m){
sumr *= remainR[l];
out_type consumption = fminf(remainR[l] / (sumr + 1e-9f), 1.0f);
ratioR[l] = consumption * remainR[l];
remainR[l] = fmaxf(0.0f, remainR[l] - sumr);
}
}
__syncthreads();
for (int k0 = 0; k0 < n; k0 += blockDim.x){
int k = k0 + threadIdx.x;
out_type x1 = 0, y1 = 0, z1 = 0;
if (k < n){
x1 = @xyz1(i, k, 0);
y1 = @xyz1(i, k, 1);
z1 = @xyz1(i, k, 2);
}
out_type suml = 0;
for (int l0 = 0; l0 < m; l0 += Block){
int lend = min(m, l0 + Block)-l0;
for (int l = threadIdx.x; l < lend; l += blockDim.x){
buf[l * 4 + 0] = @xyz2(i, l0 + l, 0);
buf[l * 4 + 1] = @xyz2(i, l0 + l, 1);
buf[l * 4 + 2] = @xyz2(i, l0 + l, 2);
buf[l * 4 + 3] = ratioR[l0 + l];
}
__syncthreads();
out_type rl = ratioL[k];
if (k < n){
for (int l = 0; l < lend; l++){
out_type x2 = buf[l * 4 + 0];
out_type y2 = buf[l * 4 + 1];
out_type z2 = buf[l * 4 + 2];
out_type d = level * dist2(x1, y1, z1, x2, y2, z2);
out_type w = __expf(d) * rl * buf[l*4+3];
@match(i, l0 + l, k) += w;
suml += w;
}
}
__syncthreads();
}
if (k < n)
remainL[k] = fmaxf(0.0f, remainL[k] - suml);
}
__syncthreads();
}
}
}
approxmatch_gpu_kernel<<<32, 512>>>(@ARGS);
'''
matchcost_gpu_src = '''
__global__ void matchcost_gpu_kernel(@ARGS_DEF) {
@PRECALC
@alias(xyz1, in0)
@alias(xyz2, in1)
@alias(match, in2)
int b = in0_shape0;
int n = in0_shape1;
int m = in1_shape1;
const int Block = 1024;
__shared__ out_type allsum[512];
__shared__ out_type buf[Block * 3];
for (int i = blockIdx.x; i < b; i += gridDim.x) {
out_type subsum = 0;
for (int k0 = 0; k0 < n; k0 += blockDim.x) {
int k = k0 + threadIdx.x;
out_type x1 = 0, y1 = 0, z1 = 0;
if (k < n) {
x1 = @xyz1(i, k, 0);
y1 = @xyz1(i, k, 1);
z1 = @xyz1(i, k, 2);
}
for (int l0 = 0; l0 < m; l0 += Block) {
int lend = min(m, l0 + Block) - l0;
for (int l = threadIdx.x; l < lend * 3; l += blockDim.x)
buf[l] = xyz2_p[i * m * 3 + l0 * 3 + l];
__syncthreads();
if (k < n) {
for (int l = 0; l < lend; l++) {
out_type x2 = buf[l * 3 + 0];
out_type y2 = buf[l * 3 + 1];
out_type z2 = buf[l * 3 + 2];
out_type d = dist2(x1, y1, z1, x2, y2, z2);
subsum += d * @match(i, l0 + l, k);
}
}
__syncthreads();
}
}
allsum[threadIdx.x] = subsum;
for (int j = 1; j < blockDim.x; j <<= 1) {
__syncthreads();
if ((threadIdx.x & j) == 0 && threadIdx.x + j < blockDim.x) {
allsum[threadIdx.x] += allsum[threadIdx.x + j];
}
}
if (threadIdx.x == 0)
@out(i) = allsum[0];
__syncthreads();
}
}
matchcost_gpu_kernel<<<32, 512>>>(@ARGS);
'''
matchcost_grad1_gpu_src = '''
__global__ void matchcost_grad1_gpu_kernel(@ARGS_DEF) {
@PRECALC
@alias(grad, in0)
@alias(xyz1, in1)
@alias(xyz2, in2)
@alias(match, in3)
int b = grad_shape0;
int n = xyz1_shape1;
int m = xyz2_shape1;
for (int i = blockIdx.x; i < b ; i += gridDim.x){
for (int l = threadIdx.x; l < n; l += blockDim.x){
out_type x1 = @xyz1(i, l, 0);
out_type y1 = @xyz1(i, l, 1);
out_type z1 = @xyz1(i, l, 2);
out_type dx = 0, dy = 0, dz = 0;
for (int k = 0; k < m; k++){
out_type x2 = @xyz2(i, k, 0);
out_type y2 = @xyz2(i, k, 1);
out_type z2 = @xyz2(i, k, 2);
out_type d = @match(i, k, l) * 2;
dx += (x1 - x2) * d;
dy += (y1 - y2) * d;
dz += (z1 - z2) * d;
}
@out(i, l, 0) = dx * @grad(i);
@out(i, l, 1) = dy * @grad(i);
@out(i, l, 2) = dz * @grad(i);
}
}
}
matchcost_grad1_gpu_kernel<<<32, 512>>>(@ARGS);
'''
matchcost_grad2_gpu_src = '''
__global__ void matchcost_grad2_gpu_kernel(@ARGS_DEF) {
@PRECALC
@alias(grad, in0)
@alias(xyz1, in1)
@alias(xyz2, in2)
@alias(match, in3)
int b = grad_shape0;
int n = xyz1_shape1;
int m = xyz2_shape1;
__shared__ out_type sum_grad[256 * 3];
for (int i = blockIdx.x; i < b; i += gridDim.x) {
int kbeg = m * blockIdx.y / gridDim.y;
int kend = m * (blockIdx.y + 1) / gridDim.y;
for (int k = kbeg; k < kend; k++) {
out_type x2 = @xyz2(i, k, 0);
out_type y2 = @xyz2(i, k, 1);
out_type z2 = @xyz2(i, k, 2);
out_type subsumx = 0, subsumy = 0, subsumz = 0;
for (int j = threadIdx.x; j < n; j += blockDim.x) {
out_type x1 = x2 - @xyz1(i, j, 0);
out_type y1 = y2 - @xyz1(i, j, 1);
out_type z1 = z2 - @xyz1(i, j, 2);
out_type d = @match(i, k, j) * 2;
subsumx += x1 * d;
subsumy += y1 * d;
subsumz += z1 * d;
}
sum_grad[threadIdx.x * 3 + 0] = subsumx;
sum_grad[threadIdx.x * 3 + 1] = subsumy;
sum_grad[threadIdx.x * 3 + 2] = subsumz;
for (int j = 1; j < blockDim.x; j <<= 1) {
__syncthreads();
int j1 = threadIdx.x;
int j2 = threadIdx.x + j;
if ((j1 & j) == 0 && j2 < blockDim.x){
sum_grad[j1 * 3 + 0] += sum_grad[j2 * 3 + 0];
sum_grad[j1 * 3 + 1] += sum_grad[j2 * 3 + 1];
sum_grad[j1 * 3 + 2] += sum_grad[j2 * 3 + 2];
}
}
if (threadIdx.x == 0){
@out(i, k, 0) = sum_grad[0] * @grad(i);
@out(i, k, 1) = sum_grad[1] * @grad(i);
@out(i, k, 2) = sum_grad[2] * @grad(i);
}
__syncthreads();
}
}
}
matchcost_grad2_gpu_kernel<<<dim3(32, 32), 256>>>(@ARGS);
'''
class EarthMoverDistance(Function):
''' A loss layer that computes Earth Mover's distance from pc1 to pc2. Only supports GPU.
:param pc1: input point cloud
:type pc1: jittor array
:param pc2: input point cloud
:type pc2: jittor array
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
:type reduction: str, optional
:param dims: a string that represents each dimension, can be
'[BNC]' ([batch, number of points, xyz]), or
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
:type dims: str, optional
Example:
>>> import jittor as jt
>>> from jittor.loss3d import EarthMoverDistance
>>> jt.flags.use_cuda = True
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> EMD = EarthMoverDistance(dims='BNC')
>>> emd = EMD(pc1, pc2)
>>> print('EMD =', emd.item())
'''
def execute(self, pc1, pc2, reduction='mean', dims='BNC'):
assert dims in ['BNC', 'BCN']
if dims == 'BCN':
pc1, pc2 = pc1.permute(0, 2, 1), pc2.permute(0, 2, 1)
batch_size_1, N, _ = pc1.shape
batch_size_2, M, _ = pc2.shape
assert batch_size_1 == batch_size_2
batch_size = batch_size_1
temp = jt.zeros([batch_size, (N + M) * 2], pc1.dtype)
match = jt.code(
shape=[batch_size, M, N],
dtype=pc1.dtype,
inputs=[pc1, pc2, temp],
cuda_header=EMD_gpu_header,
cuda_src=approxmatch_gpu_src,
)
emd = jt.code(
shape=[batch_size],
dtype=pc1.dtype,
inputs=[pc1, pc2, match],
cuda_header=EMD_gpu_header,
cuda_src=matchcost_gpu_src,
)
self.saved_vars = (pc1, pc2, match, reduction)
if reduction is None:
return emd
elif reduction == 'sum':
return emd.sum()
elif reduction == 'mean':
return emd.mean()
def grad(self, grad):
pc1, pc2, match, reduction = self.saved_vars
if reduction == 'sum':
grad = jt.ones([pc1.shape[0]]) * grad
elif reduction == 'mean':
grad = jt.ones([pc1.shape[0]]) * grad / pc1.shape[0]
grad_pc1 = jt.code(
shape=pc1.shape,
dtype=pc1.dtype,
inputs=[grad, pc1, pc2, match],
cuda_src=matchcost_grad1_gpu_src,
)
grad_pc2 = jt.code(
shape=pc2.shape,
dtype=pc2.dtype,
inputs=[grad, pc1, pc2, match],
cuda_src=matchcost_grad2_gpu_src,
)
return grad_pc1, grad_pc2
def earth_mover_distance(pc1, pc2, reduction='mean', dims='BNC'):
''' Earth Mover's distance from pc1 to pc2. Only supports GPU.
:param pc1: input point cloud
:type pc1: jittor array
:param pc2: input point cloud
:type pc2: jittor array
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
:type reduction: str, optional
:param dims: a string that represents each dimension, can be
'[BNC]' ([batch, number of points, xyz]), or
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
:type dims: str, optional
Example:
>>> import jittor as jt
>>> from jittor.loss3d import earth_mover_distance
>>> jt.flags.use_cuda = True
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
>>> emd = earth_mover_distance(pc1, pc2, dims='BNC')
>>> print('EMD =', emd.item())
'''
return EarthMoverDistance.apply(pc1, pc2, reduction, dims)

View File

@ -116,6 +116,65 @@ class CosineAnnealingLR(object):
if param_group.get("lr") != None:
param_group["lr"] = self.get_lr(self.base_lr_pg[i], param_group["lr"])
class ExponentialLR(object):
""" learning rate is multiplied by gamma in each step.
"""
def __init__(self, optimizer, gamma, last_epoch=-1):
self.optimizer = optimizer
self.gamma = gamma
self.last_epoch = last_epoch
self.base_lr = optimizer.lr
self.base_lr_pg = [pg.get("lr") for pg in optimizer.param_groups]
def get_lr(self, base_lr, now_lr):
if self.last_epoch == 0:
return base_lr
return base_lr * self.gamma ** self.last_epoch
def step(self):
self.last_epoch += 1
self.update_lr()
def update_lr(self):
self.optimizer.lr = self.get_lr(self.base_lr, self.optimizer.lr)
for i, param_group in enumerate(self.optimizer.param_groups):
if param_group.get("lr") != None:
param_group["lr"] = self.get_lr(self.base_lr_pg[i], param_group["lr"])
class StepLR(object):
def __init__(self, optimizer, step_size, gamma=0.1, last_epoch=-1):
self.optimizer = optimizer
self.step_size = step_size
self.gamma = gamma
self.last_epoch = last_epoch
self.cur_epoch = 0
def get_gamma(self):
if self.last_epoch < 0:
if (self.cur_epoch != 0 and (self.cur_epoch + 1) % self.step_size == 0):
return self.gamma
else:
if (self.cur_epoch + 1 + self.last_epoch) % self.step_size == 0:
return self.gamma
return 1.
def get_lr(self):
return self.optimizer.lr
def step(self):
self.update_lr()
self.cur_epoch += 1
def update_lr(self):
gamma = self.get_gamma()
if gamma != 1.:
self.optimizer.lr = self.optimizer.lr * gamma
for i, param_group in enumerate(self.optimizer.param_groups):
if param_group.get("lr") != None:
param_group["lr"] = param_group["lr"] * gamma
class MultiStepLR(object):
def __init__(self, optimizer, milestones=[], gamma=0.1, last_epoch=-1):
self.optimizer = optimizer

View File

@ -182,7 +182,28 @@ def chunk(x, chunks, dim=0):
jt.Var.chunk = chunk
def expand(x, shape):
def expand(x, *shape):
''' Expand and broadcast this array; -1 means the corresponding dimension is left unchanged.
Example::
a = jt.zeros((3,1))
b = a.expand(3, 4)
assert b.shape == (3,4)
b = a.expand(-1, 4)
assert b.shape == (3,4)
b = a.expand((3, 4))
assert b.shape == (3,4)
b = a.expand((-1, 4))
assert b.shape == (3,4)
'''
if len(shape) == 1 and isinstance(shape[0], (tuple,list,jt.NanoVector)):
shape = shape[0]
shape = list(shape)
for i in range(len(shape)):
if shape[i] == -1:
shape[i] = x.shape[i]
return x.broadcast(shape)
jt.Var.expand = expand
@ -634,23 +655,6 @@ def kthvalue(input, k, dim=None, keepdim=False):
jt.Var.kthvalue = kthvalue
def gather(x,dim,index):
if dim<0:
dim+=index.ndim
x_shape = list(x.shape )
i_shape = list(index.shape)
assert i_shape[dim]>0
assert x.ndim == index.ndim
i_shape[dim]=x_shape[dim]
assert i_shape == x_shape
ins = []
for i in range(index.ndim):
ins.append(jt.index(index.shape,dim=i))
ins[dim]=index
return x.reindex(ins)
jt.Var.gather = gather
def _prod(x,dim=0):
x = jt.log(x)
x = x.sum(dim=dim)
@ -708,7 +712,6 @@ def nms(dets,thresh):
return order[selected]
jt.Var.expand = jt.Var.broadcast
jt.Var.expand_as = jt.Var.broadcast_var
@ -1061,6 +1064,13 @@ def randperm(n, dtype="int32"):
return index.cast(dtype)
def set_global_seed(seed):
''' Sets the seeds of the random number generators of Python, numpy and jittor
simultaneously.
.. note::
Jittor also guarantees that each worker of jittor.dataset.Dataset holds a different seed,
which is global_seed ^ worker_id ^ 1234.
'''
import random
random.seed(seed)
jt.set_seed(seed)
@ -1197,7 +1207,7 @@ def gather(x, dim, index):
Parameters::
* input (jt.Var) the source array
* x (jt.Var) the source array
* dim (int) the axis along which to index
* index (jt.Var) the indices of elements to gather
@ -1216,3 +1226,46 @@ Example::
return x.getitem(tuple(indexes))
jt.Var.gather = gather
def roll(x, shifts, dims=None):
'''Roll the tensor along the given dimension(s).
Parameters::
* x (jt.Var) the source array
* shifts (int or tuple) shift offset of dims
* dims (int or tuple) shift dims
Examples::
x = jt.array([1, 2, 3, 4, 5, 6, 7, 8]).view(4, 2)
y = x.roll(1, 0)
assert (y.numpy() == [[7,8],[1,2],[3,4],[5,6]]).all()
y = x.roll(-1, 0)
assert (y.numpy() == [[3,4],[5,6],[7,8],[1,2]]).all()
y = x.roll(shifts=(2, 1), dims=(0, 1))
assert (y.numpy() == [[6,5],[8,7],[2,1],[4,3]]).all()
'''
if isinstance(shifts, int):
shifts = (shifts,)
if dims is None:
dims = tuple(range(len(shifts)))
elif isinstance(dims, int):
dims = (dims,)
assert len(dims) == len(shifts)
ids = [ f'i{i}' for i in range(x.ndim) ]
for i in range(len(dims)):
shift = shifts[i]
d = dims[i]
size = x.shape[d]
shift = shift % size
if shift<0: shift += size
ids[d] = f'(i{d}<{shift}?i{d}+{size-shift}:(i{d}-{shift}))'
return x.reindex(x.shape, ids)
jt.Var.roll = roll
def safe_log(x):
return jt.safe_clip(x, 1e-30, 1e30).log()
jt.Var.safe_log = safe_log

View File

@ -20,19 +20,20 @@ import math
from collections import OrderedDict
from jittor.pool import *
from jittor.optim import *
from jittor.misc import _pair
from jittor.misc import _pair, _triple
from jittor_utils import LOG
def matmul_transpose(a, b):
'''
returns a * b^T
'''
assert len(a.shape) >= 2 and len(b.shape) == 2
assert a.shape[-1] == b.shape[-1], (a.shape, b.shape)
if len(a.shape)>2:
if len(a.shape) != 2:
aa = a.reshape((-1, a.shape[-1]))
cc = matmul_transpose(aa, b)
return cc.reshape(a.shape[:-1]+(-1,))
assert len(a.shape) == 2 and len(b.shape) == 2
shape = list(a.shape)[:-1] + list(b.shape)
a = a.broadcast(shape, [len(shape)-2])
@ -44,7 +45,7 @@ def bmm_transpose(a, b):
'''
returns a * b^T
'''
if jt.flags.use_cuda:
if jt.flags.use_cuda and jt.compile_extern.cublas_ops:
return jt.compile_extern.cublas_ops.cublas_batched_matmul(a, b, 0, 1)
t = list(range(b.ndim))
t[-1], t[-2] = t[-2], t[-1]
@ -117,7 +118,7 @@ Example::
if len_a>=3 and len_a==len_b:
# bmm
# a: [..., n, m], b: [..., m, k], c:[..., n, k]
if jt.flags.use_cuda:
if jt.flags.use_cuda and jt.compile_extern.cublas_ops:
return jt.compile_extern.cublas_ops.cublas_batched_matmul(a, b, 0, 0)
shape = []
len_c = max(len_a, len_b)
@ -178,36 +179,45 @@ class ELU(Module):
class PReLU(Module):
def __init__(self, num_parameters=1, init_=0.25):
self.num_parameters = num_parameters
self.a = init.constant((num_parameters,), "float32", init_)
self.weight = init.constant((num_parameters,), "float32", init_)
def execute(self, x):
if self.num_parameters != 1:
assert self.num_parameters == x.size(1), f"num_parameters does not match input channels in PReLU"
return jt.maximum(0, x) + self.a.broadcast(x, [0,2,3]) * jt.minimum(0, x)
return jt.maximum(0, x) + self.weight.broadcast(x, [0,2,3]) * jt.minimum(0, x)
else:
return jt.maximum(0, x) + self.a * jt.minimum(0, x)
return jt.maximum(0, x) + self.weight * jt.minimum(0, x)
# TODO: dims of 4 will cause slow execution
def cross_entropy_loss(output, target, ignore_index=None):
def cross_entropy_loss(output, target, weight=None, ignore_index=None,reduction='sum'):
if len(output.shape) == 4:
c_dim = output.shape[1]
output = output.transpose((0, 2, 3, 1))
output = output.reshape((-1, c_dim))
if ignore_index is not None:
target = jt.ternary(target==ignore_index,
jt.array(-1).broadcast(target), target)
mask = jt.logical_and(target >= 0, target < output.shape[1])
target = target.reshape((-1, ))
target_weight = jt.ones(target.shape[0], dtype='float32')
if weight is not None:
target_weight = weight[target]
if ignore_index is not None:
target_weight = jt.ternary(
target==ignore_index,
jt.array(0).broadcast(target_weight),
target_weight
)
target = target.broadcast(output, [1])
target = target.index(1) == target
output = output - output.max([1], keepdims=True)
loss = output.exp().sum(1).log()
loss = loss - (output*target).sum(1)
if ignore_index is None:
return loss.mean()
logsum = output.exp().sum(1).log()
loss = (logsum - (output*target).sum(1)) * target_weight
if reduction == 'sum':
return loss.sum() / target_weight.sum()
elif reduction == 'mean':
return loss.mean() / target_weight.mean()
else:
return loss.sum() / jt.maximum(mask.int().sum(), 1)
return loss / target_weight
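A sketch of the new `weight` and `ignore_index` handling; the class weights below are arbitrary illustration values. Note that the default `reduction='sum'` actually returns a weighted mean (`loss.sum() / target_weight.sum()`):

```python
import jittor as jt
from jittor import nn

logits = jt.randn(4, 3)                # [batch, classes]
target = jt.array([0, 2, 1, 2])
w = jt.array([1.0, 2.0, 0.5])          # per-class weights

loss = nn.cross_entropy_loss(logits, target, weight=w)
# samples labeled 2 get zero weight and drop out of the average
loss_ig = nn.cross_entropy_loss(logits, target, weight=w, ignore_index=2)
```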
def mse_loss(output, target):
return (output-target).sqr().mean()
@ -273,11 +283,12 @@ def nll_loss(output,target,weight=None,ignore_index=-100,reduction='mean'):
raise ValueError(f'not support {reduction}')
class CrossEntropyLoss(Module):
def __init__(self,ignore_index=None):
def __init__(self, weight=None, ignore_index=None):
self.weight = weight
self.ignore_index = ignore_index
def execute(self, output, target):
return cross_entropy_loss(output, target,self.ignore_index)
return cross_entropy_loss(output, target, self.weight, self.ignore_index)
class MSELoss(Module):
def __init__(self):
@ -423,7 +434,16 @@ class BatchNorm(Module):
norm_x = x * w.broadcast(x, dims) + b.broadcast(x, dims)
return norm_x
BatchNorm2d = BatchNorm1d = BatchNorm
BatchNorm3d = BatchNorm2d = BatchNorm1d = BatchNorm
def batch_norm(x, running_mean, running_var, weight=1, bias=0, training=False, momentum=0.1, eps=1e-05):
dims = [0]+list(range(2,x.ndim))
assert not training
w = weight / jt.sqrt(running_var+eps)
b = bias - running_mean * w
norm_x = x * w.broadcast(x, dims) + b.broadcast(x, dims)
return norm_x
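The functional `batch_norm` added here is inference-only (note `assert not training`): it folds the running statistics into one scale and one shift per channel. A minimal sketch:

```python
import jittor as jt
from jittor import nn

x = jt.randn(2, 8, 16, 16)
y = nn.batch_norm(x, running_mean=jt.zeros((8,)), running_var=jt.ones((8,)),
                  weight=jt.ones((8,)), bias=jt.zeros((8,)))
# per channel: y = (x - running_mean) / sqrt(running_var + eps) * weight + bias
```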
class InstanceNorm(Module):
def __init__(self, num_features, eps=1e-05, momentum=0.1, affine=True, is_train=True, sync=True):
@ -447,7 +467,23 @@ class InstanceNorm(Module):
b = self.bias - xmean * w
return x * w.broadcast(x, dims) + b.broadcast(x, dims)
InstanceNorm2d = InstanceNorm1d = InstanceNorm
InstanceNorm3d = InstanceNorm2d = InstanceNorm1d = InstanceNorm
def instance_norm(x,
running_mean = None,
running_var = None,
weight = 1,
bias = 0,
momentum = 0.1,
eps = 1e-5):
dims = list(range(2,x.ndim))
xmean = jt.mean(x, dims=dims)
x2mean = jt.mean(x*x, dims=dims)
xvar = (x2mean-xmean*xmean).maximum(0.0)
w = weight / jt.sqrt(xvar+eps)
b = bias - xmean * w
return x * w.broadcast(x, dims) + b.broadcast(x, dims)
class LayerNorm(Module):
def __init__(self, normalized_shape, eps: float = 1e-5, elementwise_affine: bool = True) -> None:
@ -470,7 +506,22 @@ class LayerNorm(Module):
return x * w + b
LayerNorm2d = LayerNorm1d = LayerNorm
LayerNorm3d = LayerNorm2d = LayerNorm1d = LayerNorm
def layer_norm(x,
normalized_shape,
weight = 1,
bias = 0,
eps: float = 1e-5,
elementwise_affine: bool = True):
dims = [-i for i in range(len(normalized_shape), 0, -1)]
xmean = jt.mean(x, dims=dims, keepdims=1)
x2mean = jt.mean(x*x, dims=dims, keepdims=1)
xvar = (x2mean-xmean*xmean).maximum(0.0)
w = weight / jt.sqrt(xvar+eps)
b = bias - xmean * w
return x * w + b
class GroupNorm(Module):
def __init__(self, num_groups, num_channels, eps=1e-05, affine=True, is_train=True):
@ -506,6 +557,33 @@ class GroupNorm(Module):
x = x * w.broadcast(x, [3]) + b.broadcast(x, [3])
return x.reshape(output_shape)
def group_norm(x,
num_groups,
weight = 1,
bias = 0,
eps=1e-05):
N = x.shape[0]
C = x.shape[1]
output_shape = (N,-1)
# TODO: 3d group norm
if x.ndim==4:
output_shape = x.shape
assert C % num_groups == 0
x = x.reshape((N, num_groups, C//num_groups, -1))
xmean = jt.mean(x, dims=[2,3]).reshape((N, num_groups, 1))
x2mean = jt.mean(x*x, dims=[2,3]).reshape((N, num_groups, 1))
xvar = (x2mean-xmean*xmean).maximum(0.0)
if isinstance(weight, jt.Var):
weight = weight.reshape((1, num_groups, -1))
if isinstance(bias, jt.Var):
bias = bias.reshape((1, num_groups, -1))
weight = weight / jt.sqrt(xvar+eps)
bias = bias - xmean * weight
x = x * weight.broadcast(x, [3]) + bias.broadcast(x, [3])
return x.reshape(output_shape)
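`group_norm` reshapes to `[N, num_groups, C//num_groups, -1]` and normalizes within each group; `weight` and `bias` default to the scalars 1 and 0, so a bare call is already valid. A sketch:

```python
import jittor as jt
from jittor import nn

x = jt.randn(2, 6, 4, 4)
y = nn.group_norm(x, num_groups=3)     # 2 channels per group
# each (sample, group) slice now has ~zero mean and ~unit variance
```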
Relu = jt.make_module(relu)
ReLU = Relu
Leaky_relu = jt.make_module(leaky_relu, 2)
@ -637,6 +715,34 @@ class Conv1d(Module):
y = x.squeeze(-1)
return y
class Conv3d(Module):
def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True):
self.in_channels = in_channels
self.out_channels = out_channels
self.kernel_size = kernel_size if isinstance(kernel_size, tuple) else (kernel_size, kernel_size, kernel_size)
self.stride = stride if isinstance(stride, tuple) else (stride, stride, stride)
self.padding = padding if isinstance(padding, tuple) else (padding, padding, padding)
self.dilation = dilation if isinstance(dilation, tuple) else (dilation, dilation, dilation)
self.groups = groups
assert in_channels % groups == 0, 'in_channels must be divisible by groups'
assert out_channels % groups == 0, 'out_channels must be divisible by groups'
Kd, Kh, Kw = self.kernel_size
self.weight = init.invariant_uniform([out_channels, in_channels//groups, Kd, Kh, Kw], dtype="float")
if bias:
fan=1
for i in self.weight.shape[1:]:
fan *= i
bound = 1 / math.sqrt(fan)
self.bias = init.uniform([out_channels], dtype="float", low=-bound, high=bound)
else:
self.bias = None
def execute(self, x):
return conv3d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
def conv2d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
padding = _pair(padding)
@ -694,7 +800,70 @@ def conv2d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
if bias is not None:
b = bias.broadcast(y.shape, [0,2,3])
y = y + b
return y
return y
def conv3d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
padding = _triple(padding)
stride = _triple(stride)
dilation = _triple(dilation)
out_channels = weight.shape[0]
if jt.flags.use_cuda and jt.cudnn:
y = jt.cudnn.ops.cudnn_conv3d(x, weight, *stride, *padding, *dilation, groups)
elif groups == 1:
N,C,D,H,W = x.shape
Kd, Kh, Kw = weight.shape[-3:]
od = (D+padding[0]*2-Kd*dilation[0]+dilation[0]-1)//stride[0]+1
oh = (H+padding[1]*2-Kh*dilation[1]+dilation[1]-1)//stride[1]+1
ow = (W+padding[2]*2-Kw*dilation[2]+dilation[2]-1)//stride[2]+1
xx = x.reindex([N,out_channels,C,od,oh,ow,Kd,Kh,Kw], [
'i0', # Nid
'i2', # Cid
f'i3*{stride[0]}-{padding[0]}+i6*{dilation[0]}', # Did+Kdid
f'i4*{stride[1]}-{padding[1]}+i7*{dilation[1]}', # Hid+Khid
f'i5*{stride[2]}-{padding[2]}+i8*{dilation[2]}', # Wid+Kwid
])
ww = weight.broadcast(xx.shape, [0,3,4,5])
yy = xx*ww
y = yy.sum([2,6,7,8]) # C, Kd, Kh, Kw
else:
N,C,D,H,W = x.shape
Kd, Kh, Kw = weight.shape[-3:]
G = groups
CpG = C // G # channels per group
oc = out_channels
od = (D+padding[0]*2-Kd*dilation[0]+dilation[0]-1)//stride[0]+1
oh = (H+padding[1]*2-Kh*dilation[1]+dilation[1]-1)//stride[1]+1
ow = (W+padding[2]*2-Kw*dilation[2]+dilation[2]-1)//stride[2]+1
xx = x.reindex([N,G,oc//G,CpG,od,oh,ow,Kd,Kh,Kw], [
'i0', # Nid
f'i1*{CpG}+i3', # Cid = Gid*CpG + in-group channel id
f'i4*{stride[0]}-{padding[0]}+i7*{dilation[0]}', # Did+Kdid
f'i5*{stride[1]}-{padding[1]}+i8*{dilation[1]}', # Hid+Khid
f'i6*{stride[2]}-{padding[2]}+i9*{dilation[2]}', # Wid+Kwid
])
xx.compile_options = {"G":G}
# w: [oc, CpG, Kd, Kh, Kw]; broadcast to xx's shape for the elementwise product
ww = weight.reindex([N, G, oc//G, CpG, od, oh, ow, Kd, Kh, Kw], [
f'i1*{oc//G}+i2',
'i3',
'i7',
'i8',
'i9'
])
yy = xx*ww
y = yy.reindex_reduce('add', [N, oc, od, oh, ow], [
'i0',
f'i1*{oc//G}+i2',
'i4',
'i5',
'i6'
])
if bias is not None:
b = bias.broadcast(y.shape, [0,2,3,4])
y = y + b
return y
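A shape check for the new 3-D convolution. The cuDNN branch is taken only when `jt.flags.use_cuda` is set and the cudnn extension is available; otherwise the reindex fallback above runs:

```python
import jittor as jt
from jittor import nn

x = jt.randn(1, 4, 8, 8, 8)                      # [N, C, D, H, W]
conv = nn.Conv3d(4, 6, kernel_size=3, padding=1)
assert conv(x).shape == (1, 6, 8, 8, 8)

w = jt.randn(6, 4, 3, 3, 3)                      # functional form takes the weight
assert nn.conv3d(x, w, padding=1).shape == (1, 6, 8, 8, 8)
```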
class ConvTranspose(Module):
def __init__(self, in_channels, out_channels, kernel_size, stride=1, \
@ -754,6 +923,45 @@ class ConvTranspose(Module):
y = y + b
return y
class ConvTranspose3d(Module):
def __init__(self, in_channels, out_channels, kernel_size, stride=1, \
padding=0, output_padding=0, groups=1, bias=True, dilation=1):
self.in_channels = in_channels
self.out_channels = out_channels
# added
self.dilation = dilation
self.group = groups
assert groups==1, "Group conv not supported yet."
self.kernel_size = kernel_size if isinstance(kernel_size, tuple) else (kernel_size, kernel_size, kernel_size)
self.stride = stride if isinstance(stride, tuple) else (stride, stride, stride)
self.dilation = dilation if isinstance(dilation, tuple) else (dilation, dilation, dilation)
# added
self.padding = padding if isinstance(padding, tuple) else (padding, padding, padding)
self.real_padding = (
self.dilation[0] * (self.kernel_size[0] - 1) - self.padding[0],
self.dilation[1] * (self.kernel_size[1] - 1) - self.padding[1],
self.dilation[2] * (self.kernel_size[2] - 1) - self.padding[2])
self.output_padding = output_padding if isinstance (output_padding, tuple) else (output_padding, output_padding, output_padding)
assert self.output_padding[0] < max(self.stride[0], self.dilation[0]) and \
self.output_padding[1] < max(self.stride[1], self.dilation[1]) and \
self.output_padding[2] < max(self.stride[2], self.dilation[2]), \
"output padding must be smaller than max(stride, dilation)"
self.weight = init.invariant_uniform((in_channels, out_channels) + self.kernel_size, dtype="float")
if bias:
fan=1
for i in self.weight.shape[1:]:
fan *= i
bound = 1 / math.sqrt(fan)
self.bias = init.uniform([out_channels], dtype="float", low=-bound, high=bound)
else:
self.bias = None
def execute(self, x):
return conv_transpose3d(x, self.weight, self.bias, self.stride, self.padding, self.output_padding, self.group, self.dilation)
def conv_transpose(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1):
x = input
N,C,H,W = x.shape
@ -792,6 +1000,49 @@ def conv_transpose(input, weight, bias=None, stride=1, padding=0, output_padding
assert not bias, "Bias should be none or jittor var"
return y
def conv_transpose3d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1):
x = input
N,C,D,H,W = x.shape
i,o,d,h,w = weight.shape
assert C==i
assert groups==1, "Group conv not supported yet."
stride = stride if isinstance(stride, tuple) else (stride, stride, stride)
dilation = dilation if isinstance(dilation, tuple) else (dilation, dilation, dilation)
# added
padding = padding if isinstance(padding, tuple) else (padding, padding, padding)
output_padding = output_padding if isinstance (output_padding, tuple) else (output_padding, output_padding, output_padding)
assert output_padding[0] < max(stride[0], dilation[0]) and \
output_padding[1] < max(stride[1], dilation[1]) and \
output_padding[2] < max(stride[2], dilation[2]), \
"output padding must be smaller than max(stride, dilation)"
stride_d, stride_h, stride_w = stride
padding_d, padding_h, padding_w = padding
dilation_d, dilation_h, dilation_w = dilation
d_out = (D-1) * stride_d + output_padding[0] - 2*padding_d + 1 + (d-1)*dilation_d
h_out = (H-1) * stride_h + output_padding[1] - 2*padding_h + 1 + (h-1)*dilation_h
w_out = (W-1) * stride_w + output_padding[2] - 2*padding_w + 1 + (w-1)*dilation_w
out_shape = (N, o, d_out, h_out, w_out)
if jt.flags.use_cuda and jt.cudnn:
return jt.cudnn.ops.cudnn_conv3d_backward_x(weight, x, *out_shape[2:], *stride, *padding, *dilation, groups)
shape = (N, i, o, D, H, W, d, h, w)
xx = x.broadcast(shape, (2, 6, 7, 8)) # broadcast over o,d,h,w
ww = weight.broadcast(shape, (0, 3, 4, 5)) # broadcast over N,D,H,W
y = (ww*xx).reindex_reduce("add", out_shape, [
'i0', # N
'i2', # o
f'i3*{stride_d}-{padding_d}+i6*{dilation_d}', # Did+Kdid
f'i4*{stride_h}-{padding_h}+i7*{dilation_h}', # Hid+Khid
f'i5*{stride_w}-{padding_w}+i8*{dilation_w}', # Wid+KWid
])
if isinstance(bias, jt.Var):
b = bias.broadcast(y.shape, [0,2,3,4])
y = y + b
else:
assert not bias, "Bias should be none or jittor var"
return y
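Per axis the output size is `(D-1)*stride - 2*padding + dilation*(d-1) + 1 + output_padding`; a worked shape check:

```python
import jittor as jt
from jittor import nn

x = jt.randn(1, 4, 5, 5, 5)
w = jt.randn(4, 6, 3, 3, 3)            # [in, out, kd, kh, kw]
y = nn.conv_transpose3d(x, w, stride=2, padding=1, output_padding=1)
# (5-1)*2 - 2*1 + 1*(3-1) + 1 + 1 = 10 per spatial axis
assert y.shape == (1, 6, 10, 10, 10)
```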
conv_transpose2d = conv_transpose
def pad(x,padding, mode='constant', value=0):
@ -997,9 +1248,9 @@ def _bicubic(x, a, func):
def _interpolate(img, x, y, ids, mode):
if mode == "nearest":
return img.reindex([*ids, x.floor(), y.floor()])
return img.reindex([*ids, x.floor_int(), y.floor_int()])
if mode == "bilinear":
fx, fy = x.floor(), y.floor()
fx, fy = x.floor_int(), y.floor_int()
cx, cy = fx + 1, fy + 1
dx, dy = x - fx, y - fy
a = img.reindex_var([*ids, fx, fy])
@ -1013,7 +1264,7 @@ def _interpolate(img, x, y, ids, mode):
return o
if mode=="bicubic": # ugly ver.
n,c,h,w = img.shape
fx, fy = x.floor(), y.floor()
fx, fy = x.floor_int(), y.floor_int()
dix, diy = x - fx, y - fy
ax, ay = _bicubic(dix+1,-0.75,2), _bicubic(diy+1,-0.75,2)
bx, by = _bicubic(dix,-0.75,1), _bicubic(diy,-0.75,1)
@ -1031,23 +1282,8 @@ def _interpolate(img, x, y, ids, mode):
return o
raise (f"Not support interpolation mode: {mode}")
def resize(img, size, mode="nearest", align_corners=False):
n, c, h, w = img.shape
H, W = size
nid, cid, hid, wid = jt.index((n, c, H, W))
if align_corners:
x = hid * ((h - 1) / max(1, H - 1))
y = wid * ((w - 1) / max(1, W - 1))
else:
x = hid * (h / H) + (h / H * 0.5 - 0.5)
if H > h: x = x.clamp(0, h - 1)
y = wid * (w / W) + (w / W * 0.5 - 0.5)
if W > w: y = y.clamp(0, w - 1)
return _interpolate(img, x, y, (nid, cid), mode)
def upsample(img, size, mode="nearest", align_corners=False):
# TODO: tf_mode to another function
def resize(img, size, mode="nearest", align_corners=False, tf_mode=False):
n, c, h, w = img.shape
H, W = size
nid, cid, hid, wid = jt.index((n, c, H, W))
@ -1061,22 +1297,30 @@ def upsample(img, size, mode="nearest", align_corners=False):
x = hid * (h / H)
y = wid * (w / W)
else:
x = hid * (h / H) + (h / H * 0.5 - 0.5)
if H > h: x = x.clamp(0, h - 1)
y = wid * (w / W) + (w / W * 0.5 - 0.5)
if W > w: y = y.clamp(0, w - 1)
if (tf_mode):
x = hid * (h / H)
if H > h: x = x.clamp(0, h - 1)
y = wid * (w / W)
if W > w: y = y.clamp(0, w - 1)
else:
x = hid * (h / H) + (h / H * 0.5 - 0.5)
if H > h: x = x.clamp(0, h - 1)
y = wid * (w / W) + (w / W * 0.5 - 0.5)
if W > w: y = y.clamp(0, w - 1)
return _interpolate(img, x, y, (nid, cid), mode)
upsample = resize
def interpolate(X, size=None, scale_factor=None, mode='bilinear', align_corners=False):
def interpolate(X, size=None, scale_factor=None, mode='bilinear', align_corners=False, tf_mode=False):
if scale_factor is not None:
size = [X.shape[-2] * scale_factor, X.shape[-1] * scale_factor]
if isinstance(size, int):
size = (size, size)
if scale_factor is not None and scale_factor > 1:
return upsample(X, size, mode, align_corners)
return upsample(X, size, mode, align_corners, tf_mode)
else:
return resize(X, size, mode, align_corners)
return resize(X, size, mode, align_corners, tf_mode)
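`tf_mode` drops the half-pixel offset, matching TensorFlow-style sampling instead of the default PyTorch-style; both paths keep the requested output size:

```python
import jittor as jt
from jittor import nn

img = jt.randn(1, 3, 8, 8)
up = nn.interpolate(img, scale_factor=2, mode='bilinear')
up_tf = nn.interpolate(img, scale_factor=2, mode='bilinear', tf_mode=True)
assert up.shape == (1, 3, 16, 16) and up_tf.shape == (1, 3, 16, 16)
```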
def grid_sample_v0(input, grid, mode='bilinear', padding_mode='zeros'):
@ -1134,7 +1378,7 @@ def linspace_from_neg_one(grid,num_steps,align_corners):
return jt.array(ra,dtype=grid.dtype)
def make_base_grid_4D(theta,N,C,H,W,align_corners):
base_grid = jt.zeros((N, H, W, 3), dtype=theta.dtype);
base_grid = jt.zeros((N, H, W, 3), dtype=theta.dtype)
base_grid[...,0] = linspace_from_neg_one(theta, W, align_corners)
base_grid[...,1] = jt.unsqueeze(linspace_from_neg_one(theta, H, align_corners),-1)
base_grid[...,-1] = 1
@ -1190,7 +1434,7 @@ def reflect_coordinates(x,twice_low,twice_high):
x = (x - m).abs()
#`fmod` returns same sign as `in`, which is positive after the `fabs` above.
extra = x.mod(span)
flips = (x / span).floor()
flips = (x / span).floor_int()
result1 = extra+m
result2 = span-extra+m
con = flips%2==0
@ -1242,9 +1486,9 @@ def grid_sampler_3d(X,grid,mode,padding_mode,align_corners):
zid = z.reindex(shape,['i0','i2','i3','i4'])
if mode=='nearest':
return X.reindex([nid,cid,zid.round(),yid.round(),xid.round()])
return X.reindex([nid,cid,zid.round_int(),yid.round_int(),xid.round_int()])
elif mode=='bilinear':
fx,fy,fz = xid.floor(),yid.floor(),zid.floor()
fx,fy,fz = xid.floor_int(),yid.floor_int(),zid.floor_int()
cx,cy,cz = fx+1,fy+1,fz+1
dx,dy,dz = xid-fx,yid-fy,zid-fz
dnx,dny,dnz = cx-xid,cy-yid,cz-zid
@ -1279,10 +1523,10 @@ def grid_sampler_2d(X,grid,mode,padding_mode,align_corners):
yid = y.reindex(shape,['i0','i2','i3'])
if mode=='nearest':
return X.reindex([nid,cid,yid.round(),xid.round()])
return X.reindex([nid,cid,yid.round_int(),xid.round_int()])
elif mode=='bilinear':
#xid,yid = (xid+0.00001),(yid+0.00001)
fx,fy = (xid).floor(),(yid).floor()
fx,fy = (xid).floor_int(),(yid).floor_int()
cx,cy = fx+1,fy+1
dx,dy = xid-fx,yid-fy
dnx,dny = cx-xid,cy-yid
@ -1385,6 +1629,87 @@ class Sequential(Module):
return len(self.layers)
class ParameterList(Module):
def __init__(self, *args):
self.params = collections.OrderedDict()
for var in args:
if isinstance(var, (collections.OrderedDict, dict)):
for k, v in var.items():
self.add_param(k, v)
elif isinstance(var, list):
for v in var:
self.append(v)
else:
self.append(var)
def __getitem__(self, idx):
if idx not in self.params:
return list(self.params.values())[idx]
return self.params[idx]
def __iter__(self):
return self.params.values().__iter__()
def keys(self):
return self.params.keys()
def values(self):
return self.params.values()
def items(self):
return self.params.items()
def execute(self, x):
raise NotImplementedError("Parameters is not executable")
def append(self, var):
assert isinstance(var, jt.Var), f"argument <{type(var)}> is not jittor var"
self.params[len(self.params)] = var
def add_param(self, name, var):
assert isinstance(var, jt.Var), f"argument <{type(var)}> is not jittor var"
self.params[name]=var
def __setitem__(self, name, var):
self.add_param(name, var)
def __len__(self):
return len(self.params)
ParameterDict = ParameterList
def Parameter(data, requires_grad=True):
''' The `Parameter` interface isn't needed in Jittor; this interface
does nothing and exists only for compatibility.
A Jittor Var is a Parameter
when it is a member of a Module. If you don't want a Jittor
Var member to be treated as a Parameter, just name it with a leading
underscore `_`.
'''
LOG.w(Parameter.__doc__)
data = data.clone()
data.requires_grad = requires_grad
return data
def backward(v, *args, **kw):
''' The `backward` variable interface doesn't exist in Jittor.
Please use `optimizer.backward(loss)` or
`optimizer.step(loss)` instead.
For example, if your code looks like this::
optimizer.zero_grad()
loss.backward()
optimizer.step()
It can be changed to this::
optimizer.zero_grad()
optimizer.backward(loss)
optimizer.step()
Or more concise::
optimizer.step(loss)
The step function will automatically zero grad and backward.
'''
LOG.f(backward.__doc__)
jt.Var.backward = backward
def unfold(X, kernel_size, dilation=1, padding=0, stride=1):
assert X.ndim == 4
if not isinstance(kernel_size, tuple):
@ -1900,3 +2225,36 @@ class GRU(RNNBase):
h = (1 - z) * n + z * hidden
return h, h
def bilinear(in1, in2, weight, bias):
w = weight.transpose((1,0,2))
w = w.reshape((w.shape[0], -1))
x = jt.matmul(in1, w)
x = x.reshape(x.shape[:-1]+[weight.shape[0], weight.shape[2]])
y = in2.broadcast(x, (-2,))
z = (x*y).sum(-1)
if bias is not None:
z += bias
return z
class Bilinear(Module):
''' Bilinear transformation $out = in1^T W in2 + bias$. Example::
m = nn.Bilinear(20, 30, 40)
input1 = jt.randn(128, 20)
input2 = jt.randn(128, 30)
output = m(input1, input2)
print(output.shape)
# [128, 40]
'''
def __init__(self, in1_features, in2_features, out_features, bias=True, dtype="float32"):
bound = 1 / math.sqrt(in1_features)
self.weight = jt.init.uniform([out_features, in1_features, in2_features], dtype, -bound, bound)
self.bias = bias
if bias:
self.bias = jt.init.uniform([out_features], dtype, -bound, bound)
def execute(self, in1, in2):
return bilinear(in1, in2, self.weight, self.bias)

View File

@ -0,0 +1,11 @@
# 计图零基础入门教程60分钟
```
git clone https://github.com/Jittor/LearnJittorBasicIn60Min.git
cd LearnJittorBasicIn60Min
jupyter notebook
```
在线浏览地址:<https://nbviewer.jupyter.org/github/Jittor/LearnJittorBasicIn60Min/tree/master/>
特别感谢教程作者llt

Binary file not shown.


View File

@ -0,0 +1,138 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 计图入门教程 0 --- 介绍与安装"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**计图 (Jittor)** 是一个以 Python 为前端语言的深度学习框架,它 \n",
"* 效率高:可作为 NumPyPyTorch 的替代品,可以使用 GPU 等其他加速器进行高效的数据运算。除此之外,计图还拥有多个创新点,旨在大幅提升其运算效率;\n",
"* 易使用:是一个用于实现神经网络的自动微分库,并集成了大量有关深度学习的函数库,方便您快速开展开发任务。\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**通过本教程,您将**\n",
"* 学习并理解计图中基本类型的一般操作;\n",
"* 了解神经网络的一些基本概念,并学会如何利用计图进行神经网络的训练;\n",
"* 解决一个机器学习的经典实战问题。\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**本教程的适用群体:** \n",
"我们的目标是,只要您会 Python 编程,即可通过本教程学习并掌握如何使用计图进行深度学习的开发。不用担心,本教程几乎对所有的关键代码都加以注释说明。只要您耐心跟着本教程一步步学习,便一定能有所斩获。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"现在,请您开启计图快速入门之旅。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 安装\n",
"\n",
"\n",
"Jittor框架对环境要求如下:\n",
"\n",
"\n",
"* 操作系统: **Ubuntu** >= 16.04 或 **Windows Subsystem of LinuxWSL**\n",
"* Python版本 >= 3.7\n",
"* C++编译器 (需要下列至少一个)\n",
" - g++ >=5.4.0\n",
" - clang >=8.0\n",
"* GPU 编译器可选nvcc >=10.0\n",
"* GPU 加速库可选cudnn-dev (cudnn开发版, 推荐使用tar安装方法[参考链接](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar))\n",
"\n",
"如果您不希望手动配置环境,我们推荐使用 Docker 进行安装。\n",
"除此之外,您还可以使用 pip 安装和手动安装。\n",
"\n",
"注意目前Jittor通过WSL的方式在Windows操作系统上运行WSL的安装方法请参考[微软官网](https://docs.microsoft.com/en-us/windows/wsl/install-win10)WSL版本目前尚不支持CUDA。\n",
"\n",
"Jittor 提供了三种安装方法dockerpip和手动安装\n",
"\n",
"\n",
"\n",
"## Docker 安装\n",
"\n",
"我们提供了Docker安装方式免去您配置环境Docker安装方法如下\n",
"\n",
"\n",
"```\n",
"# CPU only(Linux)\n",
"docker run -it --network host jittor/jittor\n",
"# CPU and CUDA(Linux)\n",
"docker run -it --network host --gpus all jittor/jittor-cuda\n",
"# CPU only(Mac and Windows)\n",
"docker run -it -p 8888:8888 jittor/jittor\n",
"# Upgrade jittor docker image\n",
"docker pull jittor/jittor\n",
"docker pull jittor/jittor-cuda\n",
"```\n",
"\n",
"关于Docker安装的详细教程可以参考[Windows/Mac/Linux通过Docker安装计图](https://cg.cs.tsinghua.edu.cn/jittor/tutorial/2020-5-15-00-00-docker/)\n",
"\n",
"## Pip 安装\n",
"\n",
"\n",
"如果您没有准备好环境或者使用的不是Ubuntu操作系统 推荐使用**docker安装** 如果您已经装好编译器和对应版本的Python,我们强烈推荐您使用这种方法\n",
"(如果无法访问github, 可以通过jittor主页下载):\n",
"\n",
"```bash\n",
"sudo apt install python3.7-dev libomp-dev\n",
"python3.7 -m pip install jittor\n",
"# or install from github(latest version)\n",
"# python3.7 -m pip install git+https://github.com/Jittor/jittor.git\n",
"python3.7 -m jittor.test.test_example\n",
"\n",
"# Upgrade jittor from pip\n",
"python3.7 -m pip install jittor -U\n",
"# Upgrade jittor from github\n",
"python3.7 -m pip install git+https://github.com/Jittor/jittor.git -U\n",
"```\n",
"\n",
"如果测试运行通过,恭喜你已经安装完成.\n",
"jittor会自动在路径中寻找合适的编译器, 如果您希望手动指定编译器, 请使用环境变量 `cc_path` 和 `nvcc_path`(可选).\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

File diff suppressed because one or more lines are too long

View File

@ -37,6 +37,7 @@ class Optimizer(object):
# __zero_grad is a value for fast determ the grad is zero or not
# so we can omit 0+x
self.__zero_grad = True
self._grad_map = {}
def add_param_group(self, group):
self.param_groups.append(group)
@ -77,7 +78,7 @@ class Optimizer(object):
for pg in self.param_groups:
for p, g in zip(pg["params"], pg["grads"]):
if p.is_stop_grad(): continue
g *= clip_coef
g.update(g*clip_coef)
@property
@ -86,6 +87,14 @@ class Optimizer(object):
return { k:v for k, v in self.__dict__.items()
if k[0] != '_' and k not in exclude and not callable(v) }
def state_dict(self):
state = {"defaults": self.defaults}
return state
def load_state_dict(self, state):
for k,v in state["defaults"].items():
setattr(self, k, v)
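`state_dict` here snapshots only the scalar defaults (lr, eps, ...), not per-parameter buffers such as Adam's momentum terms. A round-trip sketch:

```python
import jittor as jt
from jittor import nn

model = nn.Linear(4, 2)
opt = nn.SGD(model.parameters(), lr=0.1)
snap = opt.state_dict()          # {"defaults": {"lr": 0.1, ...}}

opt.lr = 0.01                    # change the lr ...
opt.load_state_dict(snap)        # ... and restore it
assert opt.lr == 0.1
```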
def zero_grad(self):
self.__zero_grad = True
@ -181,6 +190,35 @@ class Optimizer(object):
p.update(p - g * lr)
self.zero_grad()
def _build_grad_map(self):
_grad_map = {}
for pg in self.param_groups:
for p, g in zip(pg["params"], pg["grads"]):
_grad_map[id(p)] = g
self._grad_map = _grad_map
def find_grad(self, v:jt.Var) -> jt.Var:
if id(v) not in self._grad_map:
self._build_grad_map()
if id(v) not in self._grad_map:
raise RuntimeError("This variable is not managed by this optimizer")
return self._grad_map[id(v)]
def opt_grad(v:jt.Var, opt:Optimizer):
''' Get grad of certain variable in optimizer, Example::
model = Model()
optimizer = SGD(model.parameters())
...
optimizer.backward(loss)
for p in model.parameters():
grad = p.opt_grad(optimizer)
'''
return opt.find_grad(v)
jt.Var.opt_grad = opt_grad
class SGD(Optimizer):
""" SGD Optimizer.
@ -325,6 +363,60 @@ class Adam(Optimizer):
self.zero_grad()
class AdamW(Optimizer):
""" AdamW Optimizer.
Example::
optimizer = nn.AdamW(model.parameters(), lr, eps=1e-8, betas=(0.9, 0.999))
optimizer.step(loss)
"""
def __init__(self, params, lr, eps=1e-8, betas=(0.9, 0.999), weight_decay=0):
super().__init__(params, lr)
self.eps = eps
self.betas = betas
self.weight_decay = weight_decay
# assert weight_decay==0, "weight_decay is not supported yet"
# initialize required arguments for each param_groups
for pg in self.param_groups:
values = pg["values"] = []
m = pg["m"] = []
for p in pg["params"]:
values.append(jt.zeros(p.shape, p.dtype).stop_grad())
m.append(jt.zeros(p.shape, p.dtype).stop_grad())
def add_param_group(self, group):
values = group["values"] = []
m = group["m"] = []
for p in group["params"]:
values.append(jt.zeros(p.shape, p.dtype).stop_grad())
m.append(jt.zeros(p.shape, p.dtype).stop_grad())
self.param_groups.append(group)
def step(self, loss=None):
if loss is not None:
self.pre_step(loss)
n = float(self.n_step)
for pg in self.param_groups:
# get arguments from each param_groups
lr = pg.get("lr", self.lr)
eps = pg.get("eps", self.eps)
weight_decay = pg.get("weight_decay", self.weight_decay)
b0, b1 = pg.get("betas", self.betas)
for p, g, v, m in zip(pg["params"], pg["grads"], pg["values"], pg["m"]):
if p.is_stop_grad(): continue
p.update(p * (1 - lr * weight_decay))
bias_correction1 = 1 - b0 ** n
bias_correction2 = 1 - b1 ** n
m.update(b0 * m + (1-b0) * g) #exp_avg
v.update(b1 * v + (1-b1) * g * g) #exp_avg_sq
denom = jt.sqrt(v) / jt.sqrt(bias_correction2) + eps
step_size = lr / bias_correction1
p.update(p - step_size * m / denom)
self.zero_grad()
class LRScheduler:
def __init__(self,optimizer, last_epoch=-1):
assert isinstance(optimizer,Optimizer)

View File

@ -73,7 +73,7 @@ class Pool(Module):
for (int q = k3; q < k3_; ++q)
if (out_value < @in0(i0, i1, p, q)) {{
out_value = @in0(i0, i1, p, q);
out_index = (p - k2) * {self.kernel_size[0]} + (q - k3);
out_index = p * in0_shape3 + q;
}}
@out(i0, i1, i2, i3) = out_value;
@out1(i0, i1, i2, i3) = out_index;
@ -99,7 +99,7 @@ class Pool(Module):
'''
if self.return_indices:
return_shapes = [[N,C,h,w]] * 2
return_dtypes = [x.dtype, 'uint8']
return_dtypes = [x.dtype, 'int32']
else:
return_shapes = [N,C,h,w]
return_dtypes = x.dtype
@ -184,6 +184,204 @@ class Pool(Module):
])
return xx.reduce(self.op, [4,5])
def _triple(x):
if isinstance(x, tuple):
assert len(x) == 3
return x
else:
return (x,x,x)
class Pool3d(Module):
def __init__(self, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False, count_include_pad=True, op="maximum"):
assert dilation == None
assert return_indices == None or op == "maximum"
self.return_indices = return_indices
self.kernel_size = _triple(kernel_size)
self.op = op
stride = stride if stride else kernel_size
self.stride = _triple(stride)
self.padding = _triple(padding)
self.ceil_mode = ceil_mode
self.count_include_pad = count_include_pad and padding != 0
def execute(self, x):
N,C,D,H,W = x.shape
if self.ceil_mode == False:
d = (D+self.padding[0]*2-self.kernel_size[0])//self.stride[0]+1
h = (H+self.padding[1]*2-self.kernel_size[1])//self.stride[1]+1
w = (W+self.padding[2]*2-self.kernel_size[2])//self.stride[2]+1
use_code_op = self.op in ['maximum', 'minimum']
# some second-order avg_pool is required, so we don't use the code op here
else:
d = (D+self.padding[0]*2-self.kernel_size[0] + self.stride[0] - 1)//self.stride[0]+1
h = (H+self.padding[1]*2-self.kernel_size[1] + self.stride[1] - 1)//self.stride[1]+1
w = (W+self.padding[2]*2-self.kernel_size[2] + self.stride[2] - 1)//self.stride[2]+1
use_code_op = self.op in ['maximum', 'minimum', 'mean']
if use_code_op:
if self.op == 'mean':
if self.count_include_pad:
count = f"int count = {self.kernel_size[0]*self.kernel_size[1]*self.kernel_size[2]};"
else:
count = "int count = (k2_ - k2) * (k3_ - k3) * (k4_ - k4);"
count += "float32 rcount = 1.0f / count;"
else:
count = ""
forward_body = f'''
int k4 = i4*{self.stride[2]}-{self.padding[2]};
int k3 = i3*{self.stride[1]}-{self.padding[1]};
int k2 = i2*{self.stride[0]}-{self.padding[0]};
int k4_ = min(k4 + {self.kernel_size[2]}, in0_shape4);
int k3_ = min(k3 + {self.kernel_size[1]}, in0_shape3);
int k2_ = min(k2 + {self.kernel_size[0]}, in0_shape2);
k4 = max(0, k4);
k3 = max(0, k3);
k2 = max(0, k2);
{count}
'''
if not self.return_indices:
forward_body += f'''
@out(i0, i1, i2, i3, i4) = init_{self.op}(out_type);
for (int p = k2; p < k2_; ++p)
for (int q = k3; q < k3_; ++q)
for (int r = k4; r < k4_; ++r)
@out(i0, i1, i2, i3, i4) = {self.op}(out_type, @out(i0, i1, i2, i3, i4), @in0(i0, i1, p, q, r));
'''
else:
forward_body += f'''
auto out_value = init_{self.op}(out_type);
int out_index = -1;
for (int p = k2; p < k2_; ++p)
for (int q = k3; q < k3_; ++q)
for (int r = k4; r < k4_; ++r)
if (out_value < @in0(i0, i1, p, q, r)) {{
out_value = @in0(i0, i1, p, q, r);
out_index = p * in0_shape3 * in0_shape4 + q * in0_shape4 + r;
}}
@out(i0, i1, i2, i3, i4) = out_value;
@out1(i0, i1, i2, i3, i4) = out_index;
'''
backward_body = f'''
int k4 = i4*{self.stride[2]}-{self.padding[2]};
int k3 = i3*{self.stride[1]}-{self.padding[1]};
int k2 = i2*{self.stride[0]}-{self.padding[0]};
int k4_ = min(k4 + {self.kernel_size[2]}, in0_shape4);
int k3_ = min(k3 + {self.kernel_size[1]}, in0_shape3);
int k2_ = min(k2 + {self.kernel_size[0]}, in0_shape2);
k4 = max(0, k4);
k3 = max(0, k3);
k2 = max(0, k2);
{count}
int bo=1;
for (int p = k2; p < k2_ && bo; ++p)
for (int q = k3; q < k3_ && bo; ++q)
for (int r = k4; r < k4_ && bo; ++r) {{
{"atomicAdd(&@out(i0,i1,p,q,r), @dout(i0,i1,i2,i3,i4)/count);"
if self.op == "mean" else
f"""if (@pout(i0,i1,i2,i3,i4) == @in0(i0,i1,p,q,r)) {{
atomicAdd(&@out(i0,i1,p,q,r), @dout(i0,i1,i2,i3,i4)),
bo=0;
}}"""}
}}
'''
if self.return_indices:
return_shapes = [[N,C,d,h,w]] * 2
return_dtypes = [x.dtype, 'int32']
else:
return_shapes = [N,C,d,h,w]
return_dtypes = x.dtype
out = jt.code(return_shapes, return_dtypes, [x],
cuda_header="""
#include <ops/binary_op_defs.h>
#include <misc/cuda_limits.h>
""",
cuda_src=f'''
__global__ static void kernel1(@ARGS_DEF) {{
@PRECALC
int p4 = threadIdx.x;
int s4 = blockDim.x;
int p3 = threadIdx.y;
int s3 = blockDim.y;
int p2 = threadIdx.z + blockIdx.x * blockDim.z;
int s2 = blockDim.z * gridDim.x;
int i1 = blockIdx.y;
int i0 = blockIdx.z;
for (int i4 = p4; i4 < out_shape4; i4 += s4)
for (int i3 = p3; i3 < out_shape3; i3 += s3)
for (int i2 = p2; i2 < out_shape2; i2 += s2)
{{ {forward_body} }}
}}
int tx = min(1024, out_shape4);
int ty = min(1024 / tx, out_shape3);
int tz = min(1024 / tx / ty, out_shape2);
int bx = (out_shape2 - 1) / tz + 1;
int by = out_shape1;
int bz = out_shape0;
dim3 s1(bx, by, bz);
dim3 s2(tx, ty, tz);
kernel1<<<s1, s2>>>(@ARGS);
''',
cuda_grad_src=[f'''
__global__ static void kernel3(@ARGS_DEF) {{
@PRECALC
int p4 = threadIdx.x;
int s4 = blockDim.x;
int p3 = threadIdx.y;
int s3 = blockDim.y;
int p2 = threadIdx.z + blockIdx.x * blockDim.z;
int s2 = blockDim.z * gridDim.x;
int i1 = blockIdx.y;
int i0 = blockIdx.z;
for (int i4 = p4; i4 < out_shape4; i4 += s4)
for (int i3 = p3; i3 < out_shape3; i3 += s3)
for (int i2 = p2; i2 < out_shape2; i2 += s2)
{{ {backward_body} }}
}}
cudaMemsetAsync(out_p, 0, out->size);
int tx = min(1024, pout_shape4);
int ty = min(1024 / tx, pout_shape3);
int tz = min(1024 / tx / ty, pout_shape2);
int bx = (pout_shape2 - 1) / tz + 1;
int by = pout_shape1;
int bz = pout_shape0;
dim3 s1(bx, by, bz);
dim3 s2(tx, ty, tz);
kernel3<<<s1, s2>>>(@ARGS);
'''],
cpu_header='#include <ops/binary_op_defs.h>',
cpu_src=f'''
using namespace std;
for (int i0=0; i0<out_shape0; i0++)
for (int i1=0; i1<out_shape1; i1++)
for (int i2=0; i2<out_shape2; i2++)
for (int i3=0; i3<out_shape3; i3++)
for (int i4=0; i4<out_shape4; i4++)
{{ {forward_body} }}
''',
cpu_grad_src = [f'''
using namespace std;
std::memset(out_p, 0, out->size);
#define atomicAdd(a,b) (*a) += b
for (int i0=0; i0<pout_shape0; i0++)
for (int i1=0; i1<pout_shape1; i1++)
for (int i2=0; i2<pout_shape2; i2++)
for (int i3=0; i3<pout_shape3; i3++)
for (int i4=0; i4<pout_shape4; i4++)
{{ {backward_body} }}
'''])
return out
else:
# TODO: backward
xx = x.reindex([N,C,d,h,w,self.kernel_size[0],self.kernel_size[1],self.kernel_size[2]], [
"i0", # Nid
"i1", # Cid
f"i2*{self.stride[0]}-{self.padding[0]}+i5", # Did
f"i3*{self.stride[1]}-{self.padding[1]}+i6", # Hid
f"i4*{self.stride[2]}-{self.padding[2]}+i7", # Hid
])
return xx.reduce(self.op, [5,6,7])
class AdaptiveAvgPool2d(Module):
def __init__(self, output_size):
@ -216,8 +414,9 @@ class AdaptiveAvgPool2d(Module):
return xx.reduce("mean", [4,5])
class AdaptiveMaxPool2d(Module):
def __init__(self, output_size):
def __init__(self, output_size, return_indices=False):
self.output_size = output_size
self.return_indices = return_indices
def execute(self, x):
if isinstance(self.output_size, int):
@ -235,6 +434,10 @@ class AdaptiveMaxPool2d(Module):
self.sw = math.floor(W / ow)
self.ksh = H - (oh - 1) * self.sh
self.ksw = W - (ow - 1) * self.sw
if self.return_indices:
return MaxPool2d(
kernel_size=(self.ksh, self.ksw),
stride=(self.sh, self.sw), return_indices=True)(x)
h = (H-self.ksh)//self.sh+1
w = (W-self.ksw)//self.sw+1
xx = x.reindex([N,C,h,w,self.ksh,self.ksw], [
@ -245,9 +448,74 @@ class AdaptiveMaxPool2d(Module):
])
return xx.reduce("maximum", [4,5])
class AdaptiveAvgPool3d(Module):
def __init__(self, output_size):
self.output_size = _triple(output_size)
def execute(self, x):
od, oh, ow = self.output_size
if od == 1 and oh == 1 and ow == 1:
return x.reduce("mean", [2,3,4], keepdims=True)
N,C,D,H,W = x.shape
self.sd = math.floor(D / od)
self.sh = math.floor(H / oh)
self.sw = math.floor(W / ow)
self.ksd = D - (od - 1) * self.sd
self.ksh = H - (oh - 1) * self.sh
self.ksw = W - (ow - 1) * self.sw
d = (D-self.ksd)//self.sd+1
h = (H-self.ksh)//self.sh+1
w = (W-self.ksw)//self.sw+1
xx = x.reindex([N,C,d,h,w,self.ksd,self.ksh,self.ksw], [
"i0", # Nid
"i1", # Cid
f"i2*{self.sd}+i5", # Did
f"i3*{self.sh}+i6", # Hid
f"i4*{self.sw}+i7", # Wid
])
return xx.reduce("mean", [5,6,7])
class AdaptiveMaxPool3d(Module):
def __init__(self, output_size, return_indices=False):
self.output_size = _triple(output_size)
self.return_indices = return_indices
def execute(self, x):
od, oh, ow = self.output_size
if od == 1 and oh == 1 and ow == 1 and not self.return_indices:
return x.reduce("maximum", [2,3,4], keepdims=True)
N,C,D,H,W = x.shape
self.sd = math.floor(D / od)
self.sh = math.floor(H / oh)
self.sw = math.floor(W / ow)
self.ksd = D - (od - 1) * self.sd
self.ksh = H - (oh - 1) * self.sh
self.ksw = W - (ow - 1) * self.sw
if self.return_indices:
return MaxPool3d(
kernel_size=(self.ksd, self.ksh, self.ksw),
stride=(self.sd, self.sh, self.sw), return_indices=True)(x)
d = (D-self.ksd)//self.sd+1
h = (H-self.ksh)//self.sh+1
w = (W-self.ksw)//self.sw+1
xx = x.reindex([N,C,d,h,w,self.ksd,self.ksh,self.ksw], [
"i0", # Nid
"i1", # Cid
f"i2*{self.sd}+i5", # Did
f"i3*{self.sh}+i6", # Hid
f"i4*{self.sw}+i7", # Wid
])
return xx.reduce("maximun", [5,6,7])
def pool(x, kernel_size, op, padding=0, stride=None):
return Pool(kernel_size, stride, padding, op=op)(x)
pool2d = pool
def pool3d(x, kernel_size, op, padding=0, stride=None):
return Pool3d(kernel_size, stride, padding, op=op)(x)
class AvgPool2d(Module):
def __init__(self, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True):
self.layer = Pool(kernel_size=kernel_size, stride=stride, padding=padding, ceil_mode=ceil_mode, count_include_pad=count_include_pad, op="mean")
@ -255,6 +523,13 @@ class AvgPool2d(Module):
def execute(self, x):
return self.layer(x)
class AvgPool3d(Module):
def __init__(self, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True):
self.layer = Pool3d(kernel_size=kernel_size, stride=stride, padding=padding, ceil_mode=ceil_mode, count_include_pad=count_include_pad, op="mean")
def execute(self, x):
return self.layer(x)
def avg_pool2d(x, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True):
return AvgPool2d(kernel_size, stride, padding, ceil_mode, count_include_pad)(x)
@ -265,54 +540,115 @@ class MaxPool2d(Module):
def execute(self, x):
return self._layer(x)
class MaxPool3d(Module):
def __init__(self, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False):
self._layer = Pool3d(kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, return_indices=return_indices, ceil_mode=ceil_mode, op="maximum")
def execute(self, x):
return self._layer(x)
def max_pool2d(x, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False):
return MaxPool2d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)(x)
def max_pool3d(x, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False):
return MaxPool3d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)(x)
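A shape sketch for the 3-D pooling entry points; with `return_indices=True` the pool also emits int32 flat offsets (`d*H*W + h*W + w`) into the input slice:

```python
import jittor as jt
from jittor import nn

x = jt.randn(1, 2, 8, 8, 8)
assert nn.max_pool3d(x, 2, stride=2).shape == (1, 2, 4, 4, 4)
assert nn.AvgPool3d(2, stride=2)(x).shape == (1, 2, 4, 4, 4)

y, idx = nn.MaxPool3d(2, stride=2, return_indices=True)(x)
assert str(idx.dtype) == 'int32'
```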
class MaxUnpool2d(Module):
''' MaxUnpool2d is the inverse of MaxPool2d with indices.
It takes the output index of MaxPool2d as input.
The element will be zero if it is not the max pooled value.
Example::
>>> import jittor as jt
>>> from jittor import nn
>>> pool = nn.MaxPool2d(2, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool2d(2, stride=2)
>>> input = jt.array([[[[ 1., 2, 3, 4,0],
[ 5, 6, 7, 8,0],
[ 9, 10, 11, 12,0],
[13, 14, 15, 16,0],
[0, 0, 0, 0, 0]]]])
>>> output, indices = pool(input)
>>> unpool(output, indices, output_size=input.shape)
jt.array([[[[ 0., 0., 0., 0., 0.],
[ 0., 6., 0., 8., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 14., 0., 16., 0.],
[ 0., 0., 0., 0., 0.]]]])
'''
def __init__(self, kernel_size, stride=None):
''' MaxUnpool2d is the inverse of MaxPool2d with indices.
It takes the output index of MaxPool2d as input.
The element will be zero if it is not the max pooled value.
Example::
>>> import jittor as jt
>>> from jittor import nn
>>> pool = nn.MaxPool2d(2, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool2d(2, stride=2)
>>> input = jt.array([[[[ 1., 2, 3, 4,0],
[ 5, 6, 7, 8,0],
[ 9, 10, 11, 12,0],
[13, 14, 15, 16,0],
[0, 0, 0, 0, 0]]]])
>>> output, indices = pool(input)
>>> unpool(output, indices, output_size=input.shape)
jt.array([[[[ 0., 0., 0., 0., 0.],
[ 0., 6., 0., 8., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 14., 0., 16., 0.],
[ 0., 0., 0., 0., 0.]]]])
'''
if isinstance(kernel_size, int):
kernel_size = (kernel_size, kernel_size)
if isinstance(stride, int):
stride = (stride, stride)
if stride is None: stride = kernel_size
assert stride == kernel_size, "Different stride and kernel is not supported yet."
self.kernel_size = kernel_size
self.stride = stride
def execute(self, x, id, output_size=None):
b, c, ph, pw = x.shape
kh, kw = self.kernel_size
sh, sw = self.stride
if output_size:
h, w = output_size[-2:]
else:
h, w = ph * kh, pw * kw
x = x.reindex(shape=[b, c, h, w],
indexes=['i0', 'i1', f'i2/{kh}', f'i3/{kw}'],
extras=[id],
overflow_conditions=[
f'((i2%{kh})*{kw}+i3%{kw}) != @e0(i0,i1,i2/{kh},i3/{kw})'],
overflow_value=0)
return x
h, w = ph * sh, pw * sw
if self.stride == self.kernel_size:
x = x.reindex(shape=[b, c, h, w],
indexes=['i0', 'i1', f'i2/{kh}', f'i3/{kw}'],
extras=[id],
overflow_conditions=[
f'(i2*yshape3+i3) != @e0(i0,i1,i2/{kh},i3/{kw})'],
overflow_value=0)
else:
x = x.reindex_reduce(
op="add",
shape=[b, c, h, w],
indexes=['i0', 'i1',
f'@e0(i0,i1,i2,i3)/xshape3',
f'@e0(i0,i1,i2,i3)%xshape3'],
extras=[id],
)
return x
class MaxUnpool3d(Module):
''' MaxUnpool3d is the inverse of MaxPool3d with indices.
It takes the output index of MaxPool3d as input.
The element will be zero if it is not the max pooled value.
'''
def __init__(self, kernel_size, stride=None):
if stride is None: stride = kernel_size
kernel_size = _triple(kernel_size)
stride = _triple(stride)
self.kernel_size = kernel_size
self.stride = stride
def execute(self, x, id, output_size=None):
b, c, pd, ph, pw = x.shape
kd, kh, kw = self.kernel_size
sd, sh, sw = self.stride
if output_size:
d, h, w = output_size[-3:]
else:
d, h, w = pd * sd, ph * sh, pw * sw
if self.stride == self.kernel_size:
x = x.reindex(shape=[b, c, d, h, w],
indexes=['i0', 'i1', f'i2/{kd}', f'i3/{kh}', f'i4/{kw}'],
extras=[id],
overflow_conditions=[
f'(i2*yshape3*yshape4+i3*yshape4+i4) != @e0(i0,i1,i2/{kd},i3/{kh},i4/{kw})'],
overflow_value=0)
else:
x = x.reindex_reduce(
op="add",
shape=[b, c, d, h, w],
indexes=['i0', 'i1',
f'@e0(i0,i1,i2,i3,i4)/(xshape4*xshape3)',
f'@e0(i0,i1,i2,i3,i4)/xshape4%xshape3',
f'@e0(i0,i1,i2,i3,i4)%xshape4'],
extras=[id],
)
return x
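`MaxUnpool3d` mirrors the 2-D version: when kernel and stride match it scatters through a masked `reindex`, otherwise through `reindex_reduce`. A round-trip sketch, assuming the flat-offset index convention above:

```python
import jittor as jt
from jittor import nn

x = jt.randn(1, 1, 4, 4, 4)
pool = nn.MaxPool3d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool3d(2, stride=2)

y, idx = pool(x)
rec = unpool(y, idx, output_size=x.shape)   # zeros except at the argmax positions
assert rec.shape == x.shape
```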

View File

@ -668,7 +668,7 @@ def compile_src(src, h, basename):
arr_func_return.append(f"return ({func_call},0)")
func_return_failed = "return -1"
else:
assert "-> void" in func_head
assert "-> void" in func_head, func_head
arr_func_return.append(f"{func_call};{before_return}return")
func_return_failed = "return"
# generate error msg when not a valid call

View File

@ -0,0 +1,31 @@
# wget https://github.com/oneapi-src/oneDNN/archive/refs/tags/v2.2.zip
# extract zip
# cd to root folder
mkdir -p build
cd build
make clean
export CC=aarch64-linux-gnu-gcc-8
export CXX=aarch64-linux-gnu-g++-8
cmake .. \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=AARCH64 \
-DCMAKE_LIBRARY_PATH=/usr/aarch64-linux-gnu/lib \
-DCMAKE_BUILD_TYPE=Release
# -DCMAKE_SHARED_LINKER_FLAGS=' -lm ' \
make -j8
name=dnnl_lnx_2.2.0_cpu_gomp_aarch64
mkdir -p $name
cp -r ../include ./$name/
mkdir -p ./$name/lib
cp ./src/libmkldnn.so ./$name/lib/libmkldnn.so
cp -r ../examples ./$name/
cp ./include/oneapi/dnnl/* ./$name/include/oneapi/dnnl/
tar -acvf $name.tgz ./$name/
rsync -avPu $name.tgz jittor-web:Documents/jittor-blog/assets/
ssh jittor-web Documents/jittor-blog.git/hooks/post-update
echo "https://cg.cs.tsinghua.edu.cn/jittor/assets/$name.tgz"
md5sum $name.tgz

View File

@ -10,7 +10,7 @@
#include <functional>
#include "utils/log.h"
#define JIT_TEST(name) extern "C" void jit_test_ ## name ()
#define JIT_TEST(name) extern void jit_test_ ## name ()
void expect_error(std::function<void()> func);
#define VAR_MEMBER_NAME_AND_OFFSET(name, op) { #name , offsetof(struct op, name) }

View File

@ -34,6 +34,7 @@ namespace jittor {
Executor exe;
extern MemoryProfiler memory_profiler;
DECLARE_FLAG(int, profile_memory_enable);
DEFINE_FLAG(int, gopt_disable, 0, "Disable graph optimizer.");
// from fetch_op.cc
extern list<VarPtr> fetcher_to_free;
@ -145,7 +146,7 @@ void Executor::run_sync(vector<Var*> vars, bool device_sync) {
}
}
}
if (!need_opt) break;
if (!need_opt || gopt_disable) break;
for (Node* n : bfs_q) {
if (n->flags.get(NodeFlags::_has_gopt)) {
n->op()->graph_optimize();
@ -486,9 +487,14 @@ void Executor::run_sync(vector<Var*> vars, bool device_sync) {
if (use_cuda)
checkCudaErrors(cudaDeviceSynchronize());
#endif
for (Var* var : op->outputs())
check_nan(var);
}
#ifdef JT_CHECK_NAN
for (Var* var : op->outputs())
check_nan(var);
#endif
#ifdef JT_SYNC
checkCudaErrors(cudaDeviceSynchronize());
#endif
LOGvvv << "Finished Op(" >> op->name() << rid >>
"/" >> queue.size() >> ") output:" << op->outputs();
if (is_fused_op) {

View File

@ -37,6 +37,14 @@ inline static void assign_attrs(Var* a, Var* b) {
a->flags.set(NodeFlags::_stop_fuse);
}
map<string,int> grad_breaks;
void warn_grad_break(int i, Var* v) {
if (grad_breaks.count(v->name.c_str())) return;
grad_breaks[v->name.c_str()] = 1;
LOGw << "grads[">>i>>"] '">> v->name>>"' doesn't have gradient. It will be set to zero:" << v;
}
vector<VarPtr> grad(Var* loss, vector<Var*> targets) {
LOGvv << "loss:" >> loss << "targets:" >> targets;
CHECK(loss->is_float()) << "Loss should be float";
@ -212,7 +220,7 @@ vector<VarPtr> grad(Var* loss, vector<Var*> targets) {
grad = move(grads[id]);
if (!grad) {
// TODO: better warning message
LOGw << "grads[">>i>>"] '">> var->name>>"' doesn't have gradient. It will be set to zero:" << var;
warn_grad_break(i, var);
grad = make_number(0.f, var);
assign_attrs(grad.ptr, var);
}

View File

@ -13,14 +13,27 @@ namespace jittor {
typedef void (*set_seed_callback)(int);
void init();
/**
Sets the seed of jittor random number generator. Also see @jittor.set_global_seed.
----------------
* [in] seed: a python number.
*/
// @pyjt(set_seed, seed)
void set_seed(int seed);
/**
Returns the seed of jittor random number generator.
*/
// @pyjt(get_seed)
int get_seed();
void add_set_seed_callback(set_seed_callback callback);
extern "C"
extern
std::default_random_engine* get_random_engine();
// things need to be clean before python exit

python/jittor/src/jit_compiler.cc Executable file → Normal file
View File

@ -12,6 +12,7 @@
#else
#include <dlfcn.h>
#endif
#include <mutex>
#include "jit_compiler.h"
#include "op.h"
@ -33,14 +34,19 @@ DEFINE_FLAG(int, rewrite_op, 1, "Rewrite source file of jit operator or not");
namespace jit_compiler {
std::mutex dl_open_mutex;
jit_op_entry_t load_jit_lib(string name, string symbol_name="jit_entry") {
const char* msg = "";
LOGvv << "Opening jit lib:" << name;
#ifdef _WIN32
void* handle = (void*)LoadLibrary(name.c_str());
#else
#elif defined(__linux__)
void* handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_DEEPBIND | RTLD_LOCAL);
msg = dlerror();
#else
void *handle = dlopen(name.c_str(), RTLD_NOW | RTLD_LOCAL);
msg = dlerror();
#endif
CHECK(handle) << "Cannot open library" << name << ":" << msg;
@ -99,10 +105,10 @@ jit_op_entry_t compile(const string& jit_key, const string& src, const bool is_c
+ " \"" + jit_src_path + "\"" + other_src
+ cc_flags + extra_flags
+ " -o \"" + jit_lib_path + "\"";
#ifndef _WIN32
#ifdef __linux__
cmd = python_path+" "+jittor_path+"/utils/asm_tuner.py "
"--cc_path=" + cmd;
#endif
#endif
}
cache_compile(cmd, cache_path, jittor_path);
auto symbol_name = get_symbol_name(jit_key);

View File

@ -8,34 +8,33 @@
#include <sys/mman.h>
#endif
#include <sstream>
#include <unistd.h>
#include "jit_key.h"
#include "utils/str_utils.h"
namespace jittor {
const int page_size = 4*1024;
extern thread_local size_t protected_page;
#ifndef _WIN32
static size_t get_buffer_end_page(size_t buffer_end) {
// get the last complete page in buffer
// 4k align :
// | | | | |
// buffer: xxxxxxxxxxxxxxxxxxxxxxxx
// ^ buffer_end_page
size_t buffer_end_page = buffer_end - buffer_end % page_size;
if (buffer_end_page + page_size-1 > buffer_end)
buffer_end_page -= page_size;
size_t buffer_end_page = buffer_end - buffer_end % getpagesize();
if (buffer_end_page + getpagesize()-1 > buffer_end)
buffer_end_page -= getpagesize();
return buffer_end_page;
}
#endif
JitKey::JitKey() {
(void)get_buffer_end_page;
#ifndef _WIN32
auto buffer_end_page = get_buffer_end_page((size_t)&buffer[buffer_size-1]);
LOGvv << "protect page" << (void*)buffer_end_page;
ASSERT(0==mprotect((void*)buffer_end_page, page_size, PROT_NONE));
// Windows: assigning an extern thread_local var causes a fault, FIX IT
ASSERT(0==mprotect((void*)buffer_end_page, getpagesize(), PROT_NONE));
protected_page = buffer_end_page;
#endif
}
@ -44,7 +43,7 @@ JitKey::~JitKey() {
#ifndef _WIN32
auto buffer_end_page = get_buffer_end_page((size_t)&buffer[buffer_size-1]);
LOGvv << "un-protect page" << (void*)buffer_end_page;
mprotect((void*)buffer_end_page, page_size, PROT_READ|PROT_WRITE|PROT_EXEC);
mprotect((void*)buffer_end_page, getpagesize(), PROT_READ|PROT_WRITE|PROT_EXEC);
protected_page = 0;
#endif
}

View File

@ -165,8 +165,8 @@ inline JK& operator<<(JK& jk, int64 c) {
}
return jk << JK::hex(c);
}
#ifndef _WIN32
// win32 cause redefinition error
#ifdef __linux__
inline JK& operator<<(JK& jk, long long int c) {
return jk << (int64)c;
}

View File

@ -15,6 +15,9 @@ const char* AlignedAllocator::name() const {return "aligned";}
void* AlignedAllocator::alloc(size_t size, size_t& allocation) {
#ifndef _WIN32
#ifdef __APPLE__
size += 32-size%32;
#endif
return aligned_alloc(alignment, size);
#else
return _aligned_malloc(size, alignment);

View File

@ -13,6 +13,7 @@
namespace jittor {
DEFINE_FLAG(int, use_temp_allocator, 1, "Enable temp allocator");
vector<TempAllocator*> TempAllocator::temp_allocators;
TempAllocator::~TempAllocator() {
while (!cached_blocks.empty()) {

View File

@ -24,6 +24,7 @@ struct TempCachingBlock {
struct TempAllocator : Allocator {
static const size_t ALIGN_SIZE = 512;
static const size_t ID_LIMIT = 1 << 18;
static vector<TempAllocator*> temp_allocators;
Allocator* underlying;
size_t cache_blocks_limit, used_memory, unused_memory;
std::map<unsigned long long, TempCachingBlock*> cached_blocks;
@ -33,6 +34,7 @@ struct TempAllocator : Allocator {
inline TempAllocator(size_t cache_blocks_limit=2) : cache_blocks_limit(cache_blocks_limit), used_memory(0), unused_memory(0), tot_block_id(0), occupied_id_mapper(new TempCachingBlock*[ID_LIMIT]) {
temp_allocators.push_back(this);
}
inline TempAllocator(Allocator* underlying, size_t cache_blocks_limit=2) : TempAllocator(cache_blocks_limit) {
setup(underlying);

View File

@ -6,11 +6,17 @@
// ***************************************************************
#include <iomanip>
#include <algorithm>
#ifndef _WIN32
#if defined(__linux__)
#include <sys/sysinfo.h>
#else
#elif defined(__APPLE__)
#include <sys/sysctl.h>
#include <mach/host_info.h>
#include <mach/mach_init.h>
#include <mach/mach_host.h>
#elif defined(_WIN32)
#include <windows.h>
#endif
#include <unistd.h>
#include "var.h"
#include "op.h"
@ -104,16 +110,41 @@ void display_memory_info(const char* fileline, bool dump_var, bool red_color) {
>> "(" >> std::setprecision(p) >> a->unused_memory*100.0 / total >> "%)"
<< "total:" << FloatOutput{(double)total, " KMG", 1024, "B"} >> "\n";
}
if (use_temp_allocator && exe.temp_allocator) {
for (auto& a : TempAllocator::temp_allocators) {
auto total = a->used_memory + a->unused_memory;
all_total += total;
a->is_cuda() ? gpu_total += total : cpu_total += total;
log << "name:" << a->name() << "is_cuda:" << a->is_cuda()
<< "used:" << FloatOutput{(double)a->used_memory, " KMG", 1024, "B"}
>> "(" >> std::setprecision(p) >> a->used_memory*100.0 / total >> "%)"
<< "unused:" << FloatOutput{(double)a->unused_memory, " KMG", 1024, "B"}
>> "(" >> std::setprecision(p) >> a->unused_memory*100.0 / total >> "%)"
<< "total:" << FloatOutput{(double)total, " KMG", 1024, "B"} >> "\n";
}
}
log << "cpu&gpu:" << FloatOutput{(double)all_total, " KMG", 1024, "B"}
<< "gpu:" << FloatOutput{(double)gpu_total, " KMG", 1024, "B"}
<< "cpu:" << FloatOutput{(double)cpu_total, " KMG", 1024, "B"} >> '\n';
if (use_temp_allocator) {
TempAllocator* temp_allocator = (TempAllocator*)exe.temp_allocator;
log << "\nname:" << temp_allocator->name() << "\n";
log << "used_memory:" << FloatOutput{(double)temp_allocator->used_memory, " KMG", 1024, "B"} << "\n";
log << "unused_memory:" << FloatOutput{(double)temp_allocator->unused_memory, " KMG", 1024, "B"} << "\n";
size_t cpu_free = 0;
#if defined(__linux__)
cpu_free = get_avphys_pages() * sysconf(_SC_PAGESIZE);
#elif defined(__APPLE__)
{
mach_msg_type_number_t count = HOST_VM_INFO_COUNT;
vm_statistics_data_t vmstat;
if (KERN_SUCCESS == host_statistics(mach_host_self(), HOST_VM_INFO, (host_info_t)&vmstat, &count)) {
cpu_free = vmstat.free_count * sysconf(_SC_PAGESIZE);
}
}
#endif
size_t gpu_free = 0, _gpu_total = 0;
(void)gpu_free; (void)_gpu_total;
#ifdef HAS_CUDA
cudaMemGetInfo(&gpu_free, &_gpu_total);
#endif
log << "free: cpu(">>FloatOutput{(double)cpu_free, " KMG", 1024, "B"}
>> ") gpu(">>FloatOutput{(double)gpu_free, " KMG", 1024, "B"} >> ")\n";
if (dump_var) {
vector<Node*> queue;
unordered_set<Node*> visited;
@ -155,22 +186,35 @@ void display_memory_info(const char* fileline, bool dump_var, bool red_color) {
log.end();
}
extern vector<void(*)()> sigquit_callback;
void meminfo_callback() {
display_memory_info();
}
MemInfo::MemInfo() {
#ifndef _WIN32
#if defined(__linux__)
struct sysinfo info = {0};
sysinfo(&info);
total_cpu_ram = info.totalram;
#else
#elif defined(__APPLE__)
int mib[] = {CTL_HW, HW_MEMSIZE};
size_t len=sizeof(total_cpu_ram);
sysctl(mib, 2, &total_cpu_ram, &len, NULL, 0);
#elif defined(_WIN32)
MEMORYSTATUSEX statex;
GlobalMemoryStatusEx (&statex);
total_cpu_ram = statex.ullTotalPhys;
#endif
total_cuda_ram = 0;
#ifdef HAS_CUDA
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
total_cuda_ram = prop.totalGlobalMem;
#endif
sigquit_callback.push_back(&meminfo_callback);
}
MemInfo mem_info;
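The display code above queries free physical memory per platform. A compilable sketch of the same queries, using `sysconf(_SC_AVPHYS_PAGES)` in place of glibc's `get_avphys_pages()` (both report available physical pages on Linux):

```cpp
// Cross-platform "free physical memory" query, mirroring the diff above:
// Linux exposes it via sysconf, macOS via Mach host statistics.
#include <unistd.h>
#include <cstdio>
#if defined(__APPLE__)
#include <mach/mach_host.h>
#include <mach/mach_init.h>
#endif

size_t free_physical_bytes() {
    size_t free_bytes = 0;
#if defined(__linux__)
    free_bytes = (size_t)sysconf(_SC_AVPHYS_PAGES) * sysconf(_SC_PAGESIZE);
#elif defined(__APPLE__)
    mach_msg_type_number_t count = HOST_VM_INFO_COUNT;
    vm_statistics_data_t vmstat;
    if (host_statistics(mach_host_self(), HOST_VM_INFO,
                        (host_info_t)&vmstat, &count) == KERN_SUCCESS)
        free_bytes = (size_t)vmstat.free_count * sysconf(_SC_PAGESIZE);
#endif
    return free_bytes;
}

int main() { printf("free: %zu bytes\n", free_physical_bytes()); }
```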

View File

@ -10,11 +10,11 @@
namespace jittor {
__device__ inline static int floatToOrderedInt(float floatVal) {
int intVal = __float_as_int( floatVal );
return (intVal >= 0 ) ? intVal : intVal ^ 0x7FFFFFFF;
}
__device__ inline static float orderedIntToFloat(int intVal) {
return __int_as_float((intVal >= 0) ? intVal : intVal ^ 0x7FFFFFFF);
}
__global__ inline static void fix_float_kernel(float* x, int num) {
@ -24,7 +24,24 @@ __global__ inline static void fix_float_kernel(float* x, int num) {
x[i] = orderedIntToFloat(__float_as_int(x[i]));
}
inline static void fix_float(float* x, int num) {
__device__ inline static long long floatToOrderedInt(double floatVal) {
long long intVal = __double_as_longlong( floatVal );
return (intVal >= 0 ) ? intVal : intVal ^ 0x7FFFFFFFFFFFFFFF;
}
__device__ inline static double orderedIntToFloat(long long intVal) {
return __longlong_as_double((intVal >= 0) ? intVal : intVal ^ 0x7FFFFFFFFFFFFFFF);
}
__global__ inline static void fix_float_kernel(double* x, int num) {
int tid = threadIdx.x + blockIdx.x * blockDim.x;
int tnum = gridDim.x * blockDim.x;
for (int i=tid; i<num; i+=tnum)
x[i] = orderedIntToFloat(__double_as_longlong(x[i]));
}
template<class T>
inline static void fix_float(T* x, int num) {
fix_float_kernel<<<std::min((num-1)/1024+1,256), 1024>>>(x, num);
}
@ -38,6 +55,11 @@ inline float cuda_atomic_max(float* a, float b) {
return orderedIntToFloat(atomicMax((int *)a, floatToOrderedInt(b)));
}
template<> __device__
inline double cuda_atomic_max(double* a, double b) {
return orderedIntToFloat(atomicMax((long long *)a, floatToOrderedInt(b)));
}
template<class T> __device__
T cuda_atomic_min(T* a, T b) {
return atomicMin(a, b);
@ -48,6 +70,11 @@ inline float cuda_atomic_min(float* a, float b) {
return orderedIntToFloat(atomicMin((int *)a, floatToOrderedInt(b)));
}
template<> __device__
inline double cuda_atomic_min(double* a, double b) {
return orderedIntToFloat(atomicMin((long long *)a, floatToOrderedInt(b)));
}
template <class T> struct int_mapper {
typedef T src;
typedef T target;
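CUDA has no native `atomicMax`/`atomicMin` for floats, so the code above maps floats to integers with an order-preserving bijection: non-negative floats already compare correctly when reinterpreted as sign-magnitude ints, and XOR-ing the low 31 bits of negative values reverses them into the right order. An integer atomic max on mapped values is then exactly a float max. A host-side check of the monotonicity (plain C++, `memcpy` standing in for `__float_as_int`):

```cpp
// Host-side check of the floatToOrderedInt mapping: it is monotonic,
// so an integer max on mapped values equals a float max on the originals.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

static int32_t float_to_ordered_int(float f) {
    int32_t i;
    std::memcpy(&i, &f, sizeof(i));        // like __float_as_int
    return i >= 0 ? i : i ^ 0x7FFFFFFF;    // flip magnitude bits of negatives
}

static float ordered_int_to_float(int32_t i) {
    i = i >= 0 ? i : i ^ 0x7FFFFFFF;
    float f;
    std::memcpy(&f, &i, sizeof(f));
    return f;
}

int main() {
    float xs[] = {-2.5f, -0.0f, 0.0f, 1.0f, 3.75f};
    for (float a : xs)
        for (float b : xs) {
            int32_t m = std::max(float_to_ordered_int(a), float_to_ordered_int(b));
            assert(ordered_int_to_float(m) == std::max(a, b));
        }
}
```

The double overloads added in the diff apply the same trick at 64 bits, which is why the mask widens to `0x7FFFFFFFFFFFFFFF`.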

View File

@ -4,7 +4,7 @@
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#include <bits/stdc++.h>
#include <cstring>
#include "misc/nano_string.h"
namespace jittor {
@ -59,6 +59,9 @@ static unordered_set<string> unary_ops = {
"round",
"floor",
"ceil",
"round_int",
"floor_int",
"ceil_int",
"cast",
"sin",
"asin",
@ -82,9 +85,9 @@ static unordered_set<string> unary_float_ops = {
"sqrt",
};
static unordered_set<string> unary_int_ops = {
"round",
"floor",
"ceil",
"round_int",
"floor_int",
"ceil_int",
};
static unordered_set<string> binary_ops = {

View File

@ -62,6 +62,9 @@ constexpr int ns_max_len = 16;
m(round) \
m(floor) \
m(ceil) \
m(round_int) \
m(floor_int) \
m(ceil_int) \
m(cast) \
\
m(sin) \

View File

@ -159,13 +159,8 @@ struct NanoVector {
for (auto a : v) push_back_check_overflow(a);
}
inline static NanoVector make(const int64* v, int n) {
NanoVector nv;
for (int i=0; i<n; i++) nv.push_back_check_overflow(v[i]);
return nv;
}
inline static NanoVector make(const int32* v, int n) {
template<typename TMakeV>
inline static NanoVector make(const TMakeV* v, int n) {
NanoVector nv;
for (int i=0; i<n; i++) nv.push_back_check_overflow(v[i]);
return nv;
@ -238,6 +233,21 @@ struct NanoVector {
v[i] = at(i);
return v;
}
inline void _unpack(int i) {
return;
}
template<class... Args>
void _unpack(int i, int& x, Args&&... args) {
x = this->operator[](i);
_unpack(i+1, std::forward<Args>(args)...);
}
template<class... Args>
void unpack(Args&&... args) {
_unpack(0, std::forward<Args>(args)...);
}
};
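The new `unpack` helper is classic variadic recursion: each call assigns element `i` to the first remaining reference and recurses on the rest. A self-contained sketch (`MiniVec` is a stand-in for `NanoVector`, which stores elements in packed form rather than a `std::vector`):

```cpp
// Variadic unpack: peel one reference per recursive call, assigning
// element i to the i-th argument. Base case fires when no args remain.
#include <cassert>
#include <vector>

struct MiniVec {
    std::vector<int> data;
    void _unpack(int) {}                   // base case: no args left
    template<class... Args>
    void _unpack(int i, int& x, Args&... rest) {
        x = data[i];
        _unpack(i + 1, rest...);
    }
    template<class... Args>
    void unpack(Args&... args) { _unpack(0, args...); }
};

int main() {
    MiniVec shape{{2, 3, 224, 224}};
    int n, c, h, w;
    shape.unpack(n, c, h, w);              // n=2, c=3, h=224, w=224
    assert(n == 2 && w == 224);
}
```

This lets callers write `x->shape.unpack(n, c, h, w)` instead of four separate indexing expressions.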

View File

@ -47,13 +47,11 @@ struct RingBuffer {
}
inline ~Cond() {
#ifndef _WIN32
// a dirty hack
// ref: https://stackoverflow.com/questions/20439404/pthread-conditions-and-process-termination
// cv.__data.__wrefs = 0;
#ifdef __linux__
cv.__data = {0};
#else
// cv.__data = 0;
#endif
pthread_cond_destroy(&cv);
}

View File

@ -5,12 +5,22 @@
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#pragma once
#if defined(__clang__)
#include <string_view>
#elif defined(__GNUC__)
#include <experimental/string_view>
#endif
#include "common.h"
namespace jittor {
#if defined(__clang__)
using std::string_view;
#elif defined(__GNUC__)
using std::experimental::string_view;
#endif
template<class T>
struct string_view_map {

View File

@ -104,6 +104,8 @@ int OpCompiler::total_member_count() {
// array needs an extra local var
if (op->ops[i]->name()==string("array"))
member_count += 1;
if (op->ops[i]->name()==string("safe_clip"))
member_count += 2;
member_count += v.size();
i += 1;
}
@ -826,11 +828,15 @@ string OpCompiler::__get_fused_src(
const unordered_set<string> members = {
"x", "y", "z", "cond", "output", "extras"
};
const unordered_set<string> scalar_members = {
"left", "right"
};
const unordered_set<string> unchanged = {
"for", "const", "auto", "get_random_engine",
"int", "float", "bool", "CHECK", "STRINGIZE",
"void", "__restrict__", "if", "true", "false",
"Op", "Var", "Node", "itof", "assert", "ASSERT"
"Op", "Var", "Node", "itof", "assert", "ASSERT",
"float64"
};
auto not_change = [&](const string& s) -> bool {
if (unchanged.count(s)) return true;
@ -941,7 +947,8 @@ string OpCompiler::__get_fused_src(
while (l<src.size() && isvar(src[l])) l++;
auto var = src.substr(j, l-j);
if (var[0] == ':' || isdigit(var[0]) || not_change(var) || src[j-1]=='.' || src[j-1]=='>') {} else
if (members.count(var)) {
if (members.count(var) || scalar_members.count(var)) {
bool is_member = members.count(var);
string arg_name = "op" + S(oi) + "_" + var;
if (l<src.size() && src[l]=='[') {
// handle extras[...]
@ -964,7 +971,8 @@ string OpCompiler::__get_fused_src(
" = (("+name3+"Op*)(ops[" + S(oi) + "]))->" + var;
fused_kernel_args += ";\n";
kernel_args.insert(arg_name);
op_members[oi].push_back(arg_name);
if (is_member)
op_members[oi].push_back(arg_name);
}
fused_kernel += arg_name;
j = l-1;

View File

@ -476,6 +476,14 @@ VarPtr BinaryOp::grad(Var* out, Var* dout, Var* v, int v_index) {
return make_binary(ndzx, y2, ns_divide);
}
}
if (ns == ns_mod) {
if (v_index == 0)
return dout;
else {
auto a = make_unary(make_binary(x, y, ns_divide), ns_floor);
// chain rule: the y-gradient is -floor(x/y), scaled by dout
return make_unary(make_binary(dout, a, ns_multiply), ns_negative);
}
}
if (ns == ns_maximum || ns == ns_minimum) {
auto zeros = make_number(0, dout);
auto cond = make_binary(y, z, ns_equal);
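For the `ns_mod` branch above: with $z = x \bmod y = x - \lfloor x/y \rfloor\, y$ and the floor treated as locally constant, the gradients are

$$\frac{\partial z}{\partial x} = 1, \qquad \frac{\partial z}{\partial y} = -\left\lfloor \frac{x}{y} \right\rfloor,$$

so the backward pass returns `dout` unchanged for `x`, and `dout` scaled by `-floor(x/y)` for `y`.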

View File

@ -13,12 +13,12 @@ namespace jittor {
#define pow(T,a,b) ::pow(a,b)
#define maximum(T,a,b) ::max(T(a), T(b))
#define minimum(T,a,b) ::min(T(a), T(b))
#define mod(T,a,b) @if(@strcmp(@Tx,float32)==0,::fmodf(T(a),T(b)),@if(@strcmp(@Tx,float64)==0,::fmod(T(a),T(b)),((a)%(b))))
#define mod(T,a,b) @if(@strcmp(@T,float32)==0,(a-::floorf((a)/(b))*(b)),@if(@strcmp(@Tx,float64)==0,(a-::floor((a)/(b))*(b)),(a%b)))
#else // JIT_cpu
#define pow(T,a,b) std::pow(a,b)
#define maximum(T,a,b) std::max(T(a), T(b))
#define minimum(T,a,b) std::min(T(a), T(b))
#define mod(T,a,b) @if(@strcmp(@Tx,float32)==0 || @strcmp(@Tx,float64)==0,std::fmod((T)a,(T)b),((a)%(b)))
#define mod(T,a,b) @if(@strcmp(@T,float32)==0,(a-std::floor((a)/(b))*(b)),@if(@strcmp(@Tx,float64)==0,(a-std::floor((a)/(b))*(b)),(a%b)))
#endif
#define add(T,a,b) ((a)+(b))
#define subtract(T,a,b) ((a)-(b))
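The `mod` macro now computes a floor-mod, `a - floor(a/b)*b`, instead of delegating to `fmod`. The two differ when the operands have mixed signs; the new form matches Python's `%`, whose result takes the sign of the divisor, which is presumably the intent given the gradient change above. A quick demonstration (plain C++, showing only the float64 case; the macro dispatches per dtype via `@strcmp`):

```cpp
// fmod truncates toward zero; floor-mod rounds toward negative infinity,
// matching Python's % operator.
#include <cmath>
#include <cstdio>

static double floor_mod(double a, double b) { return a - std::floor(a / b) * b; }

int main() {
    printf("fmod(-7, 3)      = %g\n", std::fmod(-7.0, 3.0));  // -1
    printf("floor_mod(-7, 3) = %g\n", floor_mod(-7.0, 3.0));  //  2
}
```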

View File

@ -0,0 +1,114 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers: Dun Liang <randonlang@gmail.com>.
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#include "ops/fuse_transpose_op.h"
#include "var.h"
#include "ops/op_register.h"
#include "misc/cuda_flags.h"
namespace jittor {
#ifndef JIT
static auto make_transpose = get_op_info("fuse_transpose")
.get_constructor<VarPtr, Var*, NanoVector>();
static inline NanoVector get_reverse(NanoVector axes) {
NanoVector reverse;
reverse.reserve(axes.size(), axes.size());
for (uint i=0; i<axes.size(); i++)
reverse.set_data(axes[i], i);
return reverse;
}
FuseTransposeOp::FuseTransposeOp(Var* x, NanoVector axes_) : x(x), axes(axes_) {
OpType tp = OpType::broadcast;
if (!x->is_finished()) {
auto type = x->input()->type();
if (type==OpType::broadcast || type==OpType::element)
tp = OpType::reduce;
}
flags.set(NodeFlags::_cpu);
flags.set(NodeFlags::_cuda);
set_type(tp);
int i=0;
for (; i<axes.size(); i++)
if (i!=axes[i]) break;
if (i==axes.size() && axes.size()) {
forward(x);
return;
}
auto xdim = x->shape.size();
if (!axes.size()) {
for (int i=0; i<(int)xdim; i++)
axes.push_back(xdim-1-i);
}
y = create_output(nullptr, x->dtype());
}
void FuseTransposeOp::infer_shape() {
auto xdim = x->shape.size();
CHECK(xdim);
if (!axes.size()) {
for (int i=0; i<(int)xdim; i++)
axes.push_back(xdim-1-i);
} else {
CHECKop(axes.size(),==,xdim);
int64_t mask=0;
for (auto i : axes) mask |= 1<<i;
CHECK(mask==((1ll<<xdim)-1)) << "Invalid axes" << axes;
}
NanoVector shape;
for (uint i=0; i<xdim; i++)
shape.push_back(x->shape[axes[i]]);
y->set_shape(shape);
}
VarPtr FuseTransposeOp::grad(Var* out, Var* dout, Var* v, int v_index) {
return make_transpose(dout, get_reverse(axes));
}
void FuseTransposeOp::jit_prepare(JK& jk) {
auto bc = type()==OpType::broadcast;
auto ax = bc ? axes : get_reverse(axes);
jk << _CS("[Tx:") << x->dtype();
jk << _CS("][DIM=") << JK::hex1(axes.size());
jk << _CS("][BC:") << JK::hex1(bc);
for (uint i=0; i<ax.size(); i++)
jk << _CS("][AXES") << JK::hex1(ax[i]) << '=' << JK::hex1(i);
jk << ']';
}
#else // JIT
void FuseTransposeOp::jit_run() {
auto* __restrict__ xp = x->ptr<Tx>();
auto* __restrict__ yp = y->ptr<Tx>();
@for(i, 0, DIM, index_t yshape@i = y->shape[@i];)
@for(i, 0, DIM, index_t xshape@i = yshape@{AXES@i};)
index_t xstride@{DIM-1} = 1;
@for(i, DIM-2, -1, -1, auto xstride@i = xstride@{i+1} * xshape@{i+1};)
index_t ystride@{DIM-1} = 1;
@for(i, DIM-2, -1, -1, auto ystride@i = ystride@{i+1} * yshape@{i+1};)
@if(BC,
@for(d, 0, DIM, for (index_t i@d=0; i@d < yshape@d; i@d++)) {
auto yid = @for(d, 0, DIM, + i@d * ystride@d);
@for(d, 0, DIM, auto xid@d = i@{AXES@d};)
auto xid = @for(d, 0, DIM, + xid@d * xstride@d);
yp[yid] = xp[xid];
},
@for(d, 0, DIM, for (index_t i@d=0; i@d < xshape@d; i@d++)) {
auto xid = @for(d, 0, DIM, + i@d * xstride@d);
@for(d, 0, DIM, auto yid@d = i@{AXES@d};)
auto yid = @for(d, 0, DIM, + yid@d * ystride@d);
yp[yid] = xp[xid];
}
)
// unused var
(void)xshape0;
}
#endif // JIT
} // jittor
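`get_reverse` builds the inverse permutation, which is exactly what the gradient of a transpose needs: transposing by `axes` and then by the inverse restores the original axis order. A sketch with a worked example:

```cpp
// Inverse of a permutation p satisfies inv[p[i]] = i, so transpose(p)
// followed by transpose(inv(p)) is the identity; grad() above relies on this.
#include <cassert>
#include <vector>

static std::vector<int> inverse_permutation(const std::vector<int>& p) {
    std::vector<int> inv(p.size());
    for (size_t i = 0; i < p.size(); i++)
        inv[p[i]] = (int)i;
    return inv;
}

int main() {
    std::vector<int> axes = {2, 0, 1};     // e.g. HWC -> CHW
    auto inv = inverse_permutation(axes);  // {1, 2, 0}
    assert((inv == std::vector<int>{1, 2, 0}));
    for (size_t i = 0; i < axes.size(); i++)
        assert(inv[axes[i]] == (int)i);
}
```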

View File

@ -0,0 +1,23 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers: Dun Liang <randonlang@gmail.com>.
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#pragma once
#include "op.h"
namespace jittor {
struct FuseTransposeOp : Op {
Var* x, * y;
NanoVector axes;
FuseTransposeOp(Var* x, NanoVector axes=NanoVector());
const char* name() const override { return "fuse_transpose"; }
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
void infer_shape() override;
DECLARE_jit_run;
};
} // jittor

View File

@ -93,6 +93,9 @@ void GetitemOp::infer_slices(
} else
if (s.is_ellipsis()) {
auto remain_slice = vs.n-vid-1;
for (int i=vid+1; i<vs.n; i++)
if (vs.slices[i].is_none())
remain_slice--;
auto remain_idims = nin-i;
auto ellipsis_size = remain_idims - remain_slice;
ASSERT(ellipsis_size>=0) << "NDims not match";
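The ellipsis fix accounts for `None` slices, which insert new output axes without consuming input dims, so they must not count against the dims the ellipsis covers. A simplified model of the computation (assuming every non-`None` slice after the ellipsis consumes exactly one input dim; the real code generalizes this):

```cpp
// For x of ndim 4 indexed as x[..., None, 0]: two slices follow the
// ellipsis but only the 0 consumes an input dim, so the ellipsis spans
// 4 - 1 = 3 dims rather than 4 - 2 = 2.
#include <cassert>
#include <vector>

enum class Kind { Slice, None, Int, Ellipsis };

static int ellipsis_size(const std::vector<Kind>& s, int vid, int ndim) {
    int remain_slice = (int)s.size() - vid - 1;
    for (size_t i = vid + 1; i < s.size(); i++)
        if (s[i] == Kind::None) remain_slice--;  // None adds an axis, consumes none
    return ndim - remain_slice;
}

int main() {
    std::vector<Kind> s = {Kind::Ellipsis, Kind::None, Kind::Int};
    assert(ellipsis_size(s, 0, 4) == 3);
}
```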

View File

@ -0,0 +1,47 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers: Dun Liang <randonlang@gmail.com>.
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#include <cmath>
#include "var.h"
#include "ops/safe_clip_op.h"
#include "ops/op_register.h"
namespace jittor {
#ifndef JIT
SafeClipOp::SafeClipOp(Var* x, float64 left, float64 right) : x(x), left(left), right(right) {
flags.set(NodeFlags::_cpu);
flags.set(NodeFlags::_cuda);
set_type(OpType::element);
y = create_output(nullptr, x->dtype());
}
VarPtr SafeClipOp::grad(Var* out, Var* dout, Var* v, int v_index) {
return dout;
}
void SafeClipOp::infer_shape() {
y->set_shape(x->shape);
}
void SafeClipOp::jit_prepare(JK& jk) {
jk << _CS("[Tx:") << x->dtype() <<']';
}
#else // JIT
void SafeClipOp::jit_run() {
auto* __restrict__ xp = x->ptr<Tx>();
Tx left_value = (Tx)std::max((float64)std::numeric_limits<Tx>::lowest(), left);
Tx right_value = (Tx)std::min((float64)std::numeric_limits<Tx>::max(), right);
auto* __restrict__ yp = y->ptr<Tx>();
index_t num = y->num;
for (index_t i=0; i<num; i++)
yp[i] = xp[i] < left_value ? left_value : (xp[i] > right_value ? right_value : xp[i]);
}
#endif // JIT
} // jittor
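`safe_clip` clamps values in the forward pass while its `grad` returns `dout` unmodified, a straight-through estimator that keeps gradients alive in saturated regions where an ordinary clamp would zero them out (useful for, e.g., clipping inputs to `log`). In miniature:

```cpp
// safe_clip semantics: clamp forward, identity backward.
#include <algorithm>
#include <cstdio>

struct SafeClip {
    double left, right;
    double forward(double x) const { return std::min(std::max(x, left), right); }
    double backward(double dout) const { return dout; }  // straight-through
};

int main() {
    SafeClip c{0.0, 1.0};
    printf("forward(1.5) = %g\n", c.forward(1.5));   // 1
    printf("backward(1)  = %g\n", c.backward(1.0));  // 1, even though clipped
}
```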

View File

@ -0,0 +1,33 @@
// ***************************************************************
// Copyright (c) 2021 Jittor. All Rights Reserved.
// Maintainers: Dun Liang <randonlang@gmail.com>.
// This file is subject to the terms and conditions defined in
// file 'LICENSE.txt', which is part of this source code package.
// ***************************************************************
#pragma once
#include "op.h"
namespace jittor {
struct SafeClipOp : Op {
Var* x, * y;
float64 left, right;
/** Safe clip value to a range, and keep
the gradient pass through unchanged.
* [in] x: input value
* [in] left: float64 clip min value.
* [in] right: float64 clip max value.
*/
// @pybind(safe_clip)
SafeClipOp(Var* x, float64 left, float64 right);
const char* name() const override { return "safe_clip"; }
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
void infer_shape() override;
DECLARE_jit_run;
};
} // jittor

View File

@ -316,8 +316,11 @@ void SetitemOp::jit_run() {
checkCudaErrors(cudaMemcpyAsync(op, ip, out->size, cudaMemcpyDefault, 0));
#endif
if (data->allocation == in->allocation &&
data->allocator == in->allocator)
if (flags.get((NodeFlags::Flags(SetitemOp::_data_inplaced))) &&
// array op may move the data allocation, double check
// affects test_contrib.py
in->allocator == data->allocator &&
in->allocation == data->allocation)
return;
@for(d, 0, ODIM, for (index_t i@d=0; i@d < oshape@d; i@d++)) {

View File

@ -11,6 +11,7 @@
namespace jittor {
struct SetitemOp : Op {
static constexpr int _data_inplaced = NodeFlags::_has_vary_input + 1;
VarSlices vs;
// map i to related var slice
NanoVector i_to_vs;

View File

@ -168,7 +168,7 @@ static unordered_set<string> unary_ops = {
>>> a
jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32)
>>> jt.round(a)
jt.Var([ 2 0 0 -1], dtype=int32)
jt.Var([ 2.0 0.0 0.0 -1.0], dtype=float32)
*/
"round",
@ -185,7 +185,7 @@ static unordered_set<string> unary_ops = {
>>> a
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
>>> jt.floor(a)
jt.Var([-2 -1 -1 -1], dtype=int32)
jt.Var([-2.0 -1.0 -1.0 -1.0], dtype=float32)
*/
"floor",
@ -203,10 +203,63 @@ static unordered_set<string> unary_ops = {
>>> a
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
>>> jt.ceil(a)
jt.Var([-1 0 0 0], dtype=int32)
jt.Var([-1.0 0.0 0.0 0.0], dtype=float32)
*/
"ceil",
/**
Returns the closest integer of the input ``x``.
----------------
* [in] x: the input jt.Var.
----------------
Example-1::
>>> a = jt.randn(4)
>>> a
jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32)
>>> jt.round_int(a)
jt.Var([ 2 0 0 -1], dtype=int32)
*/
"round_int",
/**
Returns the largest integer less than or equal to the input ``x``.
----------------
* [in] x: the input jt.Var.
----------------
Example-1::
>>> a = jt.randn(4)
>>> a
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
>>> jt.floor_int(a)
jt.Var([-2 -1 -1 -1], dtype=int32)
*/
"floor_int",
/**
Returns the smallest integer greater than or equal to the input ``x``.
----------------
* [in] x: the input jt.Var.
----------------
Example-1::
>>> a = jt.randn(4)
>>> a
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
>>> jt.ceil_int(a)
jt.Var([-1 0 0 0], dtype=int32)
*/
"ceil_int",
/**
Returns the sine of the input ``x``.

View File

@ -21,6 +21,9 @@ namespace jittor {
#define round(T,x) ((T) ::roundf((x)))
#define floor(T,x) ((T) ::floorf((x)))
#define ceil(T,x) ((T) ::ceilf((x)))
#define round_int(T,x) ((T) ::roundf((x)))
#define floor_int(T,x) ((T) ::floorf((x)))
#define ceil_int(T,x) ((T) ::ceilf((x)))
#define sin(T,x) ((T) ::sinf((x)))
#define asin(T,x) ((T) ::asinf((x)))
@ -49,6 +52,9 @@ namespace jittor {
#define round(T,x) ((T)std::round((x)))
#define floor(T,x) ((T)std::floor((x)))
#define ceil(T,x) ((T)std::ceil((x)))
#define round_int(T,x) ((T)std::round((x)))
#define floor_int(T,x) ((T)std::floor((x)))
#define ceil_int(T,x) ((T)std::ceil((x)))
#define sin(T,x) ((T) std::sin((x)))
#define asin(T,x) ((T) std::asin((x)))

View File

@ -47,6 +47,8 @@ static void setitem_inplace(SetitemOp* op) {
return;
}
auto output = op->outputs().front();
// return if output is already shared
if (output->allocator) return;
output->share_with(input);
auto data = op->input(1);
@ -78,13 +80,13 @@ static void setitem_inplace(SetitemOp* op) {
VarSlice s = vs.slices[i];
if (!(s.is_slice())) return;
Slice ss = s.slice;
if (!(ss.start == 0 && ss.stop >= in_shape[i] && ss.step == 1))
if (!(ss.start == 0 && (ss.mask&2) && ss.step == 1))
return;
inplace_size *= in_shape[i];
}
VarSlice s = vs.slices[0];
if (s.is_var()) return;
if (s.is_var() || s.is_str()) return;
auto size = 0;
if (s.is_int())
@ -105,6 +107,7 @@ static void setitem_inplace(SetitemOp* op) {
}
add_dependency(data->input(), {input->node()});
data->share_with(input, size);
op->flags.set((NodeFlags::Flags(SetitemOp::_data_inplaced)));
}
struct BBox {
@ -174,7 +177,10 @@ static void getitem_inplace(GetitemOp* op) {
auto in = op->inputs().front();
auto ou = op->outputs().front();
// return if output is already inplaced
if (ou->allocator)
return;
// return if input or output's shape is variable
if (in->num <= 0 || ou->num <= 0)
return;
@ -186,12 +192,12 @@ static void getitem_inplace(GetitemOp* op) {
VarSlice s = vs.slices[i];
if (!(s.is_slice())) return;
Slice ss = s.slice;
if (!(ss.start == 0 && ss.stop >= in_shape[i] && ss.step == 1))
return;
if (!(ss.start == 0 && (ss.mask&2) && ss.step == 1))
return;
}
VarSlice s = vs.slices[0];
if (s.is_var()) return;
if (s.is_var() || s.is_str()) return;
auto size = 0;
if (s.is_int())
@ -213,7 +219,7 @@ void SetitemOp::graph_optimize() {
void GetitemOp::graph_optimize() {
// This optimize is still WIP
// LOGir << "hello getitem graph_optimize";
setitem_grad_opt(this);
// setitem_grad_opt(this);
(void)setitem_grad_opt;
// (void)getitem_inplace;
getitem_inplace(this);

View File

@ -22,7 +22,7 @@ void FloatAtomicFixPass::run() {
if (!choice) return;
unordered_map<string,int> fixed;
auto fix_float_atomic = [&](string name) {
auto fix_float_atomic = [&](string name, Var* v) {
if (fixed.count(name)) return;
fixed[name] = 1;
string namep = name+"p";
@ -38,9 +38,15 @@ void FloatAtomicFixPass::run() {
return;
// fix code a[b] = c -->
// a[b] = __int_as_float(floatToOrderedInt(c))
string new_code = namep+'['+results.at(0)->to_string(true)+
"] = __int_as_float(floatToOrderedInt(" +
results.at(1)->to_string(true) + "));";
string new_code;
if (v->dtype() == ns_float32)
new_code = namep+'['+results.at(0)->to_string(true)+
"] = __int_as_float(floatToOrderedInt(" +
results.at(1)->to_string(true) + "));";
else
new_code = namep+'['+results.at(0)->to_string(true)+
"] = __longlong_as_double(floatToOrderedInt(" +
results.at(1)->to_string(true) + "));";
LOGvvvv << "prev code" << code >> "\nreplace:" << new_code;
code = new_code;
});
@ -74,7 +80,7 @@ void FloatAtomicFixPass::run() {
}
if (!var->dtype().is_float()) return;
LOGvvvv << "find var" << var << "op" << op;
fix_float_atomic(s);
fix_float_atomic(s, var);
});
}

View File

@ -67,6 +67,10 @@ void LoopToFuncPass::run() {
args.push_back(d.get());
continue;
}
if (endswith(d->attrs["lvalue"], "_value")) {
args.push_back(d.get());
continue;
}
}
}
func->push_back(d->clone());
@ -99,7 +103,9 @@ void LoopToFuncPass::run() {
auto& fc = ir->children[i];
fc->attrs["loop_func"] = func->attrs["lvalue"];
}
// ir->remove_all_unused();
#ifdef __APPLE__
ir->remove_all_unused();
#endif
}
} // jittor

View File

@ -13,7 +13,7 @@
namespace jittor {
DECLARE_FLAG(int, para_opt_level);
DEFINE_FLAG(int, para_opt_level, 3, "para_opt_level");
void LoopVarAnalyzePass::run() {
// loop_vars: opi_xx->shape[j]
@ -130,6 +130,7 @@ void LoopVarAnalyzePass::run() {
}
loop_vars.reserve(loop_var->shape.size());
string vname = pm->oc->get_name_by_op_var(op, loop_var);
ASSERT(vname!="__fill__");
for (uint j=0; j<loop_var->shape.size(); j++)
loop_vars.emplace_back(vname+"->shape["+S(j)+"]");
break;

View File

@ -65,9 +65,12 @@ unique_ptr<MemoryChecker>* load_memory_checker(string name) {
LOGvv << "Opening jit lib:" << name;
#ifdef _WIN32
void* handle = (void*)LoadLibrary(name.c_str());
#else
#elif defined(__linux__)
void* handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_DEEPBIND | RTLD_LOCAL);
msg = dlerror();
#else
void* handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_LOCAL);
msg = dlerror();
#endif
CHECK(handle) << "Cannot open library" << name << ":" << msg;

View File

@ -20,6 +20,7 @@
namespace jittor {
DEFINE_FLAG(int, trace_py_var, 0, "Trace py stack max depth for debug.");
DEFINE_FLAG(int, trace_var_data, 0, "Trace var data for debug.");
Op* trace_grad_op = nullptr;
TraceData trace_data;
@ -185,6 +186,44 @@ static vector<Stack> get_stack_info() {
return stacks;
}
template<class T>
string get_str(T* t, int64 num) {
string s = "";
for (int64 i=0; i<num; i++) {
s += S(t[i]);
if (i != num-1)
s += ',';
}
return s;
}
static inline string get_var_data_str(Var* v) {
if (v->dtype() == ns_int8)
return get_str(v->ptr<int8>(), v->num);
if (v->dtype() == ns_int16)
return get_str(v->ptr<int16>(), v->num);
if (v->dtype() == ns_int32)
return get_str(v->ptr<int32>(), v->num);
if (v->dtype() == ns_int64)
return get_str(v->ptr<int64>(), v->num);
if (v->dtype() == ns_uint8)
return get_str(v->ptr<uint8>(), v->num);
if (v->dtype() == ns_uint16)
return get_str(v->ptr<uint16>(), v->num);
if (v->dtype() == ns_uint32)
return get_str(v->ptr<uint32>(), v->num);
if (v->dtype() == ns_uint64)
return get_str(v->ptr<uint64>(), v->num);
if (v->dtype() == ns_float32)
return get_str(v->ptr<float32>(), v->num);
if (v->dtype() == ns_float64)
return get_str(v->ptr<float64>(), v->num);
return "";
}
void TraceData::record_node(Node* node, bool record_stack) {
if (thread_name.size()) return;
NodeData data;
@ -255,6 +294,8 @@ void TraceData::record_exe_node(Node* node) {
data.attrs["dsize"] = S(v->dtype().dsize());
data.attrs["name"] = v->name.c_str();
data.attrs["is_var"] = "1";
if (trace_var_data && v->mem_ptr)
data.attrs["data"] = get_var_data_str(v);
} else {
auto op = node->op();
data.attrs["name"] = op->name_ex();

View File

@ -86,15 +86,19 @@ ArrayOp::ArrayOp(PyObject* obj) {
// use 32-bit by default
if ((auto_convert_64_to_32 || holder.obj)
&& args.dtype.dsize() == 8 && args.ptr) {
auto num = PyArray_Size(arr)/8;
auto size = PyArray_Size(arr);
args.buffer.reset(new char[size]);
auto pre_data = args.ptr;
args.ptr = args.buffer.get();
auto num = size/8;
if (args.dtype.is_int()) {
auto* __restrict__ i64 = (int64*)args.ptr;
auto* __restrict__ i64 = (int64*)pre_data;
auto* __restrict__ i32 = (int32*)args.ptr;
for (int i=0; i<num; i++)
i32[i] = (int32)i64[i];
args.dtype = ns_int32;
} else if (args.dtype.is_float()) {
auto* __restrict__ f64 = (float64*)args.ptr;
auto* __restrict__ f64 = (float64*)pre_data;
auto* __restrict__ f32 = (float32*)args.ptr;
for (int i=0; i<num; i++)
f32[i] = (float32)f64[i];
@ -136,7 +140,11 @@ ArrayOp::ArrayOp(PyObject* obj) {
std::memcpy(host_ptr, args.ptr, size);
} else {
// this is non-continue numpy array
#if defined(__linux__) || defined(_WIN32)
int64 dims[args.shape.size()];
#elif defined(__APPLE__)
long dims[args.shape.size()];
#endif
for (int i=0; i<args.shape.size(); i++)
dims[i] = args.shape[i];
holder.assign(PyArray_New(
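The conversion fix above matters because the old code narrowed 64-bit values in place, writing 32-bit results over the caller's buffer (for a borrowed numpy array, that corrupts the caller's data); the new code copies into a freshly allocated buffer first. The essence:

```cpp
// Narrow int64 -> int32 into a fresh buffer, leaving the source intact.
#include <cassert>
#include <cstdint>
#include <memory>

int main() {
    int64_t src[4] = {1, 2, 3, 4};          // caller-owned, must not change
    size_t bytes = sizeof(src);

    std::unique_ptr<char[]> buffer(new char[bytes]);
    auto* i64 = src;                         // read from the original
    auto* i32 = (int32_t*)buffer.get();      // write to the new buffer
    for (int i = 0; i < 4; i++)
        i32[i] = (int32_t)i64[i];

    assert(src[2] == 3);                     // source stays intact
    assert(i32[2] == 3);
}
```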

View File

@ -266,7 +266,11 @@ DEF_IS(ArrayArgs, bool) is_type(PyObject* obj) {
}
DEF_IS(ArrayArgs, PyObject*) to_py_object(const T& a) {
#if defined(__linux__) || defined(_WIN32)
int64 dims[a.shape.size()];
#elif defined(__APPLE__)
long dims[a.shape.size()];
#endif
for (int i=0; i<a.shape.size(); i++)
dims[i] = a.shape[i];
PyObjHolder obj(PyArray_SimpleNew(
@ -378,7 +382,11 @@ DEF_IS(VarHolder*, T) from_py_object(PyObject* obj, unique_ptr<VarHolder>& holde
struct DataView;
DEF_IS(DataView, PyObject*) to_py_object(T a) {
#if defined(__linux__) || defined(_WIN32)
int64 dims[a.shape.size()];
#elif defined(__APPLE__)
long dims[a.shape.size()];
#endif
for (int i=0; i<a.shape.size(); i++)
dims[i] = a.shape[i];
PyObjHolder oh(PyArray_New(

View File

@ -109,7 +109,11 @@ static void push_py_object(RingBuffer* rb, PyObject* obj, uint64& __restrict__ o
rb->push_t<NanoString>(args.dtype, offset);
rb->push(size, offset);
args.ptr = rb->get_ptr(size, offset);
#if defined(__linux__) || defined(_WIN32)
int64 dims[args.shape.size()];
#elif defined(__APPLE__)
long dims[args.shape.size()];
#endif
for (int i=0; i<args.shape.size(); i++)
dims[i] = args.shape[i];
PyObjHolder oh(PyArray_New(

View File

@ -415,8 +415,19 @@ inline Console() {
#endif
run("import jittor as jt");
make_pyjt_array = (PyObject* (*)(const vector<int64>& shape, const string& dtype, const void* data))dlsym(RTLD_DEFAULT, "_ZN6jittor15make_pyjt_arrayERKSt6vectorIlSaIlEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKv");
get_pyjt_array = (void (*)(PyObject* obj, vector<int64>& shape, string& dtype, void*& data))dlsym(RTLD_DEFAULT, "_ZN6jittor14get_pyjt_arrayEP7_objectRSt6vectorIlSaIlEERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERPv");
#ifdef __APPLE__
auto symbol_make_pyjt_array = "__ZN6jittor15make_pyjt_arrayERKNSt3__16vectorIxNS0_9allocatorIxEEEERKNS0_12basic_stringIcNS0_11char_traitsIcEENS2_IcEEEEPKv";
auto symbol_gen_pyjt_array = "__ZN6jittor14get_pyjt_arrayEP7_objectRNSt3__16vectorIxNS2_9allocatorIxEEEERNS2_12basic_stringIcNS2_11char_traitsIcEENS4_IcEEEERPv";
#else
auto symbol_make_pyjt_array = "_ZN6jittor15make_pyjt_arrayERKSt6vectorIlSaIlEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKv";
auto symbol_gen_pyjt_array = "_ZN6jittor14get_pyjt_arrayEP7_objectRSt6vectorIlSaIlEERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERPv";
#endif
make_pyjt_array = (PyObject* (*)(const vector<int64>& shape, const string& dtype, const void* data))dlsym(RTLD_DEFAULT, symbol_make_pyjt_array);
get_pyjt_array = (void (*)(PyObject* obj, vector<int64>& shape, string& dtype, void*& data))dlsym(RTLD_DEFAULT, symbol_gen_pyjt_array);
if (!make_pyjt_array || !get_pyjt_array) {
std::cerr << "get symbol failed." << std::endl;
exit(1);
}
}
inline ~Console() {

View File

@ -141,7 +141,7 @@ size_t skip_comments(const string& src, size_t i) {
return i;
}
void process(string src, vector<string>& input_names) {
void process(string src, vector<string>& input_names, string& cmd) {
for (size_t i=0; i<src.size(); i++) {
i = skip_comments(src, i);
if (i>=src.size()) break;
@ -163,6 +163,20 @@ void process(string src, vector<string>& input_names) {
input_names.push_back(inc);
}
}
if (l-k>2 && src[k] == 'J' && src[k+1] == 'T' && j-i==6 && src.substr(i,j-i) == "#ifdef") {
auto inc = src.substr(k, l-k);
auto env = getenv(inc.c_str());
if (env && string(env)!="0") {
string dflag = " -D"+inc+"="+string(env)+" -o ";
if (cmd.find(dflag) == string::npos) {
// -D flags should be inserted before the -o flag
auto cmds = split(cmd, " -o ", 2);
if (cmds.size() == 2) {
cmd = cmds[0] + dflag + cmds[1];
}
}
}
}
i=l;
}
}
@ -191,7 +205,7 @@ static inline bool is_full_path(const string& name) {
#endif
}
bool cache_compile(const string& cmd, const string& cache_path, const string& jittor_path) {
bool cache_compile(string cmd, const string& cache_path, const string& jittor_path) {
vector<string> input_names;
map<string,vector<string>> extra;
string output_name;
@ -199,29 +213,21 @@ bool cache_compile(const string& cmd, const string& cache_path, const string& ji
string output_cache_key;
bool ran = false;
output_cache_key = read_all(output_name+".key");
// string cd_cmd = cache_path.size() ? "cd " + cache_path + " && " + cmd : cmd;
string cd_cmd = cmd;
if (output_cache_key.size() == 0) {
LOGvv << "Cache key of" << output_name << "not found.";
LOGvvv << "Run cmd:" << cmd;
check_win_file(output_name);
system_with_check(cd_cmd.c_str());
ran = true;
}
string cache_key = cmd;
cache_key += "\n";
string cache_key;
unordered_set<string> processed;
auto src_path = join(jittor_path, "src");
const auto& extra_include = extra["I"];
for (size_t i=0; i<input_names.size(); i++) {
if (processed.count(input_names[i]) != 0)
continue;
if (input_names[i] == "dynamic_lookup")
continue;
processed.insert(input_names[i]);
auto src = read_all(input_names[i]);
ASSERT(src.size()) << "Source read failed:" << input_names[i];
auto hash = S(hash64(src));
vector<string> new_names;
process(src, new_names);
process(src, new_names, cmd);
for (auto& name : new_names) {
string full_name;
if (name.substr(0, 4) == "jit/" || name.substr(0, 4) == "gen/")
@ -251,11 +257,18 @@ bool cache_compile(const string& cmd, const string& cache_path, const string& ji
cache_key += hash;
cache_key += "\n";
}
cache_key = cmd + "\n" + cache_key;
if (output_cache_key.size() == 0) {
LOGvv << "Cache key of" << output_name << "not found.";
LOGvvv << "Run cmd:" << cmd;
system_with_check(cmd.c_str());
ran = true;
}
if (output_cache_key.size() != 0 && output_cache_key != cache_key) {
LOGvv << "Cache key of" << output_name << "changed.";
LOGvvv << "Run cmd:" << cmd;
check_win_file(output_name);
system_with_check(cd_cmd.c_str());
system_with_check(cmd.c_str());
ran = true;
}
if (output_cache_key != cache_key) {
@ -326,7 +339,8 @@ void test_find_nams_error(string cmd) {
void test_process(string src, vector<string> files) {
vector<string> ifiles;
jittor::jit_compiler::process(src, ifiles);
string cmd;
jittor::jit_compiler::process(src, ifiles, cmd);
CHECK(files.size() == ifiles.size());
for (size_t i=0; i<files.size(); i++)
CHECKop(files[i],==,ifiles[i]);
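The new `#ifdef JT` scan in `process()` turns environment variables that gate source code (any `#ifdef JT_xxx`) into `-D` compiler flags, spliced into the command just before `-o`; since the cache key includes the command, toggling such a variable also invalidates the cache. A sketch of the splice, using `find`/`substr` where the real code splits on `" -o "` (the helper name `inject_define` is mine):

```cpp
// Insert "-DNAME=VAL" before the " -o " flag of a compile command,
// skipping the insert if the flag is already present.
#include <cassert>
#include <string>

static std::string inject_define(std::string cmd, const std::string& name,
                                 const std::string& val) {
    std::string dflag = " -D" + name + "=" + val + " -o ";
    if (cmd.find(dflag) != std::string::npos) return cmd;  // already present
    auto pos = cmd.find(" -o ");
    if (pos != std::string::npos)
        cmd = cmd.substr(0, pos) + " -D" + name + "=" + val + cmd.substr(pos);
    return cmd;
}

int main() {
    auto cmd = inject_define("g++ a.cc -o a.so", "JT_SYNC", "1");
    assert(cmd == "g++ a.cc -DJT_SYNC=1 -o a.so");
}
```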

View File

@ -14,7 +14,7 @@ string read_all(const string& fname);
void write(const string& fname, const string& src);
bool file_exist(const string& fname);
string join(string a, string b);
bool cache_compile(const string& cmd, const string& cache_path="", const string& jittor_path="");
bool cache_compile(string cmd, const string& cache_path="", const string& jittor_path="");
} // jit_compiler
} // jittor

View File

@ -8,11 +8,11 @@
#include "pyjt/py_converter.h"
#include "pyjt/py_arg_printer.h"
#ifdef __clang__
#pragma clang diagnostic ignored "-Wdefaulted-function-deleted"
// #pragma clang diagnostic ignored "-Wdefaulted-function-deleted"
#endif
#ifdef __GNUC__
#endif
#ifndef _WIN32
#ifdef __linux__
#include <sys/prctl.h>
#endif
#include <signal.h>
@ -23,7 +23,7 @@
namespace jittor {
void init_subprocess() {
#ifndef _WIN32
#ifdef __linux__
prctl(PR_SET_PDEATHSIG, SIGKILL);
#endif
}

View File

@ -206,6 +206,8 @@ int segfault_happen = 0;
string thread_local thread_name;
static int _pid = getpid();
vector<void(*)()> cleanup_callback;
vector<void(*)()> sigquit_callback;
int64 last_q_time;
#ifdef _WIN32
void handle_signal(int signal) {
@ -214,13 +216,49 @@ void handle_signal(int signal) {
abort();
}
#else
static inline void do_exit() {
#ifdef __APPLE__
_Exit(1);
#else
std::quick_exit(1);
#endif
}
void segfault_sigaction(int signal, siginfo_t *si, void *arg) {
if (signal == SIGQUIT) {
if (_pid == getpid()) {
std::cerr << "Caught SIGQUIT" << std::endl;
int64 now = clock();
if (now > last_q_time && last_q_time+CLOCKS_PER_SEC/10 > now) {
last_q_time = now;
std::cerr << "GDB attach..." << std::endl;
breakpoint();
} else {
last_q_time = now;
for (auto f : sigquit_callback)
f();
}
}
return;
}
if (signal == SIGCHLD) {
if (si->si_code != CLD_EXITED && si->si_status != SIGTERM) {
LOGe << "Caught SIGCHLD"
<< "si_errno:" << si->si_errno
<< "si_code:" << si->si_code
<< "si_status:" << si->si_status
<< ", quick exit";
exited = true;
do_exit();
}
return;
}
if (signal == SIGINT) {
if (_pid == getpid()) {
LOGe << "Caught SIGINT, quick exit";
}
exited = true;
std::quick_exit(1);
do_exit();
}
std::cerr << "Caught segfault at address " << si->si_addr << ", "
<< "thread_name: '" << thread_name << "', flush log..." << std::endl;
@ -266,6 +304,7 @@ int register_sigaction() {
sigaction(SIGSTOP, &sa, NULL);
sigaction(SIGFPE, &sa, NULL);
sigaction(SIGINT, &sa, NULL);
sigaction(SIGCHLD, &sa, NULL);
sigaction(SIGILL, &sa, NULL);
sigaction(SIGBUS, &sa, NULL);
sigaction(SIGQUIT, &sa, NULL);
@ -364,6 +403,34 @@ but you can hot fix it by this command:
)";
}
static inline void check_cuda_gcc_version(const string& output) {
/* if such an error occurs:
error: identifier "__is_assignable" is undefined
it means your gcc version does not match your nvcc;
for example, nvcc 10 supports gcc<=7 and nvcc 11 supports gcc<=9,
https://gist.github.com/ax3l/9489132
*/
string pat = "__is_assignable";
auto id = output.find(pat);
if (id == string::npos) return;
LOGf << output << R"(
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Dear user, your nvcc and gcc versions still do not match
after the dirty hack; you should install the correct version of g++
or nvcc. For example, nvcc 10 supports g++<=7 and nvcc 11 supports g++<=9;
here is the NVCC Compatibility Matrix:
https://gist.github.com/ax3l/9489132
Please install the correct version of gcc, for example:
>>> sudo apt install g++-7
After your g++ is installed, use the environment variable `cc_path` to
tell jittor to use the correct version of g++, for example:
>>> cc_path='g++-7' python3.7 -m jittor.test.test_core
If you still have problems, please contact us:
https://github.com/Jittor/jittor/issues
)";
}
#ifdef _WIN32
int system_popen(const char *cmd) {
HANDLE g_hChildStd_OUT_Rd = NULL;
@ -444,6 +511,7 @@ int system_popen(const char *cmd) {
if (ec) {
check_cuda_unsupport_version(output);
check_cuda_gcc_version(output);
}
return ec;
}
@ -468,6 +536,7 @@ int system_popen(const char* cmd) {
}
if (ret) {
check_cuda_unsupport_version(output);
check_cuda_gcc_version(output);
}
return ret;
}

View File

@ -34,9 +34,9 @@ constexpr int32_t basename_index(const char * const path, const int32_t index =
#define PREDICT_BRANCH_NOT_TAKEN(x) (__builtin_expect(x, 0))
extern "C" uint32_t get_tid();
extern "C" bool g_supports_color;
extern "C" void print_prefix(std::ostream* out);
extern uint32_t get_tid();
extern bool g_supports_color;
extern void print_prefix(std::ostream* out);
#ifdef _WIN32
constexpr char green[] = "\x1b[1;32m";
@ -83,10 +83,10 @@ static void get_color(char level, int verbose, const char*& color_begin, const c
#endif
extern "C" void send_log(std::ostringstream&& out);
extern "C" void flush_log();
extern "C" void log_capture_start();
extern "C" void log_capture_stop();
extern void send_log(std::ostringstream&& out);
extern void flush_log();
extern void log_capture_start();
extern void log_capture_stop();
extern std::vector<std::map<string,string>> log_capture_read();
extern string thread_local thread_name;
@ -170,9 +170,10 @@ template<class T> T get_from_env(const char* name,const T& _default) {
template<> std::string get_from_env(const char* name, const std::string& _default);
#define DECLARE_FLAG(type, name) \
extern "C" type name; \
extern "C" std::string doc_ ## name; \
extern "C" void set_ ## name (const type&);
extern type name; \
extern std::string doc_ ## name; \
extern void set_ ## name (const type&);
#ifdef JIT

View File

@ -25,15 +25,15 @@ bool endswith(const string& a, const string& b) {
vector<string> split(const string& s, const string& sep, int max_split) {
vector<string> ret;
int pos = -1, pos_next;
int pos = 0, pos_next;
while (1) {
pos_next = s.find(sep, pos+1);
pos_next = s.find(sep, pos);
if (pos_next == (int)string::npos || (int)ret.size() == max_split-1) {
ret.push_back(s.substr(pos+sep.size()));
ret.push_back(s.substr(pos));
return ret;
}
ret.push_back(s.substr(pos+sep.size(), pos_next-pos-sep.size()));
pos = pos_next;
ret.push_back(s.substr(pos, pos_next-pos));
pos = pos_next + sep.size();
}
ASSERT(max_split==0);
return ret;
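The rewritten `split` fixes a separator-length bug: the old version mixed `pos+1` stepping with `pos+sep.size()` slicing, which mis-sliced for multi-character separators. A standalone copy with usage, including the `max_split` tail behavior the cache-compile `-D` splice above relies on:

```cpp
// split("a,b,c", ",", 2) -> {"a", "b,c"}: once max_split-1 pieces are
// taken, the remainder is returned whole.
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> split(const std::string& s,
                                      const std::string& sep, int max_split = 0) {
    std::vector<std::string> ret;
    size_t pos = 0;
    while (true) {
        size_t pos_next = s.find(sep, pos);
        if (pos_next == std::string::npos || (int)ret.size() == max_split - 1) {
            ret.push_back(s.substr(pos));
            return ret;
        }
        ret.push_back(s.substr(pos, pos_next - pos));
        pos = pos_next + sep.size();
    }
}

int main() {
    auto parts = split("g++ a.cc -o a.so", " -o ");
    assert(parts.size() == 2 && parts[0] == "g++ a.cc" && parts[1] == "a.so");

    auto two = split("a,b,c", ",", 2);
    assert(two.size() == 2 && two[1] == "b,c");
}
```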

Some files were not shown because too many files have changed in this diff.