mirror of https://github.com/Jittor/Jittor
Merge branch 'master' of https://github.com/Jittor/jittor
This commit is contained in:
commit
4e38190483
|
@ -0,0 +1,25 @@
|
|||
---
|
||||
name: Bug report
|
||||
about: Create a report to help us improve
|
||||
title: ''
|
||||
labels: ''
|
||||
assignees: ''
|
||||
|
||||
---
|
||||
|
||||
## Describe the bug
|
||||
A clear and concise description of what the bug is. 使用中文也可以。
|
||||
|
||||
## Full Log
|
||||
|
||||
Provide a full log of Jittor execution, Jittor will log environment information which help us to locate your bugs. Provide a screenshot is also acceptable.
|
||||
|
||||
## Minimal Reproduce
|
||||
|
||||
Reproduce this error with a file or several lines of code.
|
||||
If it is not possible, leave it blank.
|
||||
|
||||
## Expected behavior
|
||||
A clear and concise description of what you expected to happen.
|
||||
|
||||
If you are submitting an issue for the first time, please refer to [our guideline](https://github.com/Jittor/jittor/issues/238)
|
|
@ -1,5 +1,6 @@
|
|||
my
|
||||
.refresh
|
||||
.DS_Store
|
||||
__pycache__
|
||||
.ipynb_checkpoints/
|
||||
.vscode/
|
||||
|
|
|
@ -1,3 +1,5 @@
|
|||
exclude __data__
|
||||
exclude __pycache__
|
||||
prune **/__data__/
|
||||
prune **/__pycache__
|
||||
prune *.pyc
|
38
README.cn.md
38
README.cn.md
|
@ -89,7 +89,7 @@ for i,(x,y) in enumerate(get_data(n)):
|
|||
Jittor框架对环境要求如下:
|
||||
|
||||
|
||||
* 操作系统: **Linux**(e.g. Ubuntu/CentOS/Arch) 或 **Windows Subsystem of Linux(WSL)**
|
||||
* 操作系统: **Linux**(e.g. Ubuntu/CentOS/Arch), **macOS**(x86_64)或 **Windows Subsystem of Linux(WSL)**
|
||||
* Python:版本 >= 3.7
|
||||
* C++编译器 (需要下列至少一个)
|
||||
- g++ (>=5.4.0)
|
||||
|
@ -100,7 +100,9 @@ Jittor框架对环境要求如下:
|
|||
如果您不希望手动配置环境,我们推荐使用 Docker 进行安装。
|
||||
除此之外,您还可以使用 pip 安装和手动安装。
|
||||
|
||||
注意:目前Jittor通过WSL的方式在Windows操作系统上运行,WSL的安装方法请参考[微软官网](https://docs.microsoft.com/en-us/windows/wsl/install-win10),WSL版本目前尚不支持CUDA。
|
||||
注意1:目前Jittor通过WSL的方式在Windows操作系统上运行,WSL的安装方法请参考[微软官网](https://docs.microsoft.com/en-us/windows/wsl/install-win10),WSL版本目前尚不支持CUDA。
|
||||
|
||||
注意2:macOS 用户需要安装额外依赖,请参考 [macOS 安装](#macOS-安装)。
|
||||
|
||||
Jittor 提供了三种安装方法:docker,pip和手动安装:
|
||||
|
||||
|
@ -112,6 +114,7 @@ Jittor 提供了三种安装方法:docker,pip和手动安装:
|
|||
|
||||
|
||||
|
||||
|
||||
## Docker 安装
|
||||
|
||||
我们提供了Docker安装方式,免去您配置环境,Docker安装方法如下:
|
||||
|
@ -145,6 +148,27 @@ python3.7 -m jittor.test.test_example
|
|||
如果测试运行通过,恭喜你已经安装完成.
|
||||
jittor会自动在路径中寻找合适的编译器, 如果您希望手动指定编译器, 请使用环境变量 `cc_path` 和 `nvcc_path`(可选).
|
||||
|
||||
## macOS 安装
|
||||
|
||||
|
||||
macOS 请使用 [homebrew](https://brew.sh) 安装额外的依赖 (python>=3.7, onednn)。
|
||||
|
||||
|
||||
```bash
|
||||
brew install python@3.7 onednn libomp
|
||||
```
|
||||
|
||||
之后您可以通过 pip 安装 jittor,并测试是否可以成功运行。
|
||||
|
||||
|
||||
```bash
|
||||
python3.7 -m pip install jittor
|
||||
python3.7 -m jittor.test.test_example
|
||||
```
|
||||
|
||||
目前在macOS中,jittor 只支持 CPU 计算。
|
||||
|
||||
|
||||
## 手动安装
|
||||
|
||||
|
||||
|
@ -313,11 +337,11 @@ help(jt.ops)
|
|||
|
||||
|
||||
|
||||
[1]: notebook/example.src.md "示例"
|
||||
[2]: notebook/basics.src.md "基本概念"
|
||||
[3]: notebook/meta_op.src.md "元算子"
|
||||
[4]: notebook/custom_op.src.md "自定义算子"
|
||||
[5]: notebook/profiler.src.md "性能分析器"
|
||||
[1]: python/jittor/notebook/example.src.md "示例"
|
||||
[2]: python/jittor/notebook/basics.src.md "基本概念"
|
||||
[3]: python/jittor/notebook/meta_op.src.md "元算子"
|
||||
[4]: python/jittor/notebook/custom_op.src.md "自定义算子"
|
||||
[5]: python/jittor/notebook/profiler.src.md "性能分析器"
|
||||
|
||||
|
||||
这些notebooks可以通过python3.7 -m jittor.notebook在您自己的计算机中运行。
|
||||
|
|
40
README.md
40
README.md
|
@ -92,9 +92,10 @@ We provide some jupyter notebooks to help you quick start with Jittor.
|
|||
|
||||
|
||||
|
||||
|
||||
Jittor environment requirements:
|
||||
|
||||
* System: **Linux**(e.g. Ubuntu/CentOS/Arch) (or **Windows** Subsystem of Linux)
|
||||
* System: **Linux**(e.g. Ubuntu/CentOS/Arch), **macOS**, or **Windows Subsystem of Linux (WSL)**
|
||||
* Python version >= 3.7
|
||||
* CPU compiler (require at least one of the following)
|
||||
* g++ (>=5.4.0)
|
||||
|
@ -105,7 +106,9 @@ Jittor environment requirements:
|
|||
|
||||
|
||||
|
||||
Note: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
|
||||
Note#1: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
|
||||
|
||||
Note#2: macOS users have to install additional dependencies, see [macOS install](#macOS-install).
|
||||
|
||||
Jittor offers three ways to install: docker, pip, or manual.
|
||||
|
||||
|
@ -139,6 +142,27 @@ python3.7 -m jittor.test.test_example
|
|||
```
|
||||
|
||||
|
||||
|
||||
## macOS install
|
||||
|
||||
|
||||
Please first install additional dependencies with [homebrew](https://brew.sh).
|
||||
|
||||
```bash
|
||||
brew install python@3.7 onednn libomp
|
||||
```
|
||||
|
||||
|
||||
Then you can install jittor through pip and run the example.
|
||||
|
||||
```bash
|
||||
python3.7 -m pip install jittor
|
||||
python3.7 -m jittor.test.test_example
|
||||
```
|
||||
|
||||
|
||||
Currently jittor only supports CPU in macOS.
|
||||
|
||||
## manual install
|
||||
|
||||
We will show how to install Jittor in Ubuntu 16.04 step by step, Other Linux distributions may have similar commands.
|
||||
|
@ -307,11 +331,11 @@ If you want to know more about Jittor, please check out the notebooks below:
|
|||
|
||||
|
||||
|
||||
[1]: notebook/example.src.md "example"
|
||||
[2]: notebook/basics.src.md "basics"
|
||||
[3]: notebook/meta_op.src.md "meta_op"
|
||||
[4]: notebook/custom_op.src.md "custom_op"
|
||||
[5]: notebook/profiler.src.md "profiler"
|
||||
[1]: python/jittor/notebook/example.src.md "example"
|
||||
[2]: python/jittor/notebook/basics.src.md "basics"
|
||||
[3]: python/jittor/notebook/meta_op.src.md "meta_op"
|
||||
[4]: python/jittor/notebook/custom_op.src.md "custom_op"
|
||||
[5]: python/jittor/notebook/profiler.src.md "profiler"
|
||||
|
||||
Those notebooks can be started in your own computer by `python3.7 -m jittor.notebook`
|
||||
|
||||
|
@ -366,7 +390,7 @@ Jittor is currently maintained by the [Tsinghua CSCG Group](https://cg.cs.tsingh
|
|||
@article{hu2020jittor,
|
||||
title={Jittor: a novel deep learning framework with meta-operators and unified graph execution},
|
||||
author={Hu, Shi-Min and Liang, Dun and Yang, Guo-Ye and Yang, Guo-Wei and Zhou, Wen-Yang},
|
||||
journal={Information Sciences},
|
||||
journal={Science China Information Sciences},
|
||||
volume={63},
|
||||
number={222103},
|
||||
pages={1--21},
|
||||
|
|
|
@ -113,7 +113,7 @@ We provide some jupyter notebooks to help you quick start with Jittor.
|
|||
Jittor框架对环境要求如下:
|
||||
|
||||
|
||||
* 操作系统: **Linux**(e.g. Ubuntu/CentOS/Arch) 或 **Windows Subsystem of Linux(WSL)**
|
||||
* 操作系统: **Linux**(e.g. Ubuntu/CentOS/Arch), **macOS**(x86_64)或 **Windows Subsystem of Linux(WSL)**
|
||||
* Python:版本 >= 3.7
|
||||
* C++编译器 (需要下列至少一个)
|
||||
- g++ (>=5.4.0)
|
||||
|
@ -124,13 +124,15 @@ Jittor框架对环境要求如下:
|
|||
如果您不希望手动配置环境,我们推荐使用 Docker 进行安装。
|
||||
除此之外,您还可以使用 pip 安装和手动安装。
|
||||
|
||||
注意:目前Jittor通过WSL的方式在Windows操作系统上运行,WSL的安装方法请参考[微软官网](https://docs.microsoft.com/en-us/windows/wsl/install-win10),WSL版本目前尚不支持CUDA。
|
||||
注意1:目前Jittor通过WSL的方式在Windows操作系统上运行,WSL的安装方法请参考[微软官网](https://docs.microsoft.com/en-us/windows/wsl/install-win10),WSL版本目前尚不支持CUDA。
|
||||
|
||||
注意2:macOS 用户需要安装额外依赖,请参考 [macOS 安装](#macOS-安装)。
|
||||
|
||||
Jittor 提供了三种安装方法:docker,pip和手动安装:
|
||||
|
||||
Jittor environment requirements:
|
||||
|
||||
* System: **Linux**(e.g. Ubuntu/CentOS/Arch) (or **Windows** Subsystem of Linux)
|
||||
* System: **Linux**(e.g. Ubuntu/CentOS/Arch), **macOS**, or **Windows Subsystem of Linux (WSL)**
|
||||
* Python version >= 3.7
|
||||
* CPU compiler (require at least one of the following)
|
||||
* g++ (>=5.4.0)
|
||||
|
@ -141,7 +143,9 @@ Jittor environment requirements:
|
|||
|
||||
|
||||
|
||||
Note: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
|
||||
Note#1: Currently Jittor runs on the Windows operating system through WSL. For the installation method of WSL, please refer to [Microsoft official website](https://docs.microsoft.com/en-us/windows/wsl/install-win10). WSL does not yet support CUDA.
|
||||
|
||||
Note#2: macOS users have to install additional dependencies, see [macOS install](#macOS-install).
|
||||
|
||||
Jittor offers three ways to install: docker, pip, or manual.
|
||||
|
||||
|
@ -183,6 +187,31 @@ python3.7 -m jittor.test.test_example
|
|||
如果测试运行通过,恭喜你已经安装完成.
|
||||
jittor会自动在路径中寻找合适的编译器, 如果您希望手动指定编译器, 请使用环境变量 `cc_path` 和 `nvcc_path`(可选).
|
||||
|
||||
## macOS 安装
|
||||
|
||||
## macOS install
|
||||
|
||||
macOS 请使用 [homebrew](https://brew.sh) 安装额外的依赖 (python>=3.7, onednn)。
|
||||
|
||||
Please first install additional dependencies with [homebrew](https://brew.sh).
|
||||
|
||||
```bash
|
||||
brew install python@3.7 onednn libomp
|
||||
```
|
||||
|
||||
之后您可以通过 pip 安装 jittor,并测试是否可以成功运行。
|
||||
|
||||
Then you can install jittor through pip and run the example.
|
||||
|
||||
```bash
|
||||
python3.7 -m pip install jittor
|
||||
python3.7 -m jittor.test.test_example
|
||||
```
|
||||
|
||||
目前在macOS中,jittor 只支持 CPU 计算。
|
||||
|
||||
Currently jittor only supports CPU in macOS.
|
||||
|
||||
## 手动安装
|
||||
## manual install
|
||||
|
||||
|
@ -387,16 +416,16 @@ If you want to know more about Jittor, please check out the notebooks below:
|
|||
|
||||
|
||||
|
||||
[1]: notebook/example.src.md "example"
|
||||
[2]: notebook/basics.src.md "basics"
|
||||
[3]: notebook/meta_op.src.md "meta_op"
|
||||
[4]: notebook/custom_op.src.md "custom_op"
|
||||
[5]: notebook/profiler.src.md "profiler"
|
||||
[1]: notebook/example.src.md "示例"
|
||||
[2]: notebook/basics.src.md "基本概念"
|
||||
[3]: notebook/meta_op.src.md "元算子"
|
||||
[4]: notebook/custom_op.src.md "自定义算子"
|
||||
[5]: notebook/profiler.src.md "性能分析器"
|
||||
[1]: python/jittor/notebook/example.src.md "example"
|
||||
[2]: python/jittor/notebook/basics.src.md "basics"
|
||||
[3]: python/jittor/notebook/meta_op.src.md "meta_op"
|
||||
[4]: python/jittor/notebook/custom_op.src.md "custom_op"
|
||||
[5]: python/jittor/notebook/profiler.src.md "profiler"
|
||||
[1]: python/jittor/notebook/example.src.md "示例"
|
||||
[2]: python/jittor/notebook/basics.src.md "基本概念"
|
||||
[3]: python/jittor/notebook/meta_op.src.md "元算子"
|
||||
[4]: python/jittor/notebook/custom_op.src.md "自定义算子"
|
||||
[5]: python/jittor/notebook/profiler.src.md "性能分析器"
|
||||
|
||||
Those notebooks can be started in your own computer by `python3.7 -m jittor.notebook`
|
||||
|
||||
|
|
|
@ -27,6 +27,9 @@
|
|||
jittor.mpi
|
||||
jittor.linalg
|
||||
jittor.console
|
||||
jittor.distributions
|
||||
jittor.attention
|
||||
jittor.loss3d
|
||||
|
||||
|
||||
.. toctree::
|
||||
|
@ -34,7 +37,6 @@
|
|||
:caption: 其他:
|
||||
|
||||
教程 <https://cg.cs.tsinghua.edu.cn/jittor/tutorial/>
|
||||
todo
|
||||
|
||||
Indices and tables
|
||||
==================
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
jittor.attention
|
||||
=====================
|
||||
|
||||
这里是Jittor的 数据变换 模块的API文档,您可以通过`from jittor import attention`来获取该模块。
|
||||
这里是Jittor的 注意力 模块的API文档,您可以通过`from jittor import attention`来获取该模块。
|
||||
|
||||
```eval_rst
|
||||
.. automodule:: jittor.attention
|
||||
|
|
|
@ -0,0 +1,10 @@
|
|||
jittor.distributions
|
||||
=====================
|
||||
|
||||
这里是Jittor的随机分布模块的API文档,您可以通过`from jittor import distributions`来获取该模块。
|
||||
|
||||
```eval_rst
|
||||
.. automodule:: jittor.distributions
|
||||
:members:
|
||||
:undoc-members:
|
||||
```
|
|
@ -0,0 +1,10 @@
|
|||
jittor.loss3d
|
||||
=====================
|
||||
|
||||
这里是Jittor的 3d 损失函数 模块的API文档,您可以通过`from jittor import loss3d`来获取该模块。
|
||||
|
||||
```eval_rst
|
||||
.. automodule:: jittor.loss3d
|
||||
:members: chamfer_loss, ChamferLoss, earth_mover_distance, EarthMoverDistance
|
||||
:undoc-members:
|
||||
```
|
|
@ -95,6 +95,9 @@ def val(epoch):
|
|||
|
||||
下面是 jittor 的 mpi api reference.
|
||||
|
||||
* `jt.world_rank`: 获取当前进程总数量,如果没有用mpi,则为1。
|
||||
* `jt.rank`: 获取当前进程的编号,区间为`0 ~ jt.world_rank-1`, 如果没有用mpi,则为0。
|
||||
|
||||
```eval_rst
|
||||
.. automodule:: jittor_mpi_core
|
||||
:members:
|
||||
|
|
|
@ -10,7 +10,7 @@ jittor.nn
|
|||
|
||||
.. automodule:: jittor.nn
|
||||
:imported-members:
|
||||
:members: Pool, pool, AdaptiveAvgPool2d
|
||||
:members: Pool, pool, AdaptiveAvgPool2d, Pool3d, AdaptiveMaxPool2d, AdaptiveAvgPool3d, AdaptiveMaxPool2d, pool3d, AvgPool2d, AvgPool3d, avg_pool2d, MaxPool2d, MaxPool3d, max_pool2d, max_pool3d, MaxUnpool2d, MaxUnpool3d
|
||||
:undoc-members:
|
||||
|
||||
.. autoclass:: jittor.nn.ReLU
|
||||
|
|
|
@ -9,7 +9,7 @@
|
|||
# file 'LICENSE.txt', which is part of this source code package.
|
||||
# ***************************************************************
|
||||
|
||||
__version__ = '1.2.3.12'
|
||||
__version__ = '1.2.3.92'
|
||||
from jittor_utils import lock
|
||||
with lock.lock_scope():
|
||||
ori_int = int
|
||||
|
@ -23,7 +23,7 @@ with lock.lock_scope():
|
|||
from jittor_core import *
|
||||
from jittor_core.ops import *
|
||||
from . import compile_extern
|
||||
from .compile_extern import mkl_ops, mpi, mpi_ops, in_mpi, rank
|
||||
from .compile_extern import mkl_ops, mpi, mpi_ops, in_mpi, rank, world_size
|
||||
if core.get_device_count() == 0:
|
||||
has_cuda = compile_extern.has_cuda = compiler.has_cuda = False
|
||||
if has_cuda:
|
||||
|
@ -437,11 +437,11 @@ def pow(x, y):
|
|||
Var.pow = Var.__pow__ = pow
|
||||
|
||||
def argmax(x, dim, keepdims:bool=False):
|
||||
return x.arg_reduce("max", dim, keepdims)
|
||||
return jt.arg_reduce(x, "max", dim, keepdims)
|
||||
Var.argmax = argmax
|
||||
|
||||
def argmin(x, dim, keepdims:bool=False):
|
||||
return x.arg_reduce("min", dim, keepdims)
|
||||
return jt.arg_reduce(x, "min", dim, keepdims)
|
||||
Var.argmin = argmin
|
||||
|
||||
def randn(*size, dtype="float32", requires_grad=True) -> Var:
|
||||
|
@ -562,7 +562,7 @@ def randint(low, high=None, shape=(1,), dtype="int32") -> Var:
|
|||
'''
|
||||
if high is None: low, high = 0, low
|
||||
v = (jt.random(shape) * (high - low) + low).clamp(low, high-0.5)
|
||||
v = jt.floor(v)
|
||||
v = jt.floor_int(v)
|
||||
return v.astype(dtype)
|
||||
|
||||
def randint_like(x, low, high=None) -> Var:
|
||||
|
@ -780,11 +780,16 @@ class Module:
|
|||
stack = []
|
||||
def callback(parents, k, v, n):
|
||||
stack.append(str(k))
|
||||
for k2, p in v.__dict__.items():
|
||||
if k2.startswith("_"): continue
|
||||
dc = v.__dict__
|
||||
if isinstance(v, nn.ParameterList):
|
||||
dc = v.params
|
||||
for k2, p in dc.items():
|
||||
if isinstance(k2, str) and k2.startswith("_"): continue
|
||||
if isinstance(p, Var):
|
||||
ps.append(p)
|
||||
p.name(".".join(stack[1:]+[str(k2)]))
|
||||
pname = ".".join(stack[1:]+[str(k2)])
|
||||
if len(pname) > len(p.name()):
|
||||
p.name(pname)
|
||||
def callback_leave(parents, k, v, n):
|
||||
stack.pop()
|
||||
self.dfs([], None, callback, callback_leave)
|
||||
|
@ -822,53 +827,101 @@ class Module:
|
|||
self.dfs([], "", callback, callback_leave)
|
||||
return ms
|
||||
|
||||
def register_forward_hook(self, func):
|
||||
def requires_grad_(self, requires_grad=True):
|
||||
self._requires_grad = requires_grad
|
||||
self._place_hooker()
|
||||
return self
|
||||
|
||||
def __hooked_call__(self, *args, **kw):
|
||||
if hasattr(self, "__fhook2__"):
|
||||
if len(kw):
|
||||
self.__fhook2__(self, args, kw)
|
||||
else:
|
||||
self.__fhook2__(self, args)
|
||||
if hasattr(self, "__bihook__"):
|
||||
if len(kw):
|
||||
LOG.w("backward hook not support kw")
|
||||
args = grad_hooker(args, self.__bihook__)
|
||||
if hasattr(self, "_requires_grad") and not self._requires_grad:
|
||||
with jt.no_grad():
|
||||
ret = self.__hooked_call__(*args, **kw)
|
||||
else:
|
||||
ret = self.__hooked_call__(*args, **kw)
|
||||
if hasattr(self, "__bohook__"):
|
||||
if len(kw):
|
||||
LOG.w("backward hook not support kw")
|
||||
if isinstance(ret, Var):
|
||||
ret = grad_hooker((ret,), self.__bohook__)[0]
|
||||
else:
|
||||
ret = grad_hooker(ret, self.__bohook__)
|
||||
if hasattr(self, "__fhook__"):
|
||||
if len(kw):
|
||||
self.__fhook__(self, args, ret, kw)
|
||||
else:
|
||||
self.__fhook__(self, args, ret)
|
||||
return ret
|
||||
|
||||
def _place_hooker(self):
|
||||
cls = self.__class__
|
||||
self.__fhook__ = func
|
||||
if hasattr(cls, "__hooked__"):
|
||||
return
|
||||
cls.__hooked__ = True
|
||||
origin_call = cls.__call__
|
||||
def new_call(self, *args, **kw):
|
||||
ret = origin_call(self, *args, **kw)
|
||||
if hasattr(self, "__fhook__"):
|
||||
if len(kw):
|
||||
self.__fhook__(self, args, ret, kw)
|
||||
else:
|
||||
self.__fhook__(self, args, ret)
|
||||
return ret
|
||||
self.__class__.__call__ = new_call
|
||||
cls.__call__, cls.__hooked_call__ = \
|
||||
cls.__hooked_call__, cls.__call__
|
||||
|
||||
def register_forward_hook(self, func):
|
||||
self.__fhook__ = func
|
||||
self._place_hooker()
|
||||
|
||||
def remove_forward_hook(self):
|
||||
cls = self.__class__
|
||||
if hasattr(cls,"__hooked__"):
|
||||
delattr(cls,"__hooked__")
|
||||
if hasattr(self,"__fhook__"):
|
||||
delattr(self,"__fhook__")
|
||||
|
||||
def register_pre_forward_hook(self, func):
|
||||
cls = self.__class__
|
||||
self.__fhook2__ = func
|
||||
if hasattr(cls, "__hooked2__"):
|
||||
return
|
||||
cls.__hooked2__ = True
|
||||
origin_call = cls.__call__
|
||||
def new_call(self, *args, **kw):
|
||||
if hasattr(self, "__fhook2__"):
|
||||
if len(kw):
|
||||
self.__fhook2__(self, args, kw)
|
||||
else:
|
||||
self.__fhook2__(self, args)
|
||||
return origin_call(self, *args, **kw)
|
||||
self.__class__.__call__ = new_call
|
||||
self._place_hooker()
|
||||
|
||||
def remove_pre_forward_hook(self):
|
||||
cls = self.__class__
|
||||
if hasattr(cls,"__hooked2__"):
|
||||
delattr(cls,"__hooked2__")
|
||||
if hasattr(self,"__fhook2__"):
|
||||
delattr(self,"__fhook2__")
|
||||
|
||||
def register_input_backward_hook(self, func):
|
||||
self.__bihook__ = func
|
||||
self._place_hooker()
|
||||
|
||||
def remove_input_backward_hook(self):
|
||||
if hasattr(self,"__bihook__"):
|
||||
delattr(self,"__bihook__")
|
||||
|
||||
def register_output_backward_hook(self, func):
|
||||
self.__bohook__ = func
|
||||
self._place_hooker()
|
||||
|
||||
def remove_output_backward_hook(self):
|
||||
if hasattr(self,"__bohook__"):
|
||||
delattr(self,"__bohook__")
|
||||
|
||||
def register_backward_hook(self, func):
|
||||
''' hook both input and output on backpropergation of this module.
|
||||
Arguments of hook are defined as::
|
||||
|
||||
hook(module, grad_input:tuple(jt.Var), grad_output:tuple(jt.Var)) -> tuple(jt.Var) or None
|
||||
|
||||
`grad_input` is the origin gradients of input of this module, `grad_input` is the gradients of output of this module, return value is used to replace the gradient of input.
|
||||
'''
|
||||
_grad_output = None
|
||||
def bohook(grad_output):
|
||||
nonlocal _grad_output
|
||||
_grad_output = grad_output
|
||||
def bihook(grad_input):
|
||||
return func(self, grad_input, _grad_output)
|
||||
self.register_input_backward_hook(bihook)
|
||||
self.register_output_backward_hook(bohook)
|
||||
|
||||
def remove_backward_hook(self):
|
||||
self.remove_input_backward_hook()
|
||||
self.remove_output_backward_hook()
|
||||
|
||||
def children(self):
|
||||
cd = []
|
||||
def callback(parents, k, v, n):
|
||||
|
@ -1051,7 +1104,7 @@ can store value for backward computation)::
|
|||
return grad0 * self.y, grad1 * self.x
|
||||
a = jt.array(3.0)
|
||||
b = jt.array(4.0)
|
||||
func = MyFunc()
|
||||
func = MyFunc.apply
|
||||
c,d = func(a, b)
|
||||
da, db = jt.grad(c+d*3, [a, b])
|
||||
assert da.data == 4
|
||||
|
@ -1074,7 +1127,7 @@ can also be None)::
|
|||
return grad0 * self.y, None
|
||||
a = jt.array(3.0)
|
||||
b = jt.array(4.0)
|
||||
func = MyFunc()
|
||||
func = MyFunc.apply
|
||||
c,d = func(a, b)
|
||||
d.stop_grad()
|
||||
da, db = jt.grad(c+d*3, [a, b])
|
||||
|
@ -1144,6 +1197,44 @@ can also be None)::
|
|||
func = cls()
|
||||
return func(*args, **kw)
|
||||
|
||||
class GradHooker(Function):
|
||||
def __init__(self, hook):
|
||||
self.hook = hook
|
||||
|
||||
def execute(self, *args):
|
||||
return args
|
||||
|
||||
def grad(self, *grad_input):
|
||||
ret = self.hook(grad_input)
|
||||
if ret: grad_input = ret
|
||||
return grad_input
|
||||
|
||||
def grad_hooker(args, hook):
|
||||
hooker = GradHooker(hook)
|
||||
return hooker(*args)
|
||||
|
||||
def register_hook(v, hook):
|
||||
""" register hook of any jittor Variables, if hook return not None,
|
||||
the gradient of this variable will be alter, Example::
|
||||
|
||||
x = jt.array([0.0, 0.0])
|
||||
y = x * [1,2]
|
||||
y.register_hook(lambda g: g*2)
|
||||
dx = jt.grad(y, x)
|
||||
print(dx)
|
||||
# will be [2, 4]
|
||||
|
||||
"""
|
||||
def _hook(grads):
|
||||
g = hook(grads[0])
|
||||
if g is not None:
|
||||
return (g,)
|
||||
return None
|
||||
hooker = GradHooker(_hook)
|
||||
v.swap(hooker(v)[0])
|
||||
return v
|
||||
|
||||
Var.register_hook = register_hook
|
||||
|
||||
def make_module(func, exec_n_args=1):
|
||||
class MakeModule(Module):
|
||||
|
@ -1169,9 +1260,11 @@ def dirty_fix_pytorch_runtime_error():
|
|||
jt.dirty_fix_pytorch_runtime_error()
|
||||
import torch
|
||||
'''
|
||||
import os
|
||||
os.RTLD_GLOBAL = os.RTLD_GLOBAL | os.RTLD_DEEPBIND
|
||||
import os, platform
|
||||
|
||||
if platform.system() == 'Linux':
|
||||
os.RTLD_GLOBAL = os.RTLD_GLOBAL | os.RTLD_DEEPBIND
|
||||
|
||||
|
||||
import atexit
|
||||
|
||||
|
@ -1249,11 +1342,15 @@ def get_len(var):
|
|||
Var.__len__ = get_len
|
||||
int = int32
|
||||
Var.int = Var.int32
|
||||
Var.long = Var.int32
|
||||
float = float32
|
||||
Var.float = Var.float32
|
||||
double = float64
|
||||
Var.double = Var.float64
|
||||
|
||||
def is_var(v):
|
||||
return isinstance(v, Var)
|
||||
|
||||
# __array__ interface is used for np.array(jt_var)
|
||||
Var.__array__ = Var.numpy
|
||||
Var.__array_priority__ = 2000
|
||||
|
@ -1271,3 +1368,5 @@ from . import numpy2cupy
|
|||
from .contrib import concat
|
||||
from .misc import *
|
||||
from . import sparse
|
||||
from . import optim
|
||||
from . import dataset
|
||||
|
|
|
@ -5,6 +5,7 @@
|
|||
# file 'LICENSE.txt', which is part of this source code package.
|
||||
# ***************************************************************
|
||||
import os, sys, shutil
|
||||
import platform
|
||||
from .compiler import *
|
||||
from jittor_utils import run_cmd, get_version, get_int_version
|
||||
from jittor_utils.misc import download_url_to_local
|
||||
|
@ -23,71 +24,99 @@ def search_file(dirs, name, prefer_version=()):
|
|||
def install_mkl(root_folder):
|
||||
# origin url is
|
||||
# url = "https://github.com/intel/mkl-dnn/releases/download/v1.0.2/mkldnn_lnx_1.0.2_cpu_gomp.tgz"
|
||||
url = "https://cloud.tsinghua.edu.cn/f/da02bf62b55b4aa3b8ee/?dl=1"
|
||||
filename = "mkldnn_lnx_1.0.2_cpu_gomp.tgz"
|
||||
# newest version for oneDNN
|
||||
# url = "https://github.com/oneapi-src/oneDNN/releases/download/v2.2/dnnl_lnx_2.2.0_cpu_gomp.tgz"
|
||||
# filename = "dnnl_lnx_2.2.0_cpu_gomp.tgz"
|
||||
import platform
|
||||
if platform.system()=="Linux":
|
||||
if platform.machine()=='x86_64':
|
||||
filename = "dnnl_lnx_2.2.0_cpu_gomp.tgz"
|
||||
md5 = "35bbbdf550a9d8ad54db798e372000f6"
|
||||
elif platform.machine()=='aarch64':
|
||||
filename = "dnnl_lnx_2.2.0_cpu_gomp_aarch64.tgz"
|
||||
md5 = "72cf9b0b8fd6c3c786d35a9daaee22b8"
|
||||
else:
|
||||
raise RuntimeError(f"platform.machine()=={platform.machine()} not support yet,"
|
||||
" Please contact us on https://github.com/jittor/jittor ")
|
||||
else:
|
||||
raise RuntimeError(f"platform.machine()=={platform.machine()} not support yet,"
|
||||
" Please contact us on https://github.com/jittor/jittor ")
|
||||
|
||||
url = "https://cg.cs.tsinghua.edu.cn/jittor/assets/" + filename
|
||||
fullname = os.path.join(root_folder, filename)
|
||||
dirname = os.path.join(root_folder, filename.replace(".tgz",""))
|
||||
|
||||
if not os.path.isfile(os.path.join(dirname, "examples", "test")):
|
||||
if not os.path.isfile(os.path.join(dirname, "lib", "libmkldnn.so")):
|
||||
LOG.i("Downloading mkl...")
|
||||
download_url_to_local(url, filename, root_folder, "47187284ede27ad3bd64b5f0e7d5e730")
|
||||
# newest version for oneDNN
|
||||
# download_url_to_local(url, filename, root_folder, "35bbbdf550a9d8ad54db798e372000f6")
|
||||
download_url_to_local(url, filename, root_folder, md5)
|
||||
import tarfile
|
||||
|
||||
with tarfile.open(fullname, "r") as tar:
|
||||
tar.extractall(root_folder)
|
||||
|
||||
assert 0 == os.system(f"cd {dirname}/examples && "
|
||||
f"{cc_path} -std=c++14 cpu_cnn_inference_f32.cpp -Ofast -lmkldnn -I ../include -L ../lib -o test && LD_LIBRARY_PATH=../lib/ ./test")
|
||||
# newest version for oneDNN
|
||||
# assert 0 == os.system(f"cd {dirname}/examples && "
|
||||
# f"{cc_path} -std=c++14 cnn_inference_f32.cpp -Ofast -lmkldnn -I ../include -L ../lib -o test && LD_LIBRARY_PATH=../lib/ ./test")
|
||||
f"{cc_path} -std=c++14 cnn_inference_f32.cpp -Ofast -lmkldnn -I ../include -L ../lib -o test && LD_LIBRARY_PATH=../lib/ ./test")
|
||||
|
||||
def setup_mkl():
|
||||
global mkl_ops, use_mkl
|
||||
use_mkl = os.environ.get("use_mkl", "1")=="1"
|
||||
mkl_ops = None
|
||||
if not use_mkl: return
|
||||
|
||||
# pytorch mkl is conflict with jittor mkl
|
||||
# yield error "free: invalide size" or
|
||||
# "mmap error"
|
||||
# import pytorch(>1.8) first can fix this problem
|
||||
# try:
|
||||
# # jt.dirty_fix_pytorch_runtime_error()
|
||||
# import torch
|
||||
# from torch import nn
|
||||
# except:
|
||||
# torch = None
|
||||
|
||||
mkl_include_path = os.environ.get("mkl_include_path")
|
||||
mkl_lib_path = os.environ.get("mkl_lib_path")
|
||||
|
||||
if mkl_lib_path is None or mkl_include_path is None:
|
||||
mkl_install_sh = os.path.join(jittor_path, "script", "install_mkl.sh")
|
||||
LOG.v("setup mkl...")
|
||||
# mkl_path = os.path.join(cache_path, "mkl")
|
||||
# mkl_path decouple with cc_path
|
||||
from pathlib import Path
|
||||
mkl_path = os.path.join(str(Path.home()), ".cache", "jittor", "mkl")
|
||||
|
||||
make_cache_dir(mkl_path)
|
||||
install_mkl(mkl_path)
|
||||
mkl_home = ""
|
||||
for name in os.listdir(mkl_path):
|
||||
if name.startswith("mkldnn_lnx") and os.path.isdir(os.path.join(mkl_path, name)):
|
||||
mkl_home = os.path.join(mkl_path, name)
|
||||
break
|
||||
assert mkl_home!=""
|
||||
if platform.system() == 'Linux':
|
||||
if mkl_lib_path is None or mkl_include_path is None:
|
||||
mkl_install_sh = os.path.join(jittor_path, "script", "install_mkl.sh")
|
||||
LOG.v("setup mkl...")
|
||||
# mkl_path = os.path.join(cache_path, "mkl")
|
||||
# mkl_path decouple with cc_path
|
||||
from pathlib import Path
|
||||
mkl_path = os.path.join(str(Path.home()), ".cache", "jittor", "mkl")
|
||||
|
||||
make_cache_dir(mkl_path)
|
||||
install_mkl(mkl_path)
|
||||
mkl_home = ""
|
||||
for name in os.listdir(mkl_path):
|
||||
if name.startswith("dnnl") and os.path.isdir(os.path.join(mkl_path, name)):
|
||||
mkl_home = os.path.join(mkl_path, name)
|
||||
break
|
||||
assert mkl_home!=""
|
||||
mkl_include_path = os.path.join(mkl_home, "include")
|
||||
mkl_lib_path = os.path.join(mkl_home, "lib")
|
||||
|
||||
mkl_lib_name = os.path.join(mkl_lib_path, "libmkldnn.so")
|
||||
assert os.path.isdir(mkl_include_path)
|
||||
assert os.path.isdir(mkl_lib_path)
|
||||
assert os.path.isfile(mkl_lib_name)
|
||||
LOG.v(f"mkl_include_path: {mkl_include_path}")
|
||||
LOG.v(f"mkl_lib_path: {mkl_lib_path}")
|
||||
LOG.v(f"mkl_lib_name: {mkl_lib_name}")
|
||||
# We do not link manualy, link in custom ops
|
||||
# ctypes.CDLL(mkl_lib_name, dlopen_flags)
|
||||
mkl_lib_name = os.path.join(mkl_lib_path, "libmkldnn.so")
|
||||
assert os.path.isdir(mkl_include_path)
|
||||
assert os.path.isdir(mkl_lib_path)
|
||||
assert os.path.isfile(mkl_lib_name)
|
||||
LOG.v(f"mkl_include_path: {mkl_include_path}")
|
||||
LOG.v(f"mkl_lib_path: {mkl_lib_path}")
|
||||
LOG.v(f"mkl_lib_name: {mkl_lib_name}")
|
||||
# We do not link manualy, link in custom ops
|
||||
# ctypes.CDLL(mkl_lib_name, dlopen_flags)
|
||||
extra_flags = f" -I'{mkl_include_path}' -L'{mkl_lib_path}' -lmkldnn -Wl,-rpath='{mkl_lib_path}' "
|
||||
|
||||
elif platform.system() == 'Darwin':
|
||||
mkl_lib_paths = [
|
||||
"/usr/local/lib/libmkldnn.dylib", # x86_64
|
||||
"/opt/homebrew/lib/libmkldnn.dylib", # arm64
|
||||
]
|
||||
if not any([os.path.exists(lib) for lib in mkl_lib_paths]):
|
||||
raise RuntimeError("Not found onednn, please install it by the command 'brew install onednn'")
|
||||
extra_flags = f" -lmkldnn "
|
||||
|
||||
mkl_op_dir = os.path.join(jittor_path, "extern", "mkl", "ops")
|
||||
mkl_op_files = [os.path.join(mkl_op_dir, name) for name in os.listdir(mkl_op_dir)]
|
||||
mkl_ops = compile_custom_ops(mkl_op_files,
|
||||
extra_flags=f" -I'{mkl_include_path}' -L'{mkl_lib_path}' -lmkldnn -Wl,-rpath='{mkl_lib_path}' ")
|
||||
mkl_ops = compile_custom_ops(mkl_op_files, extra_flags=extra_flags)
|
||||
LOG.vv("Get mkl_ops: "+str(dir(mkl_ops)))
|
||||
|
||||
|
||||
|
@ -154,13 +183,20 @@ def setup_cuda_extern():
|
|||
line = traceback.format_exc()
|
||||
LOG.w(f"CUDA found but {lib_name} is not loaded:\n{line}")
|
||||
if lib_name == "cudnn":
|
||||
LOG.w(f"""Develop version of CUDNN not found,
|
||||
msg = """Develop version of CUDNN not found,
|
||||
please refer to CUDA offical tar file installation:
|
||||
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar
|
||||
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar"""
|
||||
if platform.machine() == "x86_64":
|
||||
msg += """
|
||||
or you can let jittor install cuda and cudnn for you:
|
||||
>>> python3.{sys.version_info.minor} -m jittor_utils.install_cuda""")
|
||||
>>> python3.{sys.version_info.minor} -m jittor_utils.install_cuda
|
||||
"""
|
||||
LOG.w(msg)
|
||||
|
||||
def setup_cuda_lib(lib_name, link=True, extra_flags=""):
|
||||
arch_key = "x86_64"
|
||||
if platform.machine() != "x86_64":
|
||||
arch_key = "aarch64"
|
||||
globals()[lib_name+"_ops"] = None
|
||||
globals()[lib_name] = None
|
||||
if not has_cuda: return
|
||||
|
@ -172,20 +208,26 @@ def setup_cuda_lib(lib_name, link=True, extra_flags=""):
|
|||
|
||||
link_flags = ""
|
||||
if link:
|
||||
extra_include_path = os.path.abspath(os.path.join(cuda_include, "..", "targets/x86_64-linux/include"))
|
||||
extra_lib_path = os.path.abspath(os.path.join(cuda_lib, "..", "targets/x86_64-linux/lib"))
|
||||
extra_include_path = os.path.abspath(os.path.join(cuda_include, "..", f"targets/{arch_key}-linux/include"))
|
||||
extra_lib_path = os.path.abspath(os.path.join(cuda_lib, "..", f"targets/{arch_key}-linux/lib"))
|
||||
cuda_include_name = search_file([cuda_include, extra_include_path, "/usr/include"], lib_name+".h")
|
||||
# cuda11 prefer cudnn 8
|
||||
nvcc_version = get_int_version(nvcc_path)
|
||||
prefer_version = ()
|
||||
if nvcc_version[0] == 11:
|
||||
prefer_version = ("8",)
|
||||
culib_path = search_file([cuda_lib, extra_lib_path, "/usr/lib/x86_64-linux-gnu"], f"lib{lib_name}.so", prefer_version)
|
||||
culib_path = search_file([cuda_lib, extra_lib_path, f"/usr/lib/{arch_key}-linux-gnu", "/usr/lib"], f"lib{lib_name}.so", prefer_version)
|
||||
|
||||
if lib_name == "cublas":
|
||||
if lib_name == "cublas" and nvcc_version[0] >= 10:
|
||||
# manual link libcublasLt.so
|
||||
cublas_lt_lib_path = search_file([cuda_lib, extra_lib_path, "/usr/lib/x86_64-linux-gnu"], f"libcublasLt.so", nvcc_version)
|
||||
ctypes.CDLL(cublas_lt_lib_path, dlopen_flags)
|
||||
try:
|
||||
cublas_lt_lib_path = search_file([cuda_lib, extra_lib_path, f"/usr/lib/{arch_key}-linux-gnu", "/usr/lib"], f"libcublasLt.so", nvcc_version)
|
||||
ctypes.CDLL(cublas_lt_lib_path, dlopen_flags)
|
||||
except:
|
||||
# some aarch64 os, such as uos with FT2000 cpu,
|
||||
# it's cuda 10 doesn't have libcublasLt.so
|
||||
pass
|
||||
|
||||
|
||||
|
||||
if lib_name == "cudnn":
|
||||
|
@ -193,7 +235,7 @@ def setup_cuda_lib(lib_name, link=True, extra_flags=""):
|
|||
if nvcc_version >= (11,0,0):
|
||||
libs = ["libcudnn_ops_infer.so", "libcudnn_ops_train.so", "libcudnn_cnn_infer.so", "libcudnn_cnn_train.so"]
|
||||
for l in libs:
|
||||
ex_cudnn_path = search_file([cuda_lib, extra_lib_path, "/usr/lib/x86_64-linux-gnu"], l, prefer_version)
|
||||
ex_cudnn_path = search_file([cuda_lib, extra_lib_path, f"/usr/lib/{arch_key}-linux-gnu", "/usr/lib"], l, prefer_version)
|
||||
ctypes.CDLL(ex_cudnn_path, dlopen_flags)
|
||||
|
||||
# dynamic link cuda library
|
||||
|
@ -326,7 +368,7 @@ def install_nccl(root_folder):
|
|||
if len(flags.cuda_archs):
|
||||
arch_flag = f" -arch=compute_{min(flags.cuda_archs)} "
|
||||
arch_flag += ''.join(map(lambda x:f' -code=sm_{x} ', flags.cuda_archs))
|
||||
run_cmd(f"make -j8 src.build CUDA_HOME='{cuda_home}' NVCC_GENCODE='{arch_flag} --cudart=shared -ccbin=\"{cc_path}\" ' ", cwd=dirname)
|
||||
run_cmd(f"CC=\"{cc_path}\" CXX=\"{cc_path}\" make -j8 src.build CUDA_HOME='{cuda_home}' NVCC_GENCODE='{arch_flag} --cudart=shared ' ", cwd=dirname)
|
||||
return dirname
|
||||
|
||||
def setup_nccl():
|
||||
|
@ -453,12 +495,22 @@ def setup_mpi():
|
|||
if k == "mpi_test": continue
|
||||
setattr(core.Var, k, warper(mpi_ops.__dict__[k]))
|
||||
|
||||
if os.environ.get("FIX_TORCH_ERROR", "0") == "1":
|
||||
try:
|
||||
import torch
|
||||
except:
|
||||
pass
|
||||
|
||||
setup_mpi()
|
||||
in_mpi = inside_mpi()
|
||||
rank = mpi.world_rank() if in_mpi else 0
|
||||
world_size = mpi.world_size() if in_mpi else 1
|
||||
setup_nccl()
|
||||
|
||||
setup_cutt()
|
||||
setup_mkl()
|
||||
try:
|
||||
setup_mkl()
|
||||
except Exception as e:
|
||||
LOG.w("MKL install failed, msg:", e)
|
||||
|
||||
setup_cuda_extern()
|
||||
|
|
|
@ -12,7 +12,9 @@ import glob
|
|||
import inspect
|
||||
import datetime
|
||||
import threading
|
||||
import platform
|
||||
import ctypes
|
||||
import platform
|
||||
from ctypes import cdll
|
||||
from ctypes.util import find_library
|
||||
|
||||
|
@ -22,6 +24,7 @@ from . import pyjt_compiler
|
|||
from jittor_utils import lock
|
||||
from jittor_utils import install_cuda
|
||||
from jittor import __version__
|
||||
import hashlib
|
||||
|
||||
def find_jittor_path():
|
||||
return os.path.dirname(__file__)
|
||||
|
@ -71,10 +74,11 @@ def compile(compiler, flags, inputs, output, combind_build=False):
|
|||
output = os.path.join(cache_path, output)
|
||||
# don't recompile object file in inputs
|
||||
obj_files = []
|
||||
ex_obj_files = []
|
||||
new_inputs = []
|
||||
for name in inputs:
|
||||
if name[-1] in 'oa':
|
||||
obj_files.append(name)
|
||||
ex_obj_files.append(name)
|
||||
else:
|
||||
new_inputs.append(os.path.join(jittor_path, name))
|
||||
obj_files.append(os.path.join(
|
||||
|
@ -104,12 +108,11 @@ def compile(compiler, flags, inputs, output, combind_build=False):
|
|||
cmd = cmd.replace("-Ofast", "-O2")
|
||||
cmds.append(cmd)
|
||||
jit_utils.run_cmds(cmds, cache_path, jittor_path, "Compiling "+base_output)
|
||||
obj_files += ex_obj_files
|
||||
cmd = f"\"{compiler}\" {' '.join(obj_files)} {flags} {lto_flags} {link} -o {output}"
|
||||
return do_compile(cmd)
|
||||
|
||||
def gen_jit_tests():
|
||||
# all_src = run_cmd('find -L src/ | grep "cc$"', jittor_path).splitlines()
|
||||
# all_src = glob.glob(os.path.join(jittor_path,"src","**","*.cc"), recursive=True)
|
||||
all_src = glob.glob(jittor_path+"/src/**/*.cc", recursive=True)
|
||||
jit_declares = []
|
||||
re_def = re.compile("JIT_TEST\\((.*?)\\)")
|
||||
|
@ -159,7 +162,6 @@ def gen_jit_tests():
|
|||
f.write(jit_src)
|
||||
|
||||
def gen_jit_flags():
|
||||
# all_src = run_cmd('find -L src/ | grep "cc$"', jittor_path).splitlines()
|
||||
all_src = glob.glob(jittor_path+"/src/**/*.cc", recursive=True)
|
||||
jit_declares = []
|
||||
re_def = re.compile("DEFINE_FLAG(_WITH_SETTER)?\\((.*?)\\);", re.DOTALL)
|
||||
|
@ -608,7 +610,7 @@ def compile_custom_ops(
|
|||
filenames,
|
||||
extra_flags="",
|
||||
return_module=False,
|
||||
dlopen_flags=os.RTLD_GLOBAL | os.RTLD_NOW | os.RTLD_DEEPBIND,
|
||||
dlopen_flags=None,
|
||||
gen_name_ = ""):
|
||||
"""Compile custom ops
|
||||
filenames: path of op source files, filenames must be
|
||||
|
@ -618,6 +620,11 @@ def compile_custom_ops(
|
|||
return_module: return module rather than ops(default: False)
|
||||
return: compiled ops
|
||||
"""
|
||||
if dlopen_flags is None:
|
||||
dlopen_flags = os.RTLD_GLOBAL | os.RTLD_NOW
|
||||
if platform.system() == 'Linux':
|
||||
dlopen_flags |= os.RTLD_DEEPBIND
|
||||
|
||||
srcs = {}
|
||||
headers = {}
|
||||
builds = []
|
||||
|
@ -650,7 +657,7 @@ def compile_custom_ops(
|
|||
if gen_name_ != "":
|
||||
gen_name = gen_name_
|
||||
if len(gen_name) > 100:
|
||||
gen_name = gen_name[:80] + "___hash" + str(hash(gen_name))
|
||||
gen_name = gen_name[:80] + "___hash" + hashlib.md5(gen_name.encode()).hexdigest()
|
||||
|
||||
includes = sorted(list(set(includes)))
|
||||
includes = "".join(map(lambda x: f" -I'{x}' ", includes))
|
||||
|
@ -716,7 +723,7 @@ def get_full_path_of_executable(name):
|
|||
|
||||
def compile_extern():
|
||||
# compile llvm passes
|
||||
if cc_type != "clang":
|
||||
if cc_type != "clang" or platform.system() != 'Linux':
|
||||
return
|
||||
global kernel_opt_flags
|
||||
cache_path_llvm = os.path.join(cache_path, "llvm")
|
||||
|
@ -777,7 +784,7 @@ def compile_extern():
|
|||
LOG.vv(f"Compile extern llvm passes: {str(files)}")
|
||||
|
||||
def check_cuda():
|
||||
if nvcc_path == "":
|
||||
if not nvcc_path:
|
||||
return
|
||||
global cc_flags, has_cuda, core_link_flags, cuda_dir, cuda_lib, cuda_include, cuda_home
|
||||
cuda_dir = os.path.dirname(get_full_path_of_executable(nvcc_path))
|
||||
|
@ -859,11 +866,15 @@ def check_debug_flags():
|
|||
|
||||
cc_flags = " "
|
||||
# os.RTLD_NOW | os.RTLD_GLOBAL cause segfault when import torch first
|
||||
import_flags = os.RTLD_NOW | os.RTLD_GLOBAL | os.RTLD_DEEPBIND
|
||||
import_flags = os.RTLD_NOW | os.RTLD_GLOBAL
|
||||
if platform.system() == 'Linux':
|
||||
import_flags |= os.RTLD_DEEPBIND
|
||||
# if cc_type=="icc":
|
||||
# # weird link problem, icc omp library may conflict and cause segfault
|
||||
# import_flags = os.RTLD_NOW | os.RTLD_GLOBAL
|
||||
dlopen_flags = os.RTLD_NOW | os.RTLD_GLOBAL | os.RTLD_DEEPBIND
|
||||
dlopen_flags = os.RTLD_NOW | os.RTLD_GLOBAL
|
||||
if platform.system() == 'Linux':
|
||||
import_flags |= os.RTLD_DEEPBIND
|
||||
|
||||
with jit_utils.import_scope(import_flags):
|
||||
jit_utils.try_import_jit_utils_core()
|
||||
|
@ -885,9 +896,21 @@ python_path = sys.executable
|
|||
ex_python_path = python_path + '.' + str(sys.version_info.minor)
|
||||
if os.path.isfile(ex_python_path):
|
||||
python_path = ex_python_path
|
||||
nvcc_path = env_or_try_find('nvcc_path', 'nvcc') or try_find_exe('/usr/local/cuda/bin/nvcc') or try_find_exe('/usr/bin/nvcc')
|
||||
|
||||
# if jtcuda is already installed
|
||||
nvcc_path = None
|
||||
if install_cuda.has_installation():
|
||||
nvcc_path = install_cuda.install_cuda()
|
||||
if nvcc_path:
|
||||
nvcc_path = try_find_exe(nvcc_path)
|
||||
# check system installed cuda
|
||||
if not nvcc_path:
|
||||
nvcc_path = env_or_try_find('nvcc_path', 'nvcc') or \
|
||||
try_find_exe('/usr/local/cuda/bin/nvcc') or \
|
||||
try_find_exe('/usr/bin/nvcc') or \
|
||||
try_find_exe('/opt/cuda/bin/nvcc')
|
||||
# if system has no cuda, install jtcuda
|
||||
if not nvcc_path:
|
||||
cuda_driver = install_cuda.get_cuda_driver()
|
||||
nvcc_path = install_cuda.install_cuda()
|
||||
if nvcc_path:
|
||||
nvcc_path = try_find_exe(nvcc_path)
|
||||
|
@ -897,14 +920,44 @@ gdb_path = try_find_exe('gdb')
|
|||
addr2line_path = try_find_exe('addr2line')
|
||||
has_pybt = check_pybt(gdb_path, python_path)
|
||||
|
||||
cc_flags += " -Wall -Werror -Wno-unknown-pragmas -std=c++14 -fPIC -march=native "
|
||||
|
||||
def check_clang_latest_supported_cpu():
|
||||
output = run_cmd('clang --print-supported-cpus')
|
||||
apple_cpus = [l.strip() for l in output.split('\n') if 'apple-a' in l]
|
||||
apple_cpus_id = max([int(cpu[7:]) for cpu in apple_cpus])
|
||||
return f'apple-a{apple_cpus_id}'
|
||||
|
||||
# cc_flags += " -Wall -Werror -Wno-unknown-pragmas -std=c++14 -fPIC "
|
||||
cc_flags += " -Wall -Wno-unknown-pragmas -std=c++14 -fPIC "
|
||||
# 1. Arch/CPU specific optimization
|
||||
if platform.machine() == "x86_64":
|
||||
cc_flags += " -march=native "
|
||||
elif platform.machine() == 'arm64' and platform.system() == "Darwin":
|
||||
cc_flags += f" -mcpu={check_clang_latest_supported_cpu()} "
|
||||
cc_flags += " -fdiagnostics-color=always "
|
||||
# 2. Non standard include path
|
||||
if platform.system() == 'Darwin' and platform.machine() == 'arm64':
|
||||
cc_flags += " -I/opt/homebrew/include "
|
||||
# 3. User specified flags
|
||||
if "cc_flags" in os.environ:
|
||||
cc_flags += os.environ["cc_flags"] + ' '
|
||||
|
||||
link_flags = " -lstdc++ -ldl -shared "
|
||||
if platform.system() == 'Darwin':
|
||||
# TODO: if not using apple clang, there is no need to add -lomp
|
||||
link_flags += "-undefined dynamic_lookup -lomp "
|
||||
if platform.machine() == "arm64":
|
||||
link_flags += " -L/opt/homebrew/lib "
|
||||
|
||||
core_link_flags = ""
|
||||
opt_flags = ""
|
||||
kernel_opt_flags = os.environ.get("kernel_flags", "") + opt_flags + " -fopenmp "
|
||||
|
||||
kernel_opt_flags = os.environ.get("kernel_flags", "") + opt_flags
|
||||
if platform.system() == 'Darwin':
|
||||
# TODO: if not using apple clang, cannot add -Xpreprocessor
|
||||
kernel_opt_flags = kernel_opt_flags + " -Xpreprocessor -fopenmp "
|
||||
else:
|
||||
kernel_opt_flags = kernel_opt_flags + " -fopenmp "
|
||||
if os.name == 'nt':
|
||||
link_flags = link_flags.replace('-ldl', '')
|
||||
py3_link_path = '-L"' + os.path.join(
|
||||
|
@ -978,7 +1031,6 @@ if has_cuda:
|
|||
gen_jit_flags()
|
||||
gen_jit_tests()
|
||||
op_headers = glob.glob(jittor_path+"/src/ops/**/*op.h", recursive=True)
|
||||
# op_headers = run_cmd('find -L src/ops/ | grep "op.h$"', jittor_path).splitlines()
|
||||
jit_src = gen_jit_op_maker(op_headers)
|
||||
LOG.vvvv(jit_src)
|
||||
with open(os.path.join(cache_path, "gen", "jit_op_maker.h"), 'w') as f:
|
||||
|
@ -1023,54 +1075,49 @@ for file in jit_utils_core_files:
|
|||
files.remove(file)
|
||||
LOG.vv("compile order:", files)
|
||||
|
||||
if os.name != 'nt':
|
||||
# manual Link omp using flags(os.RTLD_NOW | os.RTLD_GLOBAL)
|
||||
# if cc_type=="icc":
|
||||
# os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
|
||||
if platform.system() == 'Linux':
|
||||
libname = {"clang":"omp", "icc":"iomp5", "g++":"gomp"}[cc_type]
|
||||
libname = ctypes.util.find_library(libname)
|
||||
assert libname is not None, "openmp library not found"
|
||||
ctypes.CDLL(libname, os.RTLD_NOW | os.RTLD_GLOBAL)
|
||||
|
||||
# get os release
|
||||
if platform.machine()=='sw_64':
|
||||
import ssl
|
||||
ssl._create_default_https_context = ssl._create_unverified_context
|
||||
|
||||
with open("/etc/os-release", "r", encoding='utf8') as f:
|
||||
s = f.read().splitlines()
|
||||
os_release = {}
|
||||
for line in s:
|
||||
a = line.split('=')
|
||||
if len(a) != 2: continue
|
||||
os_release[a[0]] = a[1].replace("\"", "")
|
||||
|
||||
os_type = {
|
||||
"ubuntu": "ubuntu",
|
||||
"debian": "ubuntu",
|
||||
"centos": "centos",
|
||||
"rhel": "ubuntu",
|
||||
"fedora": "ubuntu",
|
||||
}
|
||||
version_file = os.path.join(jittor_path, "version")
|
||||
if os.path.isfile(version_file) and not os.path.isdir(os.path.join(jittor_path, "src", "__data__")):
|
||||
with open(version_file, 'r') as f:
|
||||
version = f.read().strip()
|
||||
# key = f"{version}-{cc_type}-{'cuda' if has_cuda else 'cpu'}.o"
|
||||
key = f"{version}-g++-cpu"
|
||||
os_id = os_release["ID"]
|
||||
os_key = os_type.get(os_id, "ubuntu")
|
||||
if "os_key" in os.environ:
|
||||
os_key = os.environ['os_key']
|
||||
LOG.i("OS type:", os_id, " OS key:", os_key)
|
||||
key += '-' + os_key + '.o'
|
||||
# TODO: open the website
|
||||
extra_obj = os.path.join(cache_path, key)
|
||||
url = os.path.join("https://cg.cs.tsinghua.edu.cn/jittor/assets/build/"+key)
|
||||
jit_utils.download(url, extra_obj)
|
||||
files.append(extra_obj)
|
||||
data_gz_path = os.path.join(jittor_path, "utils", "data.gz")
|
||||
use_data_gz = os.path.isfile(data_gz_path)
|
||||
if os.environ.get("use_data_gz", "1") == "0":
|
||||
use_data_gz = False
|
||||
if use_data_gz:
|
||||
import gzip
|
||||
with gzip.open(data_gz_path, 'rb') as f:
|
||||
data = f.read()
|
||||
md5 = hashlib.md5(data).hexdigest()
|
||||
target_md5 = None
|
||||
data_gz_md5_path = os.path.join(cache_path, "data.md5")
|
||||
if os.path.isfile(data_gz_md5_path):
|
||||
with open(data_gz_md5_path, 'r') as f:
|
||||
target_md5 = f.read()
|
||||
data_o_path = os.path.join(cache_path, "data.o")
|
||||
if target_md5 != md5:
|
||||
data_s_path = os.path.join(cache_path, "data.cc")
|
||||
with open(data_s_path, "w") as f:
|
||||
f.write(data.decode("utf8"))
|
||||
dflags = (cc_flags+opt_flags)\
|
||||
.replace("-Wall", "") \
|
||||
.replace("-Werror", "")
|
||||
run_cmd(f"{cc_path} {dflags} \"-D_P(...)=\" {data_s_path} -c -o {data_o_path}")
|
||||
os.remove(data_s_path)
|
||||
with open(data_gz_md5_path, 'w') as f:
|
||||
f.write(md5)
|
||||
files.append(data_o_path)
|
||||
files = [f for f in files if "__data__" not in f]
|
||||
|
||||
compile(cc_path, cc_flags+opt_flags, files, 'jittor_core'+extension_suffix)
|
||||
|
||||
# TODO: move to compile_extern.py
|
||||
compile_extern()
|
||||
# compile_extern()
|
||||
|
||||
with jit_utils.import_scope(import_flags):
|
||||
import jittor_core as core
|
||||
|
|
|
@ -206,6 +206,23 @@ def setitem(x, slices, value):
|
|||
jt.Var.__getitem__ = jt.Var.slice_var = getitem
|
||||
jt.Var.__setitem__ = setitem
|
||||
|
||||
|
||||
def _merge_dtypes(dtypes):
|
||||
s = -1
|
||||
e = -1
|
||||
names = ["bool","uint","int","float"]
|
||||
dbytes = ["8","16","32","64"]
|
||||
for d in dtypes:
|
||||
for name in names:
|
||||
if d.startswith(name):
|
||||
s = max(s,names.index(name))
|
||||
for db in dbytes:
|
||||
if d.endswith(db):
|
||||
e = max(e,dbytes.index(db))
|
||||
assert s>=0 and s<4 and e<4
|
||||
dtype = names[s]+("" if e ==-1 else dbytes[e])
|
||||
return dtype
|
||||
|
||||
def concat(arr, dim=0):
|
||||
'''Concat Operator can concat a list of jt Var at a specfic dimension.
|
||||
|
||||
|
@ -226,12 +243,14 @@ Example::
|
|||
raise ValueError("need at least one array to concat")
|
||||
total_dim = 0
|
||||
if dim < 0: dim += len(arr[0].shape)
|
||||
dtypes = []
|
||||
for a in arr:
|
||||
total_dim += a.shape[dim]
|
||||
dtypes.append(str(a.dtype))
|
||||
cdim = 0
|
||||
shape = list(a.shape)
|
||||
shape[dim] = total_dim
|
||||
s = jt.empty(shape, a.dtype)
|
||||
s = jt.empty(shape, dtype = _merge_dtypes(dtypes))
|
||||
slices = [slice(None)]*len(a.shape)
|
||||
for a in arr:
|
||||
if a.shape[dim] == 0:
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
|
||||
from .dataset import Dataset, ImageFolder
|
||||
from .dataset import Dataset, ImageFolder, dataset_root, TensorDataset, VarDataset
|
||||
from .mnist import MNIST
|
||||
from .cifar import CIFAR10, CIFAR100
|
||||
from .voc import VOC
|
||||
from .sampler import *
|
|
@ -0,0 +1,189 @@
|
|||
|
||||
import os
|
||||
from jittor_utils.misc import download_and_extract_archive, check_integrity
|
||||
from PIL import Image
|
||||
import sys, pickle
|
||||
import numpy as np
|
||||
from jittor.dataset import Dataset, dataset_root
|
||||
|
||||
class CIFAR10(Dataset):
|
||||
"""`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
|
||||
|
||||
Args:
|
||||
root (string): Root directory of dataset where directory
|
||||
``cifar-10-batches-py`` exists or will be saved to if download is set to True.
|
||||
train (bool, optional): If True, creates dataset from training set, otherwise
|
||||
creates from test set.
|
||||
transform (callable, optional): A function/transform that takes in an PIL image
|
||||
and returns a transformed version. E.g, ``transforms.RandomCrop``
|
||||
target_transform (callable, optional): A function/transform that takes in the
|
||||
target and transforms it.
|
||||
download (bool, optional): If true, downloads the dataset from the internet and
|
||||
puts it in root directory. If dataset is already downloaded, it is not
|
||||
downloaded again.
|
||||
|
||||
Example::
|
||||
|
||||
|
||||
from jittor.dataset.cifar import CIFAR10
|
||||
a = CIFAR10()
|
||||
a.set_attrs(batch_size=16)
|
||||
for imgs, labels in a:
|
||||
print(imgs.shape, labels.shape)
|
||||
break
|
||||
|
||||
"""
|
||||
base_folder = 'cifar-10-batches-py'
|
||||
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
|
||||
filename = "cifar-10-python.tar.gz"
|
||||
tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
|
||||
train_list = [
|
||||
['data_batch_1', 'c99cafc152244af753f735de768cd75f'],
|
||||
['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'],
|
||||
['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'],
|
||||
['data_batch_4', '634d18415352ddfa80567beed471001a'],
|
||||
['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb'],
|
||||
]
|
||||
|
||||
test_list = [
|
||||
['test_batch', '40351d587109b95175f43aff81a1287e'],
|
||||
]
|
||||
meta = {
|
||||
'filename': 'batches.meta',
|
||||
'key': 'label_names',
|
||||
'md5': '5ff9c542aee3614f3951f8cda6e48888',
|
||||
}
|
||||
|
||||
def __init__(self, root=dataset_root+"/cifar_data/", train=True, transform=None, target_transform=None,
|
||||
download=True):
|
||||
|
||||
super(CIFAR10, self).__init__()
|
||||
self.root = root
|
||||
self.transform=transform
|
||||
self.target_transform=target_transform
|
||||
|
||||
self.train = train # training set or test set
|
||||
|
||||
if download:
|
||||
self.download()
|
||||
|
||||
if not self._check_integrity():
|
||||
raise RuntimeError('Dataset not found or corrupted.' +
|
||||
' You can use download=True to download it')
|
||||
|
||||
if self.train:
|
||||
downloaded_list = self.train_list
|
||||
else:
|
||||
downloaded_list = self.test_list
|
||||
|
||||
self.data = []
|
||||
self.targets = []
|
||||
|
||||
# now load the picked numpy arrays
|
||||
for file_name, checksum in downloaded_list:
|
||||
file_path = os.path.join(self.root, self.base_folder, file_name)
|
||||
with open(file_path, 'rb') as f:
|
||||
if sys.version_info[0] == 2:
|
||||
entry = pickle.load(f)
|
||||
else:
|
||||
entry = pickle.load(f, encoding='latin1')
|
||||
self.data.append(entry['data'])
|
||||
if 'labels' in entry:
|
||||
self.targets.extend(entry['labels'])
|
||||
else:
|
||||
self.targets.extend(entry['fine_labels'])
|
||||
|
||||
self.data = np.vstack(self.data).reshape(-1, 3, 32, 32)
|
||||
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
|
||||
|
||||
self._load_meta()
|
||||
|
||||
def _load_meta(self):
|
||||
path = os.path.join(self.root, self.base_folder, self.meta['filename'])
|
||||
if not check_integrity(path, self.meta['md5']):
|
||||
raise RuntimeError('Dataset metadata file not found or corrupted.' +
|
||||
' You can use download=True to download it')
|
||||
with open(path, 'rb') as infile:
|
||||
if sys.version_info[0] == 2:
|
||||
data = pickle.load(infile)
|
||||
else:
|
||||
data = pickle.load(infile, encoding='latin1')
|
||||
self.classes = data[self.meta['key']]
|
||||
self.class_to_idx = {_class: i for i, _class in enumerate(self.classes)}
|
||||
|
||||
def __getitem__(self, index):
|
||||
"""
|
||||
Args:
|
||||
index (int): Index
|
||||
|
||||
Returns:
|
||||
tuple: (image, target) where target is index of the target class.
|
||||
"""
|
||||
img, target = self.data[index], self.targets[index]
|
||||
|
||||
# doing this so that it is consistent with all other datasets
|
||||
# to return a PIL Image
|
||||
img = Image.fromarray(img)
|
||||
|
||||
if self.transform is not None:
|
||||
img = self.transform(img)
|
||||
|
||||
if self.target_transform is not None:
|
||||
target = self.target_transform(target)
|
||||
|
||||
return img, target
|
||||
|
||||
def __len__(self):
|
||||
return len(self.data)
|
||||
|
||||
def _check_integrity(self):
|
||||
root = self.root
|
||||
for fentry in (self.train_list + self.test_list):
|
||||
filename, md5 = fentry[0], fentry[1]
|
||||
fpath = os.path.join(root, self.base_folder, filename)
|
||||
if not check_integrity(fpath, md5):
|
||||
return False
|
||||
return True
|
||||
|
||||
def download(self):
|
||||
if self._check_integrity():
|
||||
print('Files already downloaded and verified')
|
||||
return
|
||||
download_and_extract_archive(self.url, self.root, filename=self.filename, md5=self.tgz_md5)
|
||||
|
||||
def extra_repr(self):
|
||||
return "Split: {}".format("Train" if self.train is True else "Test")
|
||||
|
||||
|
||||
class CIFAR100(CIFAR10):
|
||||
"""`CIFAR100 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
|
||||
|
||||
This is a subclass of the `CIFAR10` Dataset.
|
||||
|
||||
|
||||
Example::
|
||||
|
||||
|
||||
from jittor.dataset.cifar import CIFAR100
|
||||
a = CIFAR100()
|
||||
a.set_attrs(batch_size=16)
|
||||
for imgs, labels in a:
|
||||
print(imgs.shape, labels.shape)
|
||||
break
|
||||
"""
|
||||
base_folder = 'cifar-100-python'
|
||||
url = "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz"
|
||||
filename = "cifar-100-python.tar.gz"
|
||||
tgz_md5 = 'eb9058c3a382ffc7106e4002c42a8d85'
|
||||
train_list = [
|
||||
['train', '16019d7e3df5f24257cddd939b257f8d'],
|
||||
]
|
||||
|
||||
test_list = [
|
||||
['test', 'f0ef6b0ae62326f3e7ffdfab6717acfc'],
|
||||
]
|
||||
meta = {
|
||||
'filename': 'meta',
|
||||
'key': 'fine_label_names',
|
||||
'md5': '7973b15100ade9c7d40fb424638fde48',
|
||||
}
|
|
@ -26,6 +26,7 @@ dataset_root = os.path.join(pathlib.Path.home(), ".cache", "jittor", "dataset")
|
|||
mp_log_v = os.environ.get("mp_log_v", 0)
|
||||
mpi = jt.mpi
|
||||
img_open_hook = HookTimer(Image, "open")
|
||||
CHECK_MEMORY = int(os.environ.get("CHECK_MEMORY", "0"))
|
||||
|
||||
class Worker:
|
||||
def __init__(self, target, args, buffer_size, keep_numpy_array=False):
|
||||
|
@ -48,6 +49,8 @@ class Dataset(object):
|
|||
[in] drop_last(bool): if true, the last batch of dataset might smaller than batch_size, default True.
|
||||
[in] num_workers(int): number of workers for loading data.
|
||||
[in] buffer_size(int): buffer size for each worker in bytes, default(512MB).
|
||||
[in] keep_numpy_array(bool): return numpy array rather than jittor array, default(False).
|
||||
[in] endless(bool): will this dataset yield data forever, default(False).
|
||||
|
||||
Example::
|
||||
|
||||
|
@ -70,8 +73,11 @@ class Dataset(object):
|
|||
num_workers = 0,
|
||||
buffer_size = 512*1024*1024,
|
||||
stop_grad = True,
|
||||
keep_numpy_array = False):
|
||||
keep_numpy_array = False,
|
||||
endless = False):
|
||||
super().__init__()
|
||||
if os.environ.get("DISABLE_MULTIPROCESSING", '0') == '1':
|
||||
num_workers = 0
|
||||
self.total_len = None
|
||||
self.batch_size = batch_size
|
||||
self.shuffle = shuffle
|
||||
|
@ -80,7 +86,10 @@ class Dataset(object):
|
|||
self.buffer_size = buffer_size
|
||||
self.stop_grad = stop_grad
|
||||
self.keep_numpy_array = keep_numpy_array
|
||||
self.endless = endless
|
||||
self.epoch_id = 0
|
||||
self.sampler = None
|
||||
self._disable_workers = False
|
||||
|
||||
def __getitem__(self, index):
|
||||
raise NotImplementedError
|
||||
|
@ -129,13 +138,16 @@ class Dataset(object):
|
|||
if self.stop_grad else jt.array(x)
|
||||
if isinstance(batch, np.ndarray):
|
||||
return to_jt(batch)
|
||||
if isinstance(batch, dict):
|
||||
new_batch = {}
|
||||
for k,v in batch.items():
|
||||
new_batch[k] = self.to_jittor(v)
|
||||
return new_batch
|
||||
if not isinstance(batch, (list, tuple)):
|
||||
return batch
|
||||
new_batch = []
|
||||
for a in batch:
|
||||
if isinstance(a, np.ndarray) or \
|
||||
isinstance(a, int) or \
|
||||
isinstance(a, float):
|
||||
if isinstance(a, np.ndarray):
|
||||
new_batch.append(to_jt(a))
|
||||
else:
|
||||
new_batch.append(self.to_jittor(a))
|
||||
|
@ -180,15 +192,20 @@ class Dataset(object):
|
|||
while True:
|
||||
# get id
|
||||
with gid_lock:
|
||||
while gid_obj.value >= self.batch_len or buffer.is_stop():
|
||||
while buffer.is_stop() or self.idqueue.is_stop() or \
|
||||
gid_obj.value >= self.batch_len:
|
||||
self.num_idle.value += 1
|
||||
self.num_idle_c.notify()
|
||||
self.gidc.wait()
|
||||
self.num_idle.value -= 1
|
||||
cid = gid_obj.value
|
||||
self.idmap[cid] = worker_id
|
||||
batch_index_list = self.index_list_numpy[
|
||||
cid*self.real_batch_size:
|
||||
min(self.real_len, (cid+1)*self.real_batch_size)
|
||||
].copy()
|
||||
gid_obj.value += 1
|
||||
self.gidc.notify()
|
||||
with self.idqueue_lock:
|
||||
self.idqueue.push(worker_id)
|
||||
now = time.time()
|
||||
other_time = now - start
|
||||
start = now
|
||||
|
@ -197,8 +214,8 @@ class Dataset(object):
|
|||
batch = []
|
||||
if mp_log_v:
|
||||
print(f"#{worker_id} {os.getpid()} load batch", cid*self.real_batch_size, min(self.real_len, (cid+1)*self.real_batch_size))
|
||||
for i in range(cid*self.real_batch_size, min(self.real_len, (cid+1)*self.real_batch_size)):
|
||||
batch.append(self[self.index_list[i]])
|
||||
for i in batch_index_list:
|
||||
batch.append(self[i])
|
||||
batch = self.collate_batch(batch)
|
||||
now = time.time()
|
||||
data_time = now - start
|
||||
|
@ -276,10 +293,10 @@ Example::
|
|||
if not hasattr(self, "workers"):
|
||||
return
|
||||
msg = [""]
|
||||
msg.append(f"progress:{self.last_id}/{self.batch_len}")
|
||||
msg.append(f"progress:{self.batch_id}/{self.batch_len}")
|
||||
msg.append(f"batch(s): {self.batch_time:.3f}\twait(s):{self.wait_time:.3f}")
|
||||
msg.append(f"recv(s): {self.recv_time:.3f}\tto_jittor(s):{self.to_jittor_time:.3f}")
|
||||
msg.append(f"last 10 workers: {self.idmap[max(0, self.last_id-9):self.last_id+1]}")
|
||||
msg.append(f"last 10 workers: {self.last_ids}")
|
||||
msg.append(f"ID\twait(s)\topen(s)\tload(s)\tsend(s)\ttotal(s)")
|
||||
for i in range(self.num_workers):
|
||||
w = self.workers[i]
|
||||
|
@ -291,6 +308,7 @@ Example::
|
|||
# stop workers
|
||||
for w in self.workers:
|
||||
w.buffer.stop()
|
||||
self.idqueue.stop()
|
||||
# wait until all workers idle
|
||||
if self.num_idle.value < self.num_workers:
|
||||
with self.gid.get_lock():
|
||||
|
@ -304,29 +322,34 @@ Example::
|
|||
# clean workers' buffer
|
||||
for w in self.workers:
|
||||
w.buffer.clear()
|
||||
self.idqueue.clear()
|
||||
self.gid_obj.value = 0
|
||||
|
||||
def _init_workers(self):
|
||||
def _init_workers(self, index_list):
|
||||
jt.clean()
|
||||
jt.gc()
|
||||
self.index_list = mp.Array('i', self.real_len, lock=False)
|
||||
workers = []
|
||||
# batch id to worker id
|
||||
self.idmap = mp.Array('i', self.batch_len, lock=False)
|
||||
# get worker id
|
||||
self.idqueue = jt.RingBuffer(2048)
|
||||
self.idqueue_lock = mp.Lock()
|
||||
# global token index
|
||||
self.gid = mp.Value('i', self.batch_len)
|
||||
self.gid.value = 0
|
||||
# global token index condition
|
||||
self.gidc = mp.Condition(self.gid.get_lock())
|
||||
# number of idle workers
|
||||
self.num_idle = mp.Value('i', 0, lock=False)
|
||||
# number of idle workers condition
|
||||
self.num_idle_c = mp.Condition(self.gid.get_lock())
|
||||
self.index_list_numpy = np.ndarray(dtype='int32', shape=self.real_len, buffer=self.index_list)
|
||||
self.index_list_numpy[:] = index_list
|
||||
for i in range(self.num_workers):
|
||||
w = Worker(target=self._worker_main, args=(i,),
|
||||
buffer_size=self.buffer_size,
|
||||
keep_numpy_array=self.keep_numpy_array)
|
||||
workers.append(w)
|
||||
self.workers = workers
|
||||
self.index_list_numpy = np.ndarray(dtype='int32', shape=self.real_len, buffer=self.index_list)
|
||||
|
||||
def reset(self):
|
||||
if not hasattr(self, "workers"):
|
||||
|
@ -351,8 +374,8 @@ Example::
|
|||
if self.total_len is None:
|
||||
self.total_len = len(self)
|
||||
return self.total_len
|
||||
|
||||
def __iter__(self):
|
||||
|
||||
def _get_index_list(self):
|
||||
if self.total_len is None:
|
||||
self.total_len = len(self)
|
||||
# maybe rewrite by sampler
|
||||
|
@ -381,7 +404,8 @@ Example::
|
|||
world_size = mpi.world_size()
|
||||
world_rank = mpi.world_rank()
|
||||
index_list = np.int32(index_list)
|
||||
mpi.broadcast(index_list, 0)
|
||||
# TODO: mpi broadcast in subprocess has bug, fix it
|
||||
# mpi.broadcast(index_list, 0)
|
||||
|
||||
assert self.batch_size >= world_size, \
|
||||
f"Batch size({self.batch_size}) is smaller than MPI world_size({world_size})"
|
||||
|
@ -416,71 +440,104 @@ Example::
|
|||
else:
|
||||
self.real_len = self.total_len
|
||||
self.real_batch_size = self.batch_size
|
||||
|
||||
self.batch_len = self.__batch_len__()
|
||||
return index_list
|
||||
|
||||
def _epochs(self):
|
||||
if self.endless:
|
||||
while True:
|
||||
yield
|
||||
self.epoch_id += 1
|
||||
else:
|
||||
yield
|
||||
|
||||
def __iter__(self):
|
||||
if self._disable_workers:
|
||||
self.num_workers = 0
|
||||
index_list = self._get_index_list()
|
||||
|
||||
if not hasattr(self, "workers") and self.num_workers:
|
||||
self._init_workers()
|
||||
self._init_workers(index_list)
|
||||
self.last_ids = [-1] * 10
|
||||
|
||||
if self.num_workers:
|
||||
self._stop_all_workers()
|
||||
self.index_list_numpy[:] = index_list
|
||||
gid_obj = self.gid.get_obj()
|
||||
gid_lock = self.gid.get_lock()
|
||||
with gid_lock:
|
||||
gid_obj.value = 0
|
||||
self.gidc.notify_all()
|
||||
start = time.time()
|
||||
self.batch_time = 0
|
||||
for i in range(self.batch_len):
|
||||
# try not get lock first
|
||||
if gid_obj.value <= i:
|
||||
with gid_lock:
|
||||
if gid_obj.value <= i:
|
||||
if mp_log_v:
|
||||
print("wait")
|
||||
self.gidc.wait()
|
||||
now = time.time()
|
||||
self.wait_time = now - start
|
||||
start = now
|
||||
gid_obj = self.gid.get_obj()
|
||||
gid_lock = self.gid.get_lock()
|
||||
|
||||
self.last_id = i
|
||||
worker_id = self.idmap[i]
|
||||
w = self.workers[worker_id]
|
||||
if mp_log_v:
|
||||
print(f"#{worker_id} {os.getpid()} recv buffer", w.buffer)
|
||||
batch = w.buffer.recv()
|
||||
now = time.time()
|
||||
self.recv_time = now - start
|
||||
start = now
|
||||
for _ in self._epochs():
|
||||
with gid_lock:
|
||||
if self.num_idle.value:
|
||||
self.gidc.notify_all()
|
||||
|
||||
if mp_log_v:
|
||||
print(f"#{worker_id} {os.getpid()} recv", type(batch).__name__, [ type(b).__name__ for b in batch ])
|
||||
batch = self.to_jittor(batch)
|
||||
now = time.time()
|
||||
self.to_jittor_time = now - start
|
||||
start = now
|
||||
for i in range(self.batch_len):
|
||||
if self.num_idle.value:
|
||||
with gid_lock:
|
||||
if self.num_idle.value and \
|
||||
gid_obj.value >= self.batch_len:
|
||||
index_list = self._get_index_list()
|
||||
self.index_list_numpy[:] = index_list
|
||||
gid_obj.value = 0
|
||||
self.gidc.notify_all()
|
||||
|
||||
yield batch
|
||||
# get which worker has this batch
|
||||
worker_id = self.idqueue.pop()
|
||||
|
||||
now = time.time()
|
||||
self.batch_time = now - start
|
||||
start = now
|
||||
now = time.time()
|
||||
self.wait_time = now - start
|
||||
start = now
|
||||
|
||||
self.last_ids[i%10] = worker_id
|
||||
self.batch_id = i
|
||||
w = self.workers[worker_id]
|
||||
if mp_log_v:
|
||||
print(f"#{worker_id} {os.getpid()} recv buffer", w.buffer)
|
||||
batch = w.buffer.recv()
|
||||
|
||||
now = time.time()
|
||||
self.recv_time = now - start
|
||||
start = now
|
||||
|
||||
if mp_log_v:
|
||||
print(f"#{worker_id} {os.getpid()} recv", type(batch).__name__, [ type(b).__name__ for b in batch ])
|
||||
batch = self.to_jittor(batch)
|
||||
|
||||
now = time.time()
|
||||
self.to_jittor_time = now - start
|
||||
start = now
|
||||
|
||||
yield batch
|
||||
|
||||
now = time.time()
|
||||
self.batch_time = now - start
|
||||
start = now
|
||||
|
||||
if CHECK_MEMORY and self.batch_id % CHECK_MEMORY == 0:
|
||||
jt.display_memory_info()
|
||||
else:
|
||||
batch_data = []
|
||||
for idx in index_list:
|
||||
batch_data.append(self[int(idx)])
|
||||
if len(batch_data) == self.real_batch_size:
|
||||
for _ in self._epochs():
|
||||
self.batch_id = 0
|
||||
batch_data = []
|
||||
for idx in index_list:
|
||||
batch_data.append(self[int(idx)])
|
||||
if len(batch_data) == self.real_batch_size:
|
||||
batch_data = self.collate_batch(batch_data)
|
||||
tmp = batch_data
|
||||
batch_data = self.to_jittor(batch_data)
|
||||
# breakpoint()
|
||||
yield batch_data
|
||||
self.batch_id += 1
|
||||
if CHECK_MEMORY and self.batch_id % CHECK_MEMORY == 0:
|
||||
jt.display_memory_info()
|
||||
batch_data = []
|
||||
|
||||
# depend on drop_last
|
||||
if not self.drop_last and len(batch_data) > 0:
|
||||
batch_data = self.collate_batch(batch_data)
|
||||
batch_data = self.to_jittor(batch_data)
|
||||
self.batch_id += 1
|
||||
yield batch_data
|
||||
batch_data = []
|
||||
|
||||
# depend on drop_last
|
||||
if not self.drop_last and len(batch_data) > 0:
|
||||
batch_data = self.collate_batch(batch_data)
|
||||
batch_data = self.to_jittor(batch_data)
|
||||
yield batch_data
|
||||
|
||||
|
||||
class ImageFolder(Dataset):
|
||||
|
@ -537,3 +594,47 @@ class ImageFolder(Dataset):
|
|||
if self.transform:
|
||||
img = self.transform(img)
|
||||
return img, self.imgs[k][1]
|
||||
|
||||
class VarDataset(Dataset):
|
||||
""" Dataset using Var directly, TensorDataset is alias of VarDataset, Example::
|
||||
|
||||
import jittor as jt
|
||||
from jittor.dataset import VarDataset
|
||||
|
||||
x = jt.array([1,2,3])
|
||||
y = jt.array([4,5,6])
|
||||
z = jt.array([7,8,9])
|
||||
dataset = VarDataset(x, y, z)
|
||||
dataset.set_attrs(batch_size=1)
|
||||
|
||||
for a,b,c in dataset:
|
||||
print(a,b,c)
|
||||
# will print
|
||||
# 1,4,7
|
||||
# 2,5,8
|
||||
# 3,6,9
|
||||
|
||||
"""
|
||||
def __init__(self, *args):
|
||||
super().__init__()
|
||||
self.args = args
|
||||
self._disable_workers = True
|
||||
assert len(args), "At lease one args"
|
||||
l = len(args[0])
|
||||
for a in args:
|
||||
assert l == len(a), "Len should be the same"
|
||||
self.set_attrs(total_len=l)
|
||||
|
||||
def __getitem__(self, idx):
|
||||
return [ a[idx] for a in self.args ]
|
||||
|
||||
|
||||
def collate_batch(self, batch):
|
||||
b = collate_batch(batch)
|
||||
for i in range(len(self.args)):
|
||||
x = b[i]
|
||||
if jt.is_var(self.args[i]) and self.args[i].ndim == 1:
|
||||
x.assign(x.squeeze(-1))
|
||||
return b
|
||||
|
||||
TensorDataset = VarDataset
|
|
@ -27,8 +27,7 @@ def collate_batch(batch):
|
|||
elem = batch[0]
|
||||
elem_type = type(elem)
|
||||
if isinstance(elem, jt.Var):
|
||||
# TODO: use jittor
|
||||
temp_data = np.stack([data.data for data in batch], 0)
|
||||
temp_data = jt.stack([data for data in batch], 0)
|
||||
return temp_data
|
||||
if elem_type is np.ndarray:
|
||||
temp_data = np.stack([data for data in batch], 0)
|
||||
|
|
|
@ -29,18 +29,7 @@ kernel(in0->num/in0->shape[in0->shape.size()-1], 0, in0_p, out0_p, in0->shape[in
|
|||
|
||||
class OneHotCategorical:
|
||||
def __init__(self, probs=None, logits=None):
|
||||
assert not (probs is None and logits is None)
|
||||
if probs is None:
|
||||
# cannot align to pytorch
|
||||
probs = jt.sigmoid(logits)
|
||||
elif logits is None:
|
||||
logits = jt.log(probs)
|
||||
with jt.no_grad():
|
||||
self.probs = probs / probs.sum(-1, True)
|
||||
self.cum_probs = simple_presum(self.probs)
|
||||
self.cum_probs_l = self.cum_probs[..., :-1]
|
||||
self.cum_probs_r = self.cum_probs[..., 1:]
|
||||
self.logits = logits
|
||||
Categorical.__init__(self, probs, logits)
|
||||
|
||||
def sample(self, sample_shape=[]):
|
||||
shape = sample_shape + self.probs.shape[:-1] + (1,)
|
||||
|
@ -48,17 +37,12 @@ class OneHotCategorical:
|
|||
one_hot = jt.logical_and(self.cum_probs_l < rand, rand <= self.cum_probs_r).float()
|
||||
return one_hot
|
||||
|
||||
def log_prob(self,x):
|
||||
if len(x.shape) == 1:
|
||||
x = x.unsqueeze(0)
|
||||
logits = self.logits.broadcast(x.shape)
|
||||
indices = jt.argmax(x, dim=-1)[0]
|
||||
return logits.gather(1, indices.unsqueeze(-1)).reshape(-1)
|
||||
def log_prob(self, x):
|
||||
x = jt.argmax(x, dim=-1)[0]
|
||||
return Categorical.log_prob(self, x)
|
||||
|
||||
def entropy(self):
|
||||
min_real = -(math.pow(2,23)-1) / math.pow(2,22) * math.pow(2,127)
|
||||
logits = jt.clamp(self.logits,min_v=min_real)
|
||||
p_log_p = logits * self.probs
|
||||
p_log_p = self.logits * self.probs
|
||||
return -p_log_p.sum(-1)
|
||||
|
||||
|
||||
|
@ -68,29 +52,32 @@ class Categorical:
|
|||
if probs is None:
|
||||
# cannot align to pytorch
|
||||
probs = jt.sigmoid(logits)
|
||||
elif logits is None:
|
||||
logits = jt.log(probs)
|
||||
probs = probs / probs.sum(-1, True)
|
||||
if logits is None:
|
||||
logits = jt.safe_log(probs)
|
||||
with jt.no_grad():
|
||||
self.probs = probs / probs.sum(-1, True)
|
||||
self.probs = probs
|
||||
self.logits = logits
|
||||
self.cum_probs = simple_presum(probs)
|
||||
self.cum_probs = simple_presum(self.probs)
|
||||
self.cum_probs_l = self.cum_probs[..., :-1]
|
||||
self.cum_probs_r = self.cum_probs[..., 1:]
|
||||
|
||||
def sample(self, sample_shape=[]):
|
||||
def sample(self, sample_shape=()):
|
||||
shape = sample_shape + self.probs.shape[:-1] + (1,)
|
||||
rand = jt.rand(shape)
|
||||
one_hot = jt.logical_and(self.cum_probs_l < rand, rand <= self.cum_probs_r)
|
||||
index = one_hot.index(one_hot.ndim-1)
|
||||
index = one_hot.index(one_hot.ndim - 1)
|
||||
return (one_hot * index).sum(-1)
|
||||
|
||||
|
||||
def log_prob(self, x):
|
||||
return jt.log(self.probs)[0,x]
|
||||
|
||||
a = self.probs.ndim
|
||||
b = x.ndim
|
||||
indexes = tuple( f'i{i}' for i in range(b-a+1, b) )
|
||||
indexes = indexes + (x,)
|
||||
return jt.safe_log(self.probs).getitem(indexes)
|
||||
|
||||
def entropy(self):
|
||||
min_real = -(math.pow(2,23)-1) / math.pow(2,22) * math.pow(2,127)
|
||||
logits = jt.clamp(self.logits,min_v=min_real)
|
||||
p_log_p = logits * self.probs
|
||||
p_log_p = self.logits * self.probs
|
||||
return -p_log_p.sum(-1)
|
||||
|
||||
|
||||
|
@ -104,11 +91,11 @@ class Normal:
|
|||
|
||||
def log_prob(self, x):
|
||||
var = self.sigma**2
|
||||
log_scale = jt.log(self.sigma)
|
||||
log_scale = jt.safe_log(self.sigma)
|
||||
return -((x-self.mu)**2) / (2*var) - log_scale-np.log(np.sqrt(2*np.pi))
|
||||
|
||||
def entropy(self):
|
||||
return 0.5+0.5*np.log(2*np.pi)+jt.log(self.sigma)
|
||||
return 0.5+0.5*np.log(2*np.pi)+jt.safe_log(self.sigma)
|
||||
|
||||
|
||||
class Uniform:
|
||||
|
@ -123,10 +110,10 @@ class Uniform:
|
|||
def log_prob(self,x):
|
||||
if x < self.low or x >= self.high:
|
||||
return math.inf
|
||||
return -jt.log(self.high - self.low)
|
||||
return -jt.safe_log(self.high - self.low)
|
||||
|
||||
def entropy(self):
|
||||
return jt.log(self.high - self.low)
|
||||
return jt.safe_log(self.high - self.low)
|
||||
|
||||
|
||||
class Geometric:
|
||||
|
@ -138,15 +125,14 @@ class Geometric:
|
|||
self.logits = logits
|
||||
elif logits is None:
|
||||
self.prob = p
|
||||
self.logits = -jt.log(1. / p - 1)
|
||||
self.logits = -jt.safe_log(1. / p - 1)
|
||||
|
||||
def sample(self, sample_shape):
|
||||
tiny = jt.info(self.probs.dtype).tiny
|
||||
u = jt.clamp(jt.rand(sample_shape),min_v=tiny)
|
||||
return (jt.log(u) / (jt.log(-self.probs+1))).floor()
|
||||
u = jt.rand(sample_shape)
|
||||
return (jt.safe_log(u) / (jt.safe_log(-self.probs+1))).floor_int()
|
||||
|
||||
def log_prob(self, x):
|
||||
return x*jt.log(-self.prob+1)+jt.log(self.prob)
|
||||
return x*jt.safe_log(-self.prob+1)+jt.safe_log(self.prob)
|
||||
|
||||
def entropy(self):
|
||||
return binary_cross_entropy_with_logits(jt.array(self.logits),jt.array(self.prob)) / self.prob
|
||||
|
@ -157,16 +143,14 @@ def kl_divergence(cur_dist, old_dist):
|
|||
if isinstance(cur_dist, Normal):
|
||||
vr = (cur_dist.sigma / old_dist.sigma)**2
|
||||
t1 = ((cur_dist.mu - old_dist.mu) / old_dist.sigma)**2
|
||||
return 0.5*(vr+t1-1-jt.log(vr))
|
||||
return 0.5*(vr+t1-1-jt.safe_log(vr))
|
||||
if isinstance(cur_dist, Categorical) or isinstance(cur_dist,OneHotCategorical):
|
||||
t = cur_dist.probs * (cur_dist.logits-old_dist.logits)
|
||||
t[jt.array((old_dist.probs == 0))] = math.inf
|
||||
t[jt.array((cur_dist.probs == 0))] = 0
|
||||
return t.sum(-1)
|
||||
if isinstance(cur_dist, Uniform):
|
||||
res = jt.log((old_dist.high - old_dist.low) / (cur_dist.high - cur_dist.low))
|
||||
res = jt.safe_log((old_dist.high - old_dist.low) / (cur_dist.high - cur_dist.low))
|
||||
if old_dist.low > cur_dist.low or old_dist.high < cur_dist.high:
|
||||
res = math.inf
|
||||
return res
|
||||
if isinstance(cur_dist, Geometric):
|
||||
return -cur_dist.entropy() - jt.log(-old_dist.prob+1) / cur_dist.prob - old_dist.logits
|
||||
return -cur_dist.entropy() - jt.safe_log(-old_dist.prob+1) / cur_dist.prob - old_dist.logits
|
||||
|
|
|
@ -11,6 +11,7 @@
|
|||
#include <cuda_runtime.h>
|
||||
#include <cublas_v2.h>
|
||||
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
#include "fp16_emu.h"
|
||||
#include "common.h"
|
||||
|
|
|
@ -58,6 +58,7 @@
|
|||
|
||||
// CUDA and CUBLAS functions
|
||||
#include <helper_functions.h>
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
|
||||
#ifndef min
|
||||
|
|
|
@ -8,6 +8,7 @@
|
|||
#include <cuda_runtime.h>
|
||||
#include <cudnn.h>
|
||||
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
#include "fp16_emu.h"
|
||||
#include "common.h"
|
||||
|
|
|
@ -0,0 +1,288 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers:
|
||||
// Dun Liang <randonlang@gmail.com>
|
||||
// Guowei Yang <471184555@qq.com>
|
||||
//
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#include "mem/allocator.h"
|
||||
#include "var.h"
|
||||
#include "cudnn_conv3d_backward_w_op.h"
|
||||
#include "cudnn_warper.h"
|
||||
#include "executor.h"
|
||||
#include "ops/op_register.h"
|
||||
|
||||
using namespace std;
|
||||
|
||||
namespace jittor {
|
||||
|
||||
#pragma GCC diagnostic ignored "-Wunused-variable"
|
||||
|
||||
#ifndef JIT
|
||||
|
||||
CudnnConv3dBackwardWOp::CudnnConv3dBackwardWOp(Var* x, Var* dy, int kd, int kh, int kw, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups, string xformat)
|
||||
: x(x), dy(dy), kd(kd), kh(kh), kw(kw), strided(strided), strideh(strideh), stridew(stridew), paddingd(paddingd), paddingh(paddingh), paddingw(paddingw), dilationd(dilationd), dilationh(dilationh), dilationw(dilationw), groups(groups),
|
||||
xformat(move(xformat)) {
|
||||
flags.set(NodeFlags::_cuda, 1);
|
||||
flags.set(NodeFlags::_cpu, 0);
|
||||
dw = create_output(nullptr, dtype_infer(dy->ns, x->ns));
|
||||
}
|
||||
|
||||
void CudnnConv3dBackwardWOp::infer_shape() {
|
||||
ASSERTop(x->shape.size(),==,5);
|
||||
ASSERTop(dy->shape.size(),==,5);
|
||||
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
|
||||
if (xformat == "ncdhw") {
|
||||
x->shape.unpack(xn, xc, xd, xh, xw);
|
||||
dy->shape.unpack(yn, yc, yd, yh, yw);
|
||||
} else {
|
||||
x->shape.unpack(xn, xd, xh, xw, xc);
|
||||
dy->shape.unpack(yn, yd, yh, yw, yc);
|
||||
}
|
||||
wco = yc, wci = xc / groups;
|
||||
wh = kh;
|
||||
ww = kw;
|
||||
wd = kd;
|
||||
dw->set_shape(NanoVector(wco, wci, wd, wh, ww));
|
||||
}
|
||||
|
||||
void CudnnConv3dBackwardWOp::jit_prepare(JK& jk) {
|
||||
jk << _CS("[Tx:") << x->dtype();
|
||||
jk << _CS("][Ty:") << dy->dtype();
|
||||
jk << _CS("][Tw:") << dw->dtype();
|
||||
jk << ']';
|
||||
}
|
||||
|
||||
static auto make_conv3d = get_op_info("cudnn_conv3d")
|
||||
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, string>();
|
||||
static auto make_backwardx = get_op_info("cudnn_conv3d_backward_x")
|
||||
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
|
||||
|
||||
|
||||
VarPtr CudnnConv3dBackwardWOp::grad(Var* out, Var* dout, Var* v, int v_index) {
|
||||
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
|
||||
if (xformat == "ncdhw") {
|
||||
x->shape.unpack(xn, xc, xd, xh, xw);
|
||||
dy->shape.unpack(yn, yc, yd, yh, yw);
|
||||
} else {
|
||||
x->shape.unpack(xn, xd, xh, xw, xc);
|
||||
dy->shape.unpack(yn, yd, yh, yw, yc);
|
||||
}
|
||||
|
||||
if (v_index == 0) {
|
||||
return make_backwardx(dout, dy, xd, xh, xw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
|
||||
} else {
|
||||
return make_conv3d(x, dout, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
|
||||
}
|
||||
}
|
||||
|
||||
// unordered_map<string, cudnnConvolutionBwdFilterAlgo_t> bwdw_algo_cache;
|
||||
|
||||
#else // JIT
|
||||
#ifdef JIT_cuda
|
||||
|
||||
#pragma clang diagnostic ignored "-Wtautological-compare"
|
||||
|
||||
extern unordered_map<string, cudnnConvolutionBwdFilterAlgo_t> bwdw_algo_cache;
|
||||
|
||||
template <typename T_ELEM> __inline__ cudnnDataType_t getDataType();
|
||||
template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
|
||||
template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
|
||||
|
||||
void CudnnConv3dBackwardWOp::jit_run() {
|
||||
auto w = dw;
|
||||
auto y = dy;
|
||||
cudnnHandle_t& handle_ = cudnn_handle;
|
||||
|
||||
cudnnTensorDescriptor_t cudnnIdesc;
|
||||
cudnnFilterDescriptor_t cudnnFdesc;
|
||||
cudnnTensorDescriptor_t cudnnOdesc;
|
||||
cudnnConvolutionDescriptor_t cudnnConvDesc;
|
||||
|
||||
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnIdesc ));
|
||||
checkCudaErrors(cudnnCreateFilterDescriptor( &cudnnFdesc ));
|
||||
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnOdesc ));
|
||||
checkCudaErrors(cudnnCreateConvolutionDescriptor( &cudnnConvDesc ));
|
||||
checkCudaErrors(cudnnSetConvolutionGroupCount( cudnnConvDesc, groups ));
|
||||
|
||||
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
int sx[] = {0,0,0,0,1};
|
||||
for (int i=3; i>=0; i--) sx[i] = sx[i+1] * x->shape[i+1];
|
||||
int strideX[5];
|
||||
if (xformat == "ncdhw") {
|
||||
x->shape.unpack(xn, xc, xd, xh, xw);
|
||||
int tmp[5] = {sx[0],sx[1],sx[2],sx[3],sx[4]};
|
||||
memcpy(strideX, tmp, sizeof(tmp));
|
||||
} else {
|
||||
x->shape.unpack(xn, xd, xh, xw, xc);
|
||||
int tmp[5] = {sx[0],sx[2],sx[3],sx[4],sx[1]};
|
||||
memcpy(strideX, tmp, sizeof(tmp));
|
||||
}
|
||||
int dimX[] = {xn, xc, xd, xh, xw};
|
||||
// dimX: ncdhw
|
||||
checkCudaErrors(cudnnSetTensorNdDescriptor(
|
||||
cudnnIdesc, getDataType<Tx>(),
|
||||
5, dimX, strideX
|
||||
));
|
||||
|
||||
auto ws = w->shape;
|
||||
int dimW[] = {(int)ws[0],(int)ws[1],(int)ws[2],(int)ws[3],(int)ws[4]};
|
||||
// cudnn only support this two format
|
||||
// https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnSetFilterNdDescriptor
|
||||
#define filterFormat_oihw CUDNN_TENSOR_NCHW
|
||||
#define filterFormat_ohwi CUDNN_TENSOR_NHWC
|
||||
|
||||
// dimW: KCRS(oihw)
|
||||
checkCudaErrors(cudnnSetFilterNdDescriptor(
|
||||
cudnnFdesc, getDataType<Tw>(),
|
||||
// filterFormat_@WFORMAT, 5, dimW
|
||||
filterFormat_oihw, 5, dimW
|
||||
));
|
||||
|
||||
int padA[] = {paddingd, paddingh, paddingw};
|
||||
int convstrideA[] = {strided, strideh, stridew};
|
||||
int dilationA[] = {dilationd, dilationh, dilationw};
|
||||
// difference between
|
||||
// CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION
|
||||
// is the kernel rc order
|
||||
// currently, No perf difference is observed between
|
||||
// this two mode
|
||||
checkCudaErrors(cudnnSetConvolutionNdDescriptor(
|
||||
cudnnConvDesc, 3,
|
||||
padA, convstrideA, dilationA,
|
||||
CUDNN_CROSS_CORRELATION, getDataType<Ty>()
|
||||
));
|
||||
|
||||
// using tensor core
|
||||
// checkCudaErrors( cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
|
||||
|
||||
|
||||
int sy[] = {0,0,0,0,1};
|
||||
for (int i=3; i>=0; i--) sy[i] = sy[i+1] * y->shape[i+1];
|
||||
int strideY[5];
|
||||
if (xformat == "ncdhw") {
|
||||
y->shape.unpack(yn, yc, yd, yh, yw);
|
||||
int tmp[5] = {sy[0],sy[1],sy[2],sy[3],sy[4]};
|
||||
memcpy(strideY, tmp, sizeof(tmp));
|
||||
} else {
|
||||
y->shape.unpack(yn, yd, yh, yw, yc);
|
||||
int tmp[5] = {sy[0],sy[2],sy[3],sy[4],sy[1]};
|
||||
memcpy(strideY, tmp, sizeof(tmp));
|
||||
}
|
||||
int dimY[] = {yn, yc, yd, yh, yw};
|
||||
|
||||
checkCudaErrors(cudnnSetTensorNdDescriptor(
|
||||
cudnnOdesc, getDataType<Ty>(),
|
||||
5, dimY, strideY
|
||||
));
|
||||
|
||||
cudnnConvolutionBwdFilterAlgo_t algos[] = {
|
||||
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0,
|
||||
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1,
|
||||
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT,
|
||||
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3,
|
||||
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED,
|
||||
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING,
|
||||
};
|
||||
int num_algos = CUDNN_CONVOLUTION_BWD_FILTER_ALGO_COUNT;
|
||||
int perf_count;
|
||||
cudnnConvolutionBwdFilterAlgoPerf_t perf_results[num_algos];
|
||||
cudnnConvolutionBwdFilterAlgo_t algo;
|
||||
bool benchmark=true;
|
||||
|
||||
jk.clear();
|
||||
jk << dimX[0] << "," << dimX[1] << "," << dimX[2] << "," << dimX[3] << "," << dimX[4] << ",";
|
||||
jk << dimW[0] << "," << dimW[1] << "," << dimW[2] << "," << dimW[3] << "," << dimW[4] << ",";
|
||||
jk << paddingd << paddingh << paddingw << "," << strided << strideh <<stridew << "," << dilationd << dilationh << dilationw << "," << groups << ".";
|
||||
auto iter = bwdw_algo_cache.find(jk.to_string());
|
||||
|
||||
if (iter!=bwdw_algo_cache.end()) algo = iter->second;
|
||||
else {
|
||||
if (bwdw_algo_cache.size()>=max_cache_size) benchmark = false;
|
||||
if (benchmark) {
|
||||
size_t max_ws_size = 0;
|
||||
for (int i = 0; i < num_algos; i++) {
|
||||
size_t sz;
|
||||
cudnnStatus_t ret = cudnnGetConvolutionBackwardFilterWorkspaceSize(handle_, cudnnIdesc, cudnnOdesc, cudnnConvDesc, cudnnFdesc, algos[i], &sz);
|
||||
// continue if use too much workspace
|
||||
if (sz > mem_info.total_cuda_ram * max_workspace_ratio) continue;
|
||||
if (CUDNN_STATUS_SUCCESS == ret && sz > max_ws_size) max_ws_size = sz;
|
||||
}
|
||||
size_t allocation;
|
||||
void* ws = exe.temp_allocator->alloc(max_ws_size, allocation);
|
||||
checkCudaErrors(cudnnFindConvolutionBackwardFilterAlgorithmEx(
|
||||
handle_,
|
||||
cudnnIdesc, x->ptr<Tx>(),
|
||||
cudnnOdesc, y->ptr<Ty>(),
|
||||
cudnnConvDesc,
|
||||
cudnnFdesc, w->ptr<Tw>(),
|
||||
num_algos,
|
||||
&perf_count,
|
||||
perf_results,
|
||||
ws,
|
||||
max_ws_size));
|
||||
exe.temp_allocator->free(ws, max_ws_size, allocation);
|
||||
} else {
|
||||
checkCudaErrors(cudnnGetConvolutionBackwardFilterAlgorithm_v7(
|
||||
handle_,
|
||||
cudnnIdesc,
|
||||
cudnnOdesc,
|
||||
cudnnConvDesc,
|
||||
cudnnFdesc,
|
||||
num_algos,
|
||||
&perf_count,
|
||||
perf_results));
|
||||
}
|
||||
int best_algo_idx=-1;
|
||||
for (int i = 0; i < perf_count; i++)
|
||||
if (perf_results[i].status == CUDNN_STATUS_SUCCESS){
|
||||
best_algo_idx=i;
|
||||
break;
|
||||
}
|
||||
ASSERT(best_algo_idx!=-1);
|
||||
algo=perf_results[best_algo_idx].algo;
|
||||
if (benchmark) {
|
||||
bwdw_algo_cache[jk.to_string()] = algo;
|
||||
if (bwdw_algo_cache.size()==max_cache_size)
|
||||
LOGw << "backward w algorithm cache is full";
|
||||
}
|
||||
}
|
||||
|
||||
// TODO: warp work space
|
||||
void *workSpace = 0;
|
||||
size_t workSpaceSize;
|
||||
checkCudaErrors (cudnnGetConvolutionBackwardFilterWorkspaceSize(
|
||||
handle_, cudnnIdesc, cudnnOdesc, cudnnConvDesc,
|
||||
cudnnFdesc, algo, &workSpaceSize));
|
||||
size_t allocation;
|
||||
if (workSpaceSize > 0) {
|
||||
workSpace = exe.temp_allocator->alloc(workSpaceSize, allocation);
|
||||
}
|
||||
float alpha=1, beta=0;
|
||||
checkCudaErrors(cudnnConvolutionBackwardFilter(
|
||||
handle_,
|
||||
(void*)(&alpha),
|
||||
cudnnIdesc, x->ptr<Tx>(),
|
||||
cudnnOdesc, y->ptr<Ty>(),
|
||||
cudnnConvDesc,
|
||||
algo,
|
||||
workSpace, workSpaceSize,
|
||||
(void*)(&beta),
|
||||
cudnnFdesc, w->ptr<Tw>())
|
||||
);
|
||||
if (workSpace)
|
||||
exe.temp_allocator->free(workSpace, workSpaceSize, allocation);
|
||||
|
||||
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnIdesc ));
|
||||
checkCudaErrors(cudnnDestroyFilterDescriptor( cudnnFdesc ));
|
||||
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnOdesc ));
|
||||
checkCudaErrors(cudnnDestroyConvolutionDescriptor( cudnnConvDesc ));
|
||||
}
|
||||
#endif
|
||||
#endif // JIT
|
||||
|
||||
} // jittor
|
|
@ -0,0 +1,28 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers:
|
||||
// Dun Liang <randonlang@gmail.com>
|
||||
// Guowei Yang <471184555@qq.com>
|
||||
//
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#pragma once
|
||||
#include "op.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
struct CudnnConv3dBackwardWOp : Op {
|
||||
Var* x, * dy, * dw;
|
||||
int kd, kh, kw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups;
|
||||
string xformat;
|
||||
|
||||
CudnnConv3dBackwardWOp(Var* x, Var* y, int kd, int kh, int kw, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups=1, string xformat="ncdhw");
|
||||
|
||||
const char* name() const override { return "cudnn_conv3d_backward_w"; }
|
||||
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
|
||||
void infer_shape() override;
|
||||
DECLARE_jit_run;
|
||||
};
|
||||
|
||||
} // jittor
|
|
@ -0,0 +1,279 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers:
|
||||
// Dun Liang <randonlang@gmail.com>
|
||||
// Guowei Yang <471184555@qq.com>
|
||||
//
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#include "mem/allocator.h"
|
||||
#include "var.h"
|
||||
#include "cudnn_conv3d_backward_x_op.h"
|
||||
#include "cudnn_warper.h"
|
||||
#include "executor.h"
|
||||
#include "ops/op_register.h"
|
||||
|
||||
using namespace std;
|
||||
|
||||
namespace jittor {
|
||||
|
||||
#pragma GCC diagnostic ignored "-Wunused-variable"
|
||||
|
||||
#ifndef JIT
|
||||
|
||||
CudnnConv3dBackwardXOp::CudnnConv3dBackwardXOp(Var* w, Var* dy, int depth, int height, int width, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups, string xformat)
|
||||
: w(w), dy(dy), xd(depth), xh(height), xw(width), strided(strided), strideh(strideh), stridew(stridew), paddingd(paddingd), paddingh(paddingh), paddingw(paddingw), dilationd(dilationd), dilationh(dilationh), dilationw(dilationw), groups(groups),
|
||||
xformat(move(xformat)) {
|
||||
flags.set(NodeFlags::_cuda, 1);
|
||||
flags.set(NodeFlags::_cpu, 0);
|
||||
dx = create_output(nullptr, dtype_infer(dy->ns, w->ns));
|
||||
}
|
||||
|
||||
void CudnnConv3dBackwardXOp::infer_shape() {
|
||||
ASSERTop(w->shape.size(),==,5);
|
||||
ASSERTop(dy->shape.size(),==,5);
|
||||
int xn, xc, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
w->shape.unpack(wco, wci, wd, wh, ww);
|
||||
if (xformat == "ncdhw")
|
||||
dy->shape.unpack(yn, yc, yd, yh, yw);
|
||||
else
|
||||
dy->shape.unpack(yn, yd, yh, yw, yc);
|
||||
xn = yn, xc = wci * groups;
|
||||
if (xformat == "ncdhw")
|
||||
dx->set_shape(NanoVector(xn, xc, xd, xh, xw));
|
||||
else
|
||||
dx->set_shape(NanoVector(xn, xd, xh, xw, xc));
|
||||
}
|
||||
|
||||
void CudnnConv3dBackwardXOp::jit_prepare(JK& jk) {
|
||||
jk << _CS("[Tx:") << dx->dtype();
|
||||
jk << _CS("][Ty:") << dy->dtype();
|
||||
jk << _CS("][Tw:") << w->dtype();
|
||||
jk << ']';
|
||||
}
|
||||
|
||||
|
||||
static auto make_conv3d = get_op_info("cudnn_conv3d")
|
||||
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, string>();
|
||||
static auto make_backwardw = get_op_info("cudnn_conv3d_backward_w")
|
||||
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
|
||||
|
||||
|
||||
VarPtr CudnnConv3dBackwardXOp::grad(Var* out, Var* dout, Var* v, int v_index) {
|
||||
int xn, xc, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
w->shape.unpack(wco, wci, wd, wh, ww);
|
||||
|
||||
if (v_index == 0) {
|
||||
return make_backwardw(dout, dy, wd, wh, ww, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
|
||||
} else {
|
||||
return make_conv3d(dout, w, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
|
||||
}
|
||||
}
|
||||
// unordered_map<string, cudnnConvolutionBwdDataAlgo_t> bwdx_algo_cache;
|
||||
|
||||
#else // JIT
|
||||
#ifdef JIT_cuda
|
||||
|
||||
#pragma clang diagnostic ignored "-Wtautological-compare"
|
||||
|
||||
extern unordered_map<string, cudnnConvolutionBwdDataAlgo_t> bwdx_algo_cache;
|
||||
|
||||
template <typename T_ELEM> __inline__ cudnnDataType_t getDataType();
|
||||
template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
|
||||
template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
|
||||
|
||||
void CudnnConv3dBackwardXOp::jit_run() {
|
||||
auto x = dx;
|
||||
auto y = dy;
|
||||
cudnnHandle_t& handle_ = cudnn_handle;
|
||||
|
||||
cudnnTensorDescriptor_t cudnnIdesc;
|
||||
cudnnFilterDescriptor_t cudnnFdesc;
|
||||
cudnnTensorDescriptor_t cudnnOdesc;
|
||||
cudnnConvolutionDescriptor_t cudnnConvDesc;
|
||||
|
||||
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnIdesc ));
|
||||
checkCudaErrors(cudnnCreateFilterDescriptor( &cudnnFdesc ));
|
||||
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnOdesc ));
|
||||
checkCudaErrors(cudnnCreateConvolutionDescriptor( &cudnnConvDesc ));
|
||||
checkCudaErrors(cudnnSetConvolutionGroupCount( cudnnConvDesc, groups ));
|
||||
|
||||
|
||||
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
int sx[] = {0,0,0,0,1};
|
||||
for (int i=3; i>=0; i--) sx[i] = sx[i+1] * x->shape[i+1];
|
||||
int strideX[5];
|
||||
if (xformat == "ncdhw") {
|
||||
x->shape.unpack(xn, xc, xd, xh, xw);
|
||||
int tmp[5] = {sx[0],sx[1],sx[2],sx[3],sx[4]};
|
||||
memcpy(strideX, tmp, sizeof(tmp));
|
||||
} else {
|
||||
x->shape.unpack(xn, xd, xh, xw, xc);
|
||||
int tmp[5] = {sx[0],sx[2],sx[3],sx[4],sx[1]};
|
||||
memcpy(strideX, tmp, sizeof(tmp));
|
||||
}
|
||||
int dimX[] = {xn, xc, xd, xh, xw};
|
||||
// dimX: ncdhw
|
||||
checkCudaErrors(cudnnSetTensorNdDescriptor(
|
||||
cudnnIdesc, getDataType<Tx>(),
|
||||
5, dimX, strideX
|
||||
));
|
||||
|
||||
auto ws = w->shape;
|
||||
int dimW[] = {(int)ws[0],(int)ws[1],(int)ws[2],(int)ws[3],(int)ws[4]};
|
||||
// cudnn only support this two format
|
||||
// https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnSetFilterNdDescriptor
|
||||
#define filterFormat_oihw CUDNN_TENSOR_NCHW
|
||||
#define filterFormat_ohwi CUDNN_TENSOR_NHWC
|
||||
|
||||
// dimW: KCRS(oihw)
|
||||
checkCudaErrors(cudnnSetFilterNdDescriptor(
|
||||
cudnnFdesc, getDataType<Tw>(),
|
||||
// filterFormat_@WFORMAT, 5, dimW
|
||||
filterFormat_oihw, 5, dimW
|
||||
));
|
||||
|
||||
int padA[] = {paddingd, paddingh, paddingw};
|
||||
int convstrideA[] = {strided, strideh, stridew};
|
||||
int dilationA[] = {dilationd, dilationh, dilationw};
|
||||
// difference between
|
||||
// CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION
|
||||
// is the kernel rc order
|
||||
// currently, No perf difference is observed between
|
||||
// this two mode
|
||||
checkCudaErrors(cudnnSetConvolutionNdDescriptor(
|
||||
cudnnConvDesc, 3,
|
||||
padA, convstrideA, dilationA,
|
||||
CUDNN_CROSS_CORRELATION, getDataType<Ty>()
|
||||
));
|
||||
|
||||
// using tensor core
|
||||
// checkCudaErrors( cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
|
||||
|
||||
|
||||
int sy[] = {0,0,0,0,1};
|
||||
for (int i=3; i>=0; i--) sy[i] = sy[i+1] * y->shape[i+1];
|
||||
int strideY[5];
|
||||
if (xformat == "ncdhw") {
|
||||
y->shape.unpack(yn, yc, yd, yh, yw);
|
||||
int tmp[5] = {sy[0],sy[1],sy[2],sy[3],sy[4]};
|
||||
memcpy(strideY, tmp, sizeof(tmp));
|
||||
} else {
|
||||
y->shape.unpack(yn, yd, yh, yw, yc);
|
||||
int tmp[5] = {sy[0],sy[2],sy[3],sy[4],sy[1]};
|
||||
memcpy(strideY, tmp, sizeof(tmp));
|
||||
}
|
||||
int dimY[] = {yn, yc, yd, yh, yw};
|
||||
|
||||
checkCudaErrors(cudnnSetTensorNdDescriptor(
|
||||
cudnnOdesc, getDataType<Ty>(),
|
||||
5, dimY, strideY
|
||||
));
|
||||
|
||||
cudnnConvolutionBwdDataAlgo_t algos[] = {
|
||||
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0,
|
||||
CUDNN_CONVOLUTION_BWD_DATA_ALGO_1,
|
||||
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT,
|
||||
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING,
|
||||
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD,
|
||||
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED
|
||||
};
|
||||
int num_algos = CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT;
|
||||
int perf_count;
|
||||
cudnnConvolutionBwdDataAlgoPerf_t perf_results[num_algos];
|
||||
cudnnConvolutionBwdDataAlgo_t algo;
|
||||
bool benchmark=true;
|
||||
|
||||
jk.clear();
|
||||
jk << dimX[0] << "," << dimX[1] << "," << dimX[2] << "," << dimX[3] << "," << dimX[4] << ",";
|
||||
jk << dimW[0] << "," << dimW[1] << "," << dimW[2] << "," << dimW[3] << "," << dimW[4] << ",";
|
||||
jk << paddingd << paddingh << paddingw << "," << strided << strideh <<stridew << "," << dilationd << dilationh << dilationw << "," << groups << ".";
|
||||
auto iter = bwdx_algo_cache.find(jk.to_string());
|
||||
|
||||
if (iter!=bwdx_algo_cache.end()) algo = iter->second;
|
||||
else {
|
||||
if (bwdx_algo_cache.size()>=max_cache_size) benchmark = false;
|
||||
if (benchmark) {
|
||||
size_t max_ws_size = 0;
|
||||
for (int i = 0; i < num_algos; i++) {
|
||||
size_t sz;
|
||||
cudnnStatus_t ret = cudnnGetConvolutionBackwardDataWorkspaceSize(handle_, cudnnFdesc, cudnnOdesc, cudnnConvDesc, cudnnIdesc, algos[i], &sz);
|
||||
// continue if use too much workspace
|
||||
if (sz > mem_info.total_cuda_ram * max_workspace_ratio) continue;
|
||||
if (CUDNN_STATUS_SUCCESS == ret && sz > max_ws_size) max_ws_size = sz;
|
||||
}
|
||||
size_t allocation;
|
||||
void* ws = exe.temp_allocator->alloc(max_ws_size, allocation);
|
||||
checkCudaErrors(cudnnFindConvolutionBackwardDataAlgorithmEx(
|
||||
handle_,
|
||||
cudnnFdesc, w->ptr<Tw>(),
|
||||
cudnnOdesc, y->ptr<Ty>(),
|
||||
cudnnConvDesc,
|
||||
cudnnIdesc, x->ptr<Tx>(),
|
||||
num_algos,
|
||||
&perf_count,
|
||||
perf_results,
|
||||
ws,
|
||||
max_ws_size));
|
||||
exe.temp_allocator->free(ws, max_ws_size, allocation);
|
||||
} else {
|
||||
checkCudaErrors(cudnnGetConvolutionBackwardDataAlgorithm_v7(
|
||||
handle_,
|
||||
cudnnFdesc,
|
||||
cudnnOdesc,
|
||||
cudnnConvDesc,
|
||||
cudnnIdesc,
|
||||
num_algos,
|
||||
&perf_count,
|
||||
perf_results));
|
||||
}
|
||||
int best_algo_idx=-1;
|
||||
for (int i = 0; i < perf_count; i++)
|
||||
if (perf_results[i].status == CUDNN_STATUS_SUCCESS){
|
||||
best_algo_idx=i;
|
||||
break;
|
||||
}
|
||||
ASSERT(best_algo_idx!=-1);
|
||||
algo=perf_results[best_algo_idx].algo;
|
||||
if (benchmark) {
|
||||
bwdx_algo_cache[jk.to_string()] = algo;
|
||||
if (bwdx_algo_cache.size()==max_cache_size)
|
||||
LOGw << "backward x algorithm cache is full";
|
||||
}
|
||||
}
|
||||
|
||||
// TODO: warp work space
|
||||
void *workSpace = 0;
|
||||
size_t workSpaceSize;
|
||||
checkCudaErrors (cudnnGetConvolutionBackwardDataWorkspaceSize(
|
||||
handle_, cudnnFdesc, cudnnOdesc, cudnnConvDesc,
|
||||
cudnnIdesc, algo, &workSpaceSize));
|
||||
size_t allocation;
|
||||
if (workSpaceSize > 0) {
|
||||
workSpace = exe.temp_allocator->alloc(workSpaceSize, allocation);
|
||||
}
|
||||
float alpha=1, beta=0;
|
||||
checkCudaErrors(cudnnConvolutionBackwardData(
|
||||
handle_,
|
||||
(void*)(&alpha),
|
||||
cudnnFdesc, w->ptr<Tw>(),
|
||||
cudnnOdesc, y->ptr<Ty>(),
|
||||
cudnnConvDesc,
|
||||
algo,
|
||||
workSpace, workSpaceSize,
|
||||
(void*)(&beta),
|
||||
cudnnIdesc, x->ptr<Tx>())
|
||||
);
|
||||
if (workSpace)
|
||||
exe.temp_allocator->free(workSpace, workSpaceSize, allocation);
|
||||
|
||||
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnIdesc ));
|
||||
checkCudaErrors(cudnnDestroyFilterDescriptor( cudnnFdesc ));
|
||||
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnOdesc ));
|
||||
checkCudaErrors(cudnnDestroyConvolutionDescriptor( cudnnConvDesc ));
|
||||
}
|
||||
#endif
|
||||
#endif // JIT
|
||||
|
||||
} // jittor
|
|
@ -0,0 +1,28 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers:
|
||||
// Dun Liang <randonlang@gmail.com>
|
||||
// Guowei Yang <471184555@qq.com>
|
||||
//
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#pragma once
|
||||
#include "op.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
struct CudnnConv3dBackwardXOp : Op {
|
||||
Var* w, * dy, * dx;
|
||||
int xd, xh, xw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups;
|
||||
string xformat;
|
||||
|
||||
CudnnConv3dBackwardXOp(Var* w, Var* y, int depth, int height, int width, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups=1, string xformat="ncdhw");
|
||||
|
||||
const char* name() const override { return "cudnn_conv3d_backward_x"; }
|
||||
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
|
||||
void infer_shape() override;
|
||||
DECLARE_jit_run;
|
||||
};
|
||||
|
||||
} // jittor
|
|
@ -0,0 +1,284 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers: Dun Liang <randonlang@gmail.com>.
|
||||
//
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#include "var.h"
|
||||
#include "cudnn_conv3d_op.h"
|
||||
#include "cudnn_warper.h"
|
||||
#include "executor.h"
|
||||
#include "ops/op_register.h"
|
||||
|
||||
using namespace std;
|
||||
|
||||
namespace jittor {
|
||||
|
||||
#pragma GCC diagnostic ignored "-Wunused-variable"
|
||||
|
||||
#ifndef JIT
|
||||
|
||||
CudnnConv3dOp::CudnnConv3dOp(Var* x, Var* w, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd, int dilationh, int dilationw, int groups, string xformat)
|
||||
: x(x), w(w), strided(strided), strideh(strideh), stridew(stridew), paddingd(paddingd), paddingh(paddingh), paddingw(paddingw), dilationd(dilationd), dilationh(dilationh), dilationw(dilationw), groups(groups),
|
||||
xformat(move(xformat)) {
|
||||
flags.set(NodeFlags::_cuda, 1);
|
||||
flags.set(NodeFlags::_cpu, 0);
|
||||
y = create_output(nullptr, dtype_infer(x->ns, w->ns));
|
||||
}
|
||||
|
||||
void CudnnConv3dOp::infer_shape() {
|
||||
ASSERTop(x->shape.size(),==,5);
|
||||
ASSERTop(w->shape.size(),==,5);
|
||||
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
if (xformat == "ncdhw")
|
||||
x->shape.unpack(xn, xc, xd, xh, xw);
|
||||
else
|
||||
x->shape.unpack(xn, xd, xh, xw, xc);
|
||||
w->shape.unpack(wco, wci, wd, wh, ww);
|
||||
ASSERTop(wci * groups,==,xc);
|
||||
yn = xn, yc = wco;
|
||||
yd = (xd+paddingd*2-wd*dilationd+dilationd-1)/strided+1;
|
||||
yh = (xh+paddingh*2-wh*dilationh+dilationh-1)/strideh+1;
|
||||
yw = (xw+paddingw*2-ww*dilationw+dilationw-1)/stridew+1;
|
||||
if (xformat == "ncdhw")
|
||||
y->set_shape(NanoVector(yn, yc, yd, yh, yw));
|
||||
else
|
||||
y->set_shape(NanoVector(yn, yd, yh, yw, yc));
|
||||
}
|
||||
|
||||
void CudnnConv3dOp::jit_prepare(JK& jk) {
|
||||
jk << _CS("[Tx:") << x->dtype();
|
||||
jk << _CS("][Ty:") << y->dtype();
|
||||
jk << _CS("][Tw:") << w->dtype();
|
||||
jk << ']';
|
||||
}
|
||||
|
||||
static auto make_backwardx = get_op_info("cudnn_conv3d_backward_x")
|
||||
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
|
||||
static auto make_backwardw = get_op_info("cudnn_conv3d_backward_w")
|
||||
.get_constructor<VarPtr, Var*, Var*, int, int, int, int, int, int, int, int, int, int, int, int, int, string>();
|
||||
|
||||
VarPtr CudnnConv3dOp::grad(Var* out, Var* dout, Var* v, int v_index) {
|
||||
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
if (xformat == "ncdhw")
|
||||
x->shape.unpack(xn, xc, xd, xh, xw);
|
||||
else
|
||||
x->shape.unpack(xn, xd, xh, xw, xc);
|
||||
w->shape.unpack(wco, wci, wd, wh, ww);
|
||||
if (v_index == 0) {
|
||||
return make_backwardx(w, dout, xd, xh, xw, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
|
||||
} else {
|
||||
return make_backwardw(x, dout, wd, wh, ww, strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups, xformat);
|
||||
}
|
||||
}
|
||||
|
||||
// unordered_map<string, cudnnConvolutionFwdAlgo_t> fwd_algo_cache;
|
||||
|
||||
#else // JIT
|
||||
#ifdef JIT_cuda
|
||||
|
||||
#pragma clang diagnostic ignored "-Wtautological-compare"
|
||||
|
||||
extern unordered_map<string, cudnnConvolutionFwdAlgo_t> fwd_algo_cache;
|
||||
|
||||
template <typename T_ELEM> __inline__ cudnnDataType_t getDataType();
|
||||
template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
|
||||
template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
|
||||
|
||||
void CudnnConv3dOp::jit_run() {
|
||||
cudnnHandle_t& handle_ = cudnn_handle;
|
||||
|
||||
cudnnTensorDescriptor_t cudnnIdesc;
|
||||
cudnnFilterDescriptor_t cudnnFdesc;
|
||||
cudnnTensorDescriptor_t cudnnOdesc;
|
||||
cudnnConvolutionDescriptor_t cudnnConvDesc;
|
||||
|
||||
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnIdesc ));
|
||||
checkCudaErrors(cudnnCreateFilterDescriptor( &cudnnFdesc ));
|
||||
checkCudaErrors(cudnnCreateTensorDescriptor( &cudnnOdesc ));
|
||||
checkCudaErrors(cudnnCreateConvolutionDescriptor( &cudnnConvDesc ));
|
||||
checkCudaErrors(cudnnSetConvolutionGroupCount( cudnnConvDesc, groups ));
|
||||
|
||||
|
||||
int xn, xc, xd, xh, xw, wd, wh, ww, wci, wco, yn, yc, yd, yh, yw;
|
||||
int sx[] = {0,0,0,0,1};
|
||||
for (int i=3; i>=0; i--) sx[i] = sx[i+1] * x->shape[i+1];
|
||||
int strideX[5];
|
||||
if (xformat == "ncdhw") {
|
||||
x->shape.unpack(xn, xc, xd, xh, xw);
|
||||
int tmp[5] = {sx[0],sx[1],sx[2],sx[3],sx[4]};
|
||||
memcpy(strideX, tmp, sizeof(tmp));
|
||||
} else {
|
||||
x->shape.unpack(xn, xd, xh, xw, xc);
|
||||
int tmp[5] = {sx[0],sx[2],sx[3],sx[4],sx[1]};
|
||||
memcpy(strideX, tmp, sizeof(tmp));
|
||||
}
|
||||
int dimX[] = {xn, xc, xd, xh, xw};
|
||||
// dimX: ncdhw
|
||||
checkCudaErrors(cudnnSetTensorNdDescriptor(
|
||||
cudnnIdesc, getDataType<Tx>(),
|
||||
5, dimX, strideX
|
||||
));
|
||||
|
||||
auto ws = w->shape;
|
||||
int dimW[] = {(int)ws[0],(int)ws[1],(int)ws[2],(int)ws[3],(int)ws[4]};
|
||||
// cudnn only support this two format
|
||||
// https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnSetFilterNdDescriptor
|
||||
#define filterFormat_oihw CUDNN_TENSOR_NCHW
|
||||
#define filterFormat_ohwi CUDNN_TENSOR_NHWC
|
||||
|
||||
// dimW: KCRS(oihw)
|
||||
checkCudaErrors(cudnnSetFilterNdDescriptor(
|
||||
cudnnFdesc, getDataType<Tw>(),
|
||||
// filterFormat_@WFORMAT, 5, dimW
|
||||
filterFormat_oihw, 5, dimW
|
||||
));
|
||||
|
||||
int padA[] = {paddingd, paddingh, paddingw};
|
||||
int convstrideA[] = {strided, strideh, stridew};
|
||||
int dilationA[] = {dilationd, dilationh, dilationw};
|
||||
// difference between
|
||||
// CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION
|
||||
// is the kernel rc order
|
||||
// currently, No perf difference is observed between
|
||||
// this two mode
|
||||
checkCudaErrors(cudnnSetConvolutionNdDescriptor(
|
||||
cudnnConvDesc, 3,
|
||||
padA, convstrideA, dilationA,
|
||||
CUDNN_CROSS_CORRELATION, getDataType<Ty>()
|
||||
));
|
||||
|
||||
// using tensor core
|
||||
// checkCudaErrors( cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
|
||||
|
||||
|
||||
int sy[] = {0,0,0,0,1};
|
||||
for (int i=3; i>=0; i--) sy[i] = sy[i+1] * y->shape[i+1];
|
||||
int strideY[5];
|
||||
if (xformat == "ncdhw") {
|
||||
y->shape.unpack(yn, yc, yd, yh, yw);
|
||||
int tmp[5] = {sy[0],sy[1],sy[2],sy[3],sy[4]};
|
||||
memcpy(strideY, tmp, sizeof(tmp));
|
||||
} else {
|
||||
y->shape.unpack(yn, yd, yh, yw, yc);
|
||||
int tmp[5] = {sy[0],sy[2],sy[3],sy[4],sy[1]};
|
||||
memcpy(strideY, tmp, sizeof(tmp));
|
||||
}
|
||||
int dimY[] = {yn, yc, yd, yh, yw};
|
||||
|
||||
checkCudaErrors(cudnnSetTensorNdDescriptor(
|
||||
cudnnOdesc, getDataType<Ty>(),
|
||||
5, dimY, strideY
|
||||
));
|
||||
|
||||
cudnnConvolutionFwdAlgo_t algos[] = {
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_GEMM,
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_FFT,
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING,
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_DIRECT,
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD,
|
||||
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED,
|
||||
};
|
||||
int num_algos = CUDNN_CONVOLUTION_FWD_ALGO_COUNT;
|
||||
int perf_count;
|
||||
cudnnConvolutionFwdAlgoPerf_t perf_results[num_algos];
|
||||
cudnnConvolutionFwdAlgo_t algo;
|
||||
bool benchmark=true;
|
||||
|
||||
jk.clear();
|
||||
jk << dimX[0] << "," << dimX[1] << "," << dimX[2] << "," << dimX[3] << "," << dimX[4] << ",";
|
||||
jk << dimW[0] << "," << dimW[1] << "," << dimW[2] << "," << dimW[3] << "," << dimW[4] << ",";
|
||||
jk << paddingd << paddingh << paddingw << "," << strided << strideh <<stridew << "," << dilationd << dilationh << dilationw << "," << groups << ".";
|
||||
auto iter = fwd_algo_cache.find(jk.to_string());
|
||||
|
||||
if (iter!=fwd_algo_cache.end()) algo = iter->second;
|
||||
else {
|
||||
if (fwd_algo_cache.size()>=max_cache_size) benchmark = false;
|
||||
if (benchmark) {
|
||||
size_t max_ws_size = 0;
|
||||
for (int i = 0; i < num_algos; i++) {
|
||||
size_t sz;
|
||||
cudnnStatus_t ret = cudnnGetConvolutionForwardWorkspaceSize(
|
||||
handle_, cudnnIdesc, cudnnFdesc, cudnnConvDesc,
|
||||
cudnnOdesc, algos[i], &sz);
|
||||
// continue if use too much workspace
|
||||
if (sz > mem_info.total_cuda_ram * max_workspace_ratio) continue;
|
||||
if (CUDNN_STATUS_SUCCESS == ret && sz > max_ws_size) max_ws_size = sz;
|
||||
}
|
||||
size_t allocation;
|
||||
void* ws = exe.temp_allocator->alloc(max_ws_size, allocation);
|
||||
checkCudaErrors(cudnnFindConvolutionForwardAlgorithmEx(
|
||||
handle_,
|
||||
cudnnIdesc, x->ptr<Tx>(),
|
||||
cudnnFdesc, w->ptr<Tw>(),
|
||||
cudnnConvDesc,
|
||||
cudnnOdesc, y->ptr<Ty>(),
|
||||
num_algos,
|
||||
&perf_count,
|
||||
perf_results,
|
||||
ws,
|
||||
max_ws_size));
|
||||
exe.temp_allocator->free(ws, max_ws_size, allocation);
|
||||
} else {
|
||||
checkCudaErrors(cudnnGetConvolutionForwardAlgorithm_v7(
|
||||
handle_,
|
||||
cudnnIdesc,
|
||||
cudnnFdesc,
|
||||
cudnnConvDesc,
|
||||
cudnnOdesc,
|
||||
num_algos,
|
||||
&perf_count,
|
||||
perf_results));
|
||||
}
|
||||
int best_algo_idx=-1;
|
||||
for (int i = 0; i < perf_count; i++)
|
||||
if (perf_results[i].status == CUDNN_STATUS_SUCCESS){
|
||||
best_algo_idx=i;
|
||||
break;
|
||||
}
|
||||
ASSERT(best_algo_idx!=-1);
|
||||
algo=perf_results[best_algo_idx].algo;
|
||||
if (benchmark) {
|
||||
fwd_algo_cache[jk.to_string()] = algo;
|
||||
if (fwd_algo_cache.size()==max_cache_size)
|
||||
LOGw << "forward_ algorithm cache is full";
|
||||
}
|
||||
}
|
||||
|
||||
// TODO: warp work space
|
||||
void *workSpace = 0;
|
||||
size_t workSpaceSize;
|
||||
checkCudaErrors (cudnnGetConvolutionForwardWorkspaceSize(
|
||||
handle_, cudnnIdesc, cudnnFdesc, cudnnConvDesc,
|
||||
cudnnOdesc, algo, &workSpaceSize) );
|
||||
size_t allocation;
|
||||
if (workSpaceSize > 0) {
|
||||
workSpace = exe.temp_allocator->alloc(workSpaceSize, allocation);
|
||||
}
|
||||
float alpha=1, beta=0;
|
||||
checkCudaErrors(cudnnConvolutionForward(
|
||||
handle_,
|
||||
(void*)(&alpha),
|
||||
cudnnIdesc, x->ptr<Tx>(),
|
||||
cudnnFdesc, w->ptr<Tw>(),
|
||||
cudnnConvDesc,
|
||||
algo,
|
||||
workSpace, workSpaceSize,
|
||||
(void*)(&beta),
|
||||
cudnnOdesc, y->ptr<Ty>())
|
||||
);
|
||||
if (workSpace)
|
||||
exe.temp_allocator->free(workSpace, workSpaceSize, allocation);
|
||||
|
||||
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnIdesc ));
|
||||
checkCudaErrors(cudnnDestroyFilterDescriptor( cudnnFdesc ));
|
||||
checkCudaErrors(cudnnDestroyTensorDescriptor( cudnnOdesc ));
|
||||
checkCudaErrors(cudnnDestroyConvolutionDescriptor( cudnnConvDesc ));
|
||||
}
|
||||
#endif
|
||||
#endif // JIT
|
||||
|
||||
} // jittor
|
|
@ -0,0 +1,24 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers: Dun Liang <randonlang@gmail.com>.
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#pragma once
|
||||
#include "op.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
struct CudnnConv3dOp : Op {
|
||||
Var* x, * w, * y;
|
||||
int strided, strideh, stridew, paddingd, paddingh, paddingw, dilationd, dilationh, dilationw, groups;
|
||||
string xformat;
|
||||
CudnnConv3dOp(Var* x, Var* w, int strided, int strideh, int stridew, int paddingd, int paddingh, int paddingw, int dilationd=1, int dilationh=1, int dilationw=1, int groups=1, string xformat="ncdhw");
|
||||
|
||||
const char* name() const override { return "cudnn_conv3d"; }
|
||||
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
|
||||
void infer_shape() override;
|
||||
DECLARE_jit_run;
|
||||
};
|
||||
|
||||
} // jittor
|
|
@ -65,6 +65,7 @@
|
|||
#include <assert.h>
|
||||
|
||||
#include <cudnn.h>
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
#include "fp16_dev.h"
|
||||
#include "fp16_emu.h"
|
||||
|
|
|
@ -1,4 +1,5 @@
|
|||
#include <cudnn.h>
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
|
||||
const char *_cudaGetErrorEnum(cudnnStatus_t error) {
|
||||
|
|
|
@ -11,6 +11,7 @@
|
|||
|
||||
#include <cuda_runtime.h>
|
||||
#include <cublas.h>
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
#include <curand.h>
|
||||
|
||||
|
|
|
@ -89,7 +89,7 @@ void CuttTransposeOp::jit_run() {
|
|||
reverse[i] = dim-1-new_axes[dim-1-i];
|
||||
for (int i=0; i<dim; i++)
|
||||
x_shape[i] = new_shape[dim-1-i];
|
||||
if (dim == 1) {
|
||||
if (dim == 1 || x->num==1) {
|
||||
checkCudaErrors(cudaMemcpyAsync(yp, xp, x->size, cudaMemcpyDefault, 0));
|
||||
return;
|
||||
}
|
||||
|
@ -105,7 +105,9 @@ void CuttTransposeOp::jit_run() {
|
|||
cuttExecute(iter->second, xp, yp);
|
||||
} else {
|
||||
cuttHandle plan;
|
||||
CHECK(0==cuttPlan(&plan, dim, x_shape.data(), reverse.data(), x->dtype().dsize(), 0));
|
||||
checkCudaErrors(cudaDeviceSynchronize());
|
||||
auto ret = cuttPlan(&plan, dim, x_shape.data(), reverse.data(), x->dtype().dsize(), 0);
|
||||
CHECK(0==ret) << ret << jk.to_string() << x << y;
|
||||
cutt_plan_cache[jk.to_string()] = plan;
|
||||
cuttExecute(plan, xp, yp);
|
||||
}
|
||||
|
|
|
@ -12,6 +12,7 @@
|
|||
|
||||
#include <cuda_runtime.h>
|
||||
#include <nccl.h>
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
|
||||
namespace jittor {
|
||||
|
|
|
@ -9,6 +9,7 @@
|
|||
*
|
||||
*/
|
||||
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
#include "fp16_dev.h"
|
||||
|
||||
|
|
|
@ -13,6 +13,7 @@
|
|||
// These are CUDA Helper functions for initialization and error checking
|
||||
|
||||
#include <cuda_runtime.h>
|
||||
#include "utils/log.h"
|
||||
#include "helper_cuda.h"
|
||||
|
||||
#ifdef _CUFFT_H_
|
||||
|
|
|
@ -778,7 +778,7 @@ void simple_net(int times = 100) {
|
|||
s.wait();
|
||||
}
|
||||
|
||||
// extern "C" int mkl_test_entry();
|
||||
// extern int mkl_test_entry();
|
||||
|
||||
int mkl_test_entry() {
|
||||
try {
|
||||
|
|
|
@ -15,25 +15,25 @@ def eye(shape, dtype):
|
|||
return jt.array(np.identity(shape[0])).unary(dtype)
|
||||
|
||||
def eye_(var):
|
||||
var.assign(eye(var.shape, var.dtype))
|
||||
return var.assign(eye(var.shape, var.dtype))
|
||||
|
||||
def constant(shape, dtype, value=0.0):
|
||||
return jt.array(value).unary(dtype).broadcast(shape)
|
||||
|
||||
def constant_(var, value=0.0):
|
||||
var.assign(constant(var.shape, var.dtype, value))
|
||||
return var.assign(constant(var.shape, var.dtype, value))
|
||||
|
||||
def uniform(shape, dtype, low, high):
|
||||
return jt.random(shape, dtype) * (low - high) + high
|
||||
|
||||
def uniform_(var, low, high):
|
||||
var.assign(uniform(var.shape, var.dtype, low, high))
|
||||
return var.assign(uniform(var.shape, var.dtype, low, high))
|
||||
|
||||
def gauss(shape, dtype, mean=0.0, std=1.0):
|
||||
return jt.random(shape, dtype, "normal") * std + mean
|
||||
|
||||
def gauss_(var, mean=0.0, std=1.0):
|
||||
var.assign(gauss(var.shape, var.dtype, mean, std))
|
||||
return var.assign(gauss(var.shape, var.dtype, mean, std))
|
||||
|
||||
def invariant_uniform(shape, dtype, mode="fan_in"):
|
||||
assert len(shape)>1
|
||||
|
@ -61,7 +61,7 @@ def relu_invariant_gauss(shape, dtype, mode="fan_in"):
|
|||
return gauss(shape, dtype, 0, std)
|
||||
|
||||
def relu_invariant_gauss_(var, mode="fan_in"):
|
||||
var.assign(relu_invariant_gauss(tuple(var.shape), var.dtype, mode))
|
||||
return var.assign(relu_invariant_gauss(tuple(var.shape), var.dtype, mode))
|
||||
|
||||
def calculate_std(var,mode,nonlinearity,param=0.01):
|
||||
mode = mode.lower()
|
||||
|
@ -100,7 +100,6 @@ def kaiming_normal_(var, a=0, mode='fan_in', nonlinearity='leaky_relu'):
|
|||
return gauss_(var,0, std)
|
||||
|
||||
|
||||
#TODO: bound = gain * math.sqrt(6.0/fan) ??
|
||||
def xavier_uniform(shape, dtype, gain=1.0):
|
||||
assert len(shape)>1
|
||||
|
||||
|
@ -108,11 +107,11 @@ def xavier_uniform(shape, dtype, gain=1.0):
|
|||
for i in shape[2:]:
|
||||
matsize *= i
|
||||
fan = (shape[1] * matsize) + (shape[0] * matsize)
|
||||
bound = gain * math.sqrt(1.0/fan)
|
||||
bound = gain * math.sqrt(6.0/fan)
|
||||
return uniform(shape, dtype, -bound, bound)
|
||||
|
||||
def xavier_uniform_(var, gain=1.0):
|
||||
var.assign(xavier_uniform(tuple(var.shape), var.dtype, gain))
|
||||
return var.assign(xavier_uniform(tuple(var.shape), var.dtype, gain))
|
||||
|
||||
def xavier_gauss(shape, dtype, gain=1.0):
|
||||
assert len(shape)>1
|
||||
|
@ -125,4 +124,4 @@ def xavier_gauss(shape, dtype, gain=1.0):
|
|||
return gauss(shape, dtype, 0, std)
|
||||
|
||||
def xavier_gauss_(var, gain=1.0):
|
||||
var.assign(xavier_gauss(tuple(var.shape), var.dtype, gain))
|
||||
return var.assign(xavier_gauss(tuple(var.shape), var.dtype, gain))
|
||||
|
|
|
@ -0,0 +1,2 @@
|
|||
from .chamfer import chamfer_loss, ChamferLoss
|
||||
from .emd import earth_mover_distance, EarthMoverDistance
|
|
@ -0,0 +1,153 @@
|
|||
# Author: Zheng-Ning Liu
|
||||
#
|
||||
# This file implements chamfer loss on both CPU and GPU.
|
||||
# The implementation does no use extra NxM matrix to store distances, and thus
|
||||
# supports large point clouds.
|
||||
|
||||
import jittor as jt
|
||||
import jittor.nn as nn
|
||||
|
||||
cpu_src = '''
|
||||
for (int bs = 0; bs < in0_shape0; ++bs)
|
||||
for (int i = 0; i < in0_shape1; ++i) {
|
||||
float min_dis = (@in0(bs, i, 0) - @in1(bs, 0, 0)) * (@in0(bs, i, 0) - @in1(bs, 0, 0)) +
|
||||
(@in0(bs, i, 1) - @in1(bs, 0, 1)) * (@in0(bs, i, 1) - @in1(bs, 0, 1)) +
|
||||
(@in0(bs, i, 2) - @in1(bs, 0, 2)) * (@in0(bs, i, 2) - @in1(bs, 0, 2));
|
||||
@out(bs, i) = 0;
|
||||
for (int j = 1; j < in1_shape1; ++j) {
|
||||
float dis = (@in0(bs, i, 0) - @in1(bs, j, 0)) * (@in0(bs, i, 0) - @in1(bs, j, 0)) +
|
||||
(@in0(bs, i, 1) - @in1(bs, j, 1)) * (@in0(bs, i, 1) - @in1(bs, j, 1)) +
|
||||
(@in0(bs, i, 2) - @in1(bs, j, 2)) * (@in0(bs, i, 2) - @in1(bs, j, 2));
|
||||
if (dis < min_dis) {
|
||||
min_dis = dis;
|
||||
@out(bs, i) = j;
|
||||
}
|
||||
}
|
||||
}
|
||||
'''
|
||||
|
||||
cuda_src = '''
|
||||
__global__ void chamfer_loss_min_idx_kernel(@ARGS_DEF) {
|
||||
@PRECALC
|
||||
int bs = blockIdx.x;
|
||||
int n = in0_shape1;
|
||||
int m = in1_shape1;
|
||||
|
||||
for (int i = threadIdx.x; i < n; i += blockDim.x) {
|
||||
float min_dis = (@in0(bs, i, 0) - @in1(bs, 0, 0)) * (@in0(bs, i, 0) - @in1(bs, 0, 0)) +
|
||||
(@in0(bs, i, 1) - @in1(bs, 0, 1)) * (@in0(bs, i, 1) - @in1(bs, 0, 1)) +
|
||||
(@in0(bs, i, 2) - @in1(bs, 0, 2)) * (@in0(bs, i, 2) - @in1(bs, 0, 2));
|
||||
@out(bs, i) = 0;
|
||||
for (int j = 1; j < m; ++j) {
|
||||
float dis = (@in0(bs, i, 0) - @in1(bs, j, 0)) * (@in0(bs, i, 0) - @in1(bs, j, 0)) +
|
||||
(@in0(bs, i, 1) - @in1(bs, j, 1)) * (@in0(bs, i, 1) - @in1(bs, j, 1)) +
|
||||
(@in0(bs, i, 2) - @in1(bs, j, 2)) * (@in0(bs, i, 2) - @in1(bs, j, 2));
|
||||
if (dis < min_dis) {
|
||||
min_dis = dis;
|
||||
@out(bs, i) = j;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
chamfer_loss_min_idx_kernel<<<in0_shape0, 512>>>(@ARGS);
|
||||
'''
|
||||
|
||||
|
||||
def chamfer_loss(pc1, pc2, reduction='mean', dims='BNC', bidirectional=False):
|
||||
''' return the chamfer loss from pc1 to pc2.
|
||||
|
||||
:param pc1: input point cloud
|
||||
:type pc1: jittor array
|
||||
|
||||
:param pc2: input point cloud
|
||||
:type pc2: jittor array
|
||||
|
||||
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
|
||||
:type reduction: str, optional
|
||||
|
||||
:param dims: a string that represents each dimension, can be
|
||||
'[BNC]' ([batch, number of points, xyz]), or
|
||||
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
|
||||
:type dims: str, optional
|
||||
|
||||
Example:
|
||||
|
||||
>>> import jittor as jt
|
||||
>>> from jittor.loss3d import chamfer_loss
|
||||
>>> jt.flags.use_cuda = True
|
||||
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> cf = chamfer_loss(pc1, pc2, dims='BNC', bidirectional=True)
|
||||
>>> print('chamfer loss =', cf.item())
|
||||
'''
|
||||
if bidirectional:
|
||||
return chamfer_loss(pc1, pc2, reduction, dims) + chamfer_loss(pc2, pc1, reduction, dims)
|
||||
|
||||
assert dims in ['BNC', 'BCN']
|
||||
if dims == 'BCN':
|
||||
pc1, pc2 = pc1.permute(0, 2, 1), pc2.permute(0, 2, 1)
|
||||
|
||||
batch_size_1, N, _ = pc1.shape
|
||||
batch_size_2, M, _ = pc2.shape
|
||||
assert batch_size_1 == batch_size_2
|
||||
batch_size = batch_size_1
|
||||
|
||||
idx = jt.code([batch_size, N], 'int32', [pc1, pc2],
|
||||
cpu_src=cpu_src,
|
||||
cuda_src=cuda_src)
|
||||
|
||||
nearest_pts = pc2.reindex([batch_size, idx.shape[1], 3], [
|
||||
'i0',
|
||||
'@e0(i0, i1)',
|
||||
'i2'
|
||||
], extras=[idx])
|
||||
|
||||
chamfer_distance = (((pc1 - nearest_pts) ** 2).sum(dim=-1)).sqrt()
|
||||
if reduction is None:
|
||||
return chamfer_distance
|
||||
elif reduction == 'sum':
|
||||
return jt.sum(chamfer_distance)
|
||||
elif reduction == 'mean':
|
||||
return jt.mean(chamfer_distance)
|
||||
|
||||
|
||||
class ChamferLoss(nn.Module):
|
||||
''' A loss layer that computes the chamfer loss from pc1 to pc2.
|
||||
|
||||
:param pc1: input point cloud
|
||||
:type pc1: jittor array
|
||||
|
||||
:param pc2: input point cloud
|
||||
:type pc2: jittor array
|
||||
|
||||
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
|
||||
:type reduction: str, optional
|
||||
|
||||
:param dims: a string that represents each dimension, can be
|
||||
'[BNC]' ([batch, number of points, xyz]), or
|
||||
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
|
||||
:type dims: str, optional
|
||||
|
||||
Example:
|
||||
|
||||
>>> import jittor as jt
|
||||
>>> from jittor.loss3d import ChamferLoss
|
||||
>>> jt.flags.use_cuda = True
|
||||
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> CF = ChamferLoss(dims='BNC', bidirectional=True)
|
||||
>>> cf = CF(pc1, pc2)
|
||||
>>> print('chamfer loss =', cf.item())
|
||||
'''
|
||||
|
||||
def __init__(self, reduction='mean', dims='BNC', bidirectional=False):
|
||||
''' see function @chamfer_loss
|
||||
'''
|
||||
super().__init__()
|
||||
self.reduction = reduction
|
||||
self.dims = dims
|
||||
self.bidirectional = bidirectional
|
||||
|
||||
def execute(self, pc1, pc2):
|
||||
return chamfer_loss(pc1, pc2, self.reduction, self.dims, self.bidirectional)
|
|
@ -0,0 +1,440 @@
|
|||
# Author: Zheng-Ning Liu
|
||||
#
|
||||
# The gpu implementation is original provided by Haoqiang Fan and Kaichun Mo,
|
||||
# <https://github.com/daerduoCarey/PyTorchEMD>.
|
||||
|
||||
import jittor as jt
|
||||
from jittor import Function
|
||||
|
||||
EMD_gpu_header = '''
|
||||
namespace jittor {
|
||||
__device__ inline out_type dist2(out_type x1, out_type y1, out_type z1,
|
||||
out_type x2, out_type y2, out_type z2) {
|
||||
return (x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1) + (z2 - z1) * (z2 - z1);
|
||||
}
|
||||
}
|
||||
'''
|
||||
|
||||
approxmatch_gpu_src = '''
|
||||
__global__ void approxmatch_gpu_kernel(@ARGS_DEF) {
|
||||
@PRECALC
|
||||
@alias(xyz1, in0)
|
||||
@alias(xyz2, in1)
|
||||
@alias(match, out)
|
||||
|
||||
int b = in0_shape0;
|
||||
int n = in0_shape1;
|
||||
int m = in1_shape1;
|
||||
|
||||
out_type *remainL = in2_p + blockIdx.x * (n + m) * 2;
|
||||
out_type *remainR = remainL + n;
|
||||
out_type *ratioL = remainR + m;
|
||||
out_type *ratioR = ratioL + n;
|
||||
|
||||
const int Block = 1024;
|
||||
__shared__ out_type buf[Block * 4];
|
||||
|
||||
for (int i = blockIdx.x; i < b; i += gridDim.x) {
|
||||
for (int j = threadIdx.x; j < n * m; j += blockDim.x)
|
||||
match_p[i * n * m + j] = 0;
|
||||
for (int j = threadIdx.x; j < n; j += blockDim.x)
|
||||
remainL[j] = n >= m ? 1 : m / n;
|
||||
for (int j = threadIdx.x; j < m; j += blockDim.x)
|
||||
remainR[j] = n >= m ? n / m : 1;
|
||||
__syncthreads();
|
||||
|
||||
for (int j = 7; j >= -2; j--) {
|
||||
out_type level = j > -2 ? -powf(4.0f, j) : 0;
|
||||
|
||||
for (int k0 = 0; k0 < n; k0 += blockDim.x) {
|
||||
int k = k0 + threadIdx.x;
|
||||
out_type x1 = 0, y1 = 0, z1 = 0;
|
||||
if (k < n) {
|
||||
x1 = @xyz1(i, k, 0);
|
||||
y1 = @xyz1(i, k, 1);
|
||||
z1 = @xyz1(i, k, 2);
|
||||
}
|
||||
|
||||
out_type suml = 1e-9f;
|
||||
for (int l0 = 0; l0 < m; l0 += Block){
|
||||
int lend = min(m, l0 + Block) - l0;
|
||||
for (int l = threadIdx.x; l < lend; l += blockDim.x) {
|
||||
buf[l * 4 + 0] = @xyz2(i, l0 + l, 0);
|
||||
buf[l * 4 + 1] = @xyz2(i, l0 + l, 1);
|
||||
buf[l * 4 + 2] = @xyz2(i, l0 + l, 2);
|
||||
buf[l * 4 + 3] = remainR[l0 + l];
|
||||
}
|
||||
__syncthreads();
|
||||
|
||||
for (int l = 0; l < lend; l++){
|
||||
out_type x2 = buf[l * 4 + 0];
|
||||
out_type y2 = buf[l * 4 + 1];
|
||||
out_type z2 = buf[l * 4 + 2];
|
||||
out_type d = level * dist2(x1, y1, z1, x2, y2, z2);
|
||||
out_type w = __expf(d) * buf[l * 4 + 3];
|
||||
suml += w;
|
||||
}
|
||||
__syncthreads();
|
||||
}
|
||||
if (k < n)
|
||||
ratioL[k] = remainL[k] / suml;
|
||||
}
|
||||
__syncthreads();
|
||||
|
||||
for (int l0 = 0; l0 < m; l0 += blockDim.x){
|
||||
int l = l0 + threadIdx.x;
|
||||
out_type x2 = 0, y2 = 0, z2 = 0;
|
||||
if (l < m){
|
||||
x2 = @xyz2(i, l, 0);
|
||||
y2 = @xyz2(i, l, 1);
|
||||
z2 = @xyz2(i, l, 2);
|
||||
}
|
||||
out_type sumr = 0;
|
||||
for (int k0 = 0; k0 < n; k0 += Block){
|
||||
int kend = min(n, k0 + Block) - k0;
|
||||
for (int k = threadIdx.x; k < kend; k += blockDim.x){
|
||||
buf[k * 4 + 0] = @xyz1(i, k0 + k, 0);
|
||||
buf[k * 4 + 1] = @xyz1(i, k0 + k, 1);
|
||||
buf[k * 4 + 2] = @xyz1(i, k0 + k, 2);
|
||||
buf[k * 4 + 3] = ratioL[k0 + k];
|
||||
}
|
||||
__syncthreads();
|
||||
|
||||
for (int k = 0; k < kend; k++){
|
||||
out_type x1 = buf[k * 4 + 0];
|
||||
out_type y1 = buf[k * 4 + 1];
|
||||
out_type z1 = buf[k * 4 + 2];
|
||||
out_type d = level * dist2(x1, y1, z1, x2, y2, z2);
|
||||
out_type w = __expf(d) * buf[k * 4 + 3];
|
||||
sumr += w;
|
||||
}
|
||||
__syncthreads();
|
||||
}
|
||||
|
||||
if (l < m){
|
||||
sumr *= remainR[l];
|
||||
out_type consumption = fminf(remainR[l] / (sumr + 1e-9f), 1.0f);
|
||||
ratioR[l] = consumption * remainR[l];
|
||||
remainR[l] = fmaxf(0.0f, remainR[l] - sumr);
|
||||
}
|
||||
}
|
||||
__syncthreads();
|
||||
|
||||
for (int k0 = 0; k0 < n; k0 += blockDim.x){
|
||||
int k = k0 + threadIdx.x;
|
||||
out_type x1 = 0, y1 = 0, z1 = 0;
|
||||
if (k < n){
|
||||
x1 = @xyz1(i, k, 0);
|
||||
y1 = @xyz1(i, k, 1);
|
||||
z1 = @xyz1(i, k, 2);
|
||||
}
|
||||
out_type suml = 0;
|
||||
for (int l0 = 0; l0 < m; l0 += Block){
|
||||
int lend = min(m, l0 + Block)-l0;
|
||||
for (int l = threadIdx.x; l < lend; l += blockDim.x){
|
||||
buf[l * 4 + 0] = @xyz2(i, l0 + l, 0);
|
||||
buf[l * 4 + 1] = @xyz2(i, l0 + l, 1);
|
||||
buf[l * 4 + 2] = @xyz2(i, l0 + l, 2);
|
||||
buf[l * 4 + 3] = ratioR[l0 + l];
|
||||
}
|
||||
__syncthreads();
|
||||
|
||||
out_type rl = ratioL[k];
|
||||
if (k < n){
|
||||
for (int l = 0; l < lend; l++){
|
||||
out_type x2 = buf[l * 4 + 0];
|
||||
out_type y2 = buf[l * 4 + 1];
|
||||
out_type z2 = buf[l * 4 + 2];
|
||||
out_type d = level * dist2(x1, y1, z1, x2, y2, z2);
|
||||
out_type w = __expf(d) * rl * buf[l*4+3];
|
||||
@match(i, l0 + l, k) += w;
|
||||
suml += w;
|
||||
}
|
||||
}
|
||||
__syncthreads();
|
||||
}
|
||||
if (k < n)
|
||||
remainL[k] = fmaxf(0.0f, remainL[k] - suml);
|
||||
}
|
||||
__syncthreads();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
approxmatch_gpu_kernel<<<32, 512>>>(@ARGS);
|
||||
'''
|
||||
|
||||
matchcost_gpu_src = '''
|
||||
__global__ void matchcost_gpu_kernel(@ARGS_DEF) {
|
||||
@PRECALC
|
||||
@alias(xyz1, in0)
|
||||
@alias(xyz2, in1)
|
||||
@alias(match, in2)
|
||||
|
||||
int b = in0_shape0;
|
||||
int n = in0_shape1;
|
||||
int m = in1_shape1;
|
||||
|
||||
const int Block = 1024;
|
||||
__shared__ out_type allsum[512];
|
||||
__shared__ out_type buf[Block * 3];
|
||||
|
||||
for (int i = blockIdx.x; i < b; i += gridDim.x) {
|
||||
out_type subsum = 0;
|
||||
for (int k0 = 0; k0 < n; k0 += blockDim.x) {
|
||||
int k = k0 + threadIdx.x;
|
||||
out_type x1 = 0, y1 = 0, z1 = 0;
|
||||
if (k < n) {
|
||||
x1 = @xyz1(i, k, 0);
|
||||
y1 = @xyz1(i, k, 1);
|
||||
z1 = @xyz1(i, k, 2);
|
||||
}
|
||||
|
||||
for (int l0 = 0; l0 < m; l0 += Block) {
|
||||
int lend = min(m, l0 + Block) - l0;
|
||||
for (int l = threadIdx.x; l < lend * 3; l += blockDim.x)
|
||||
buf[l] = xyz2_p[i * m * 3 + l0 * 3 + l];
|
||||
__syncthreads();
|
||||
|
||||
if (k < n) {
|
||||
for (int l = 0; l < lend; l++) {
|
||||
out_type x2 = buf[l * 3 + 0];
|
||||
out_type y2 = buf[l * 3 + 1];
|
||||
out_type z2 = buf[l * 3 + 2];
|
||||
out_type d = dist2(x1, y1, z1, x2, y2, z2);
|
||||
subsum += d * @match(i, l0 + l, k);
|
||||
}
|
||||
}
|
||||
__syncthreads();
|
||||
}
|
||||
}
|
||||
|
||||
allsum[threadIdx.x] = subsum;
|
||||
for (int j = 1; j < blockDim.x; j <<= 1) {
|
||||
__syncthreads();
|
||||
if ((threadIdx.x & j) == 0 && threadIdx.x + j < blockDim.x) {
|
||||
allsum[threadIdx.x] += allsum[threadIdx.x + j];
|
||||
}
|
||||
}
|
||||
|
||||
if (threadIdx.x == 0)
|
||||
@out(i) = allsum[0];
|
||||
__syncthreads();
|
||||
}
|
||||
}
|
||||
|
||||
matchcost_gpu_kernel<<<32, 512>>>(@ARGS);
|
||||
'''
|
||||
|
||||
matchcost_grad1_gpu_src = '''
|
||||
__global__ void matchcost_grad1_gpu_kernel(@ARGS_DEF) {
|
||||
@PRECALC
|
||||
@alias(grad, in0)
|
||||
@alias(xyz1, in1)
|
||||
@alias(xyz2, in2)
|
||||
@alias(match, in3)
|
||||
|
||||
int b = grad_shape0;
|
||||
int n = xyz1_shape1;
|
||||
int m = xyz2_shape1;
|
||||
|
||||
for (int i = blockIdx.x; i < b ; i += gridDim.x){
|
||||
for (int l = threadIdx.x; l < n; l += blockDim.x){
|
||||
out_type x1 = @xyz1(i, l, 0);
|
||||
out_type y1 = @xyz1(i, l, 1);
|
||||
out_type z1 = @xyz1(i, l, 2);
|
||||
out_type dx = 0, dy = 0, dz = 0;
|
||||
for (int k = 0; k < m; k++){
|
||||
out_type x2 = @xyz2(i, k, 0);
|
||||
out_type y2 = @xyz2(i, k, 1);
|
||||
out_type z2 = @xyz2(i, k, 2);
|
||||
out_type d = @match(i, k, l) * 2;
|
||||
dx += (x1 - x2) * d;
|
||||
dy += (y1 - y2) * d;
|
||||
dz += (z1 - z2) * d;
|
||||
}
|
||||
@out(i, l, 0) = dx * @grad(i);
|
||||
@out(i, l, 1) = dy * @grad(i);
|
||||
@out(i, l, 2) = dz * @grad(i);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
matchcost_grad1_gpu_kernel<<<32, 512>>>(@ARGS);
|
||||
'''
|
||||
|
||||
matchcost_grad2_gpu_src = '''
|
||||
__global__ void matchcost_grad2_gpu_kernel(@ARGS_DEF) {
|
||||
@PRECALC
|
||||
@alias(grad, in0)
|
||||
@alias(xyz1, in1)
|
||||
@alias(xyz2, in2)
|
||||
@alias(match, in3)
|
||||
|
||||
int b = grad_shape0;
|
||||
int n = xyz1_shape1;
|
||||
int m = xyz2_shape1;
|
||||
|
||||
__shared__ out_type sum_grad[256 * 3];
|
||||
for (int i = blockIdx.x; i < b; i += gridDim.x) {
|
||||
int kbeg = m * blockIdx.y / gridDim.y;
|
||||
int kend = m * (blockIdx.y + 1) / gridDim.y;
|
||||
for (int k = kbeg; k < kend; k++) {
|
||||
out_type x2 = @xyz2(i, k, 0);
|
||||
out_type y2 = @xyz2(i, k, 1);
|
||||
out_type z2 = @xyz2(i, k, 2);
|
||||
out_type subsumx = 0, subsumy = 0, subsumz = 0;
|
||||
for (int j = threadIdx.x; j < n; j += blockDim.x) {
|
||||
out_type x1 = x2 - @xyz1(i, j, 0);
|
||||
out_type y1 = y2 - @xyz1(i, j, 1);
|
||||
out_type z1 = z2 - @xyz1(i, j, 2);
|
||||
out_type d = @match(i, k, j) * 2;
|
||||
subsumx += x1 * d;
|
||||
subsumy += y1 * d;
|
||||
subsumz += z1 * d;
|
||||
}
|
||||
sum_grad[threadIdx.x * 3 + 0] = subsumx;
|
||||
sum_grad[threadIdx.x * 3 + 1] = subsumy;
|
||||
sum_grad[threadIdx.x * 3 + 2] = subsumz;
|
||||
|
||||
for (int j = 1; j < blockDim.x; j <<= 1) {
|
||||
__syncthreads();
|
||||
int j1 = threadIdx.x;
|
||||
int j2 = threadIdx.x + j;
|
||||
if ((j1 & j) == 0 && j2 < blockDim.x){
|
||||
sum_grad[j1 * 3 + 0] += sum_grad[j2 * 3 + 0];
|
||||
sum_grad[j1 * 3 + 1] += sum_grad[j2 * 3 + 1];
|
||||
sum_grad[j1 * 3 + 2] += sum_grad[j2 * 3 + 2];
|
||||
}
|
||||
}
|
||||
if (threadIdx.x == 0){
|
||||
@out(i, k, 0) = sum_grad[0] * @grad(i);
|
||||
@out(i, k, 1) = sum_grad[1] * @grad(i);
|
||||
@out(i, k, 2) = sum_grad[2] * @grad(i);
|
||||
}
|
||||
__syncthreads();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
matchcost_grad2_gpu_kernel<<<dim3(32, 32), 256>>>(@ARGS);
|
||||
'''
|
||||
|
||||
class EarthMoverDistance(Function):
|
||||
''' A loss layer that computes Earth Mover's distance from pc1 to pc2. Only supports GPU.
|
||||
|
||||
:param pc1: input point cloud
|
||||
:type pc1: jittor array
|
||||
|
||||
:param pc2: input point cloud
|
||||
:type pc2: jittor array
|
||||
|
||||
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
|
||||
:type reduction: str, optional
|
||||
|
||||
:param dims: a string that represents each dimension, can be
|
||||
'[BNC]' ([batch, number of points, xyz]), or
|
||||
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
|
||||
:type dims: str, optional
|
||||
|
||||
Example:
|
||||
|
||||
>>> import jittor as jt
|
||||
>>> from jittor.loss3d import EarthMoverDistance
|
||||
>>> jt.flags.use_cuda = True
|
||||
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> EMD = EarthMoverDistance(dims='BNC')
|
||||
>>> emd = EMD(pc1, pc2)
|
||||
>>> print('EMD =', emd.item())
|
||||
'''
|
||||
def execute(self, pc1, pc2, reduction='mean', dims='BNC'):
|
||||
assert dims in ['BNC', 'BCN']
|
||||
if dims == 'BCN':
|
||||
pc1, pc2 = pc1.permute(0, 2, 1), pc2.permute(0, 2, 1)
|
||||
|
||||
batch_size_1, N, _ = pc1.shape
|
||||
batch_size_2, M, _ = pc2.shape
|
||||
assert batch_size_1 == batch_size_2
|
||||
batch_size = batch_size_1
|
||||
|
||||
temp = jt.zeros([batch_size, (N + M) * 2], pc1.dtype)
|
||||
match = jt.code(
|
||||
shape=[batch_size, M, N],
|
||||
dtype=pc1.dtype,
|
||||
inputs=[pc1, pc2, temp],
|
||||
cuda_header=EMD_gpu_header,
|
||||
cuda_src=approxmatch_gpu_src,
|
||||
)
|
||||
|
||||
emd = jt.code(
|
||||
shape=[batch_size],
|
||||
dtype=pc1.dtype,
|
||||
inputs=[pc1, pc2, match],
|
||||
cuda_header=EMD_gpu_header,
|
||||
cuda_src=matchcost_gpu_src,
|
||||
)
|
||||
|
||||
self.saved_vars = (pc1, pc2, match, reduction)
|
||||
|
||||
if reduction is None:
|
||||
return emd
|
||||
elif reduction == 'sum':
|
||||
return emd.sum()
|
||||
elif reduction == 'mean':
|
||||
return emd.mean()
|
||||
|
||||
def grad(self, grad):
|
||||
pc1, pc2, match, reduction = self.saved_vars
|
||||
|
||||
if reduction == 'sum':
|
||||
grad = jt.ones([pc1.shape[0]]) * grad
|
||||
elif reduction == 'mean':
|
||||
grad = jt.ones([pc1.shape[0]]) * grad / pc1.shape[0]
|
||||
|
||||
grad_pc1 = jt.code(
|
||||
shape=pc1.shape,
|
||||
dtype=pc1.dtype,
|
||||
inputs=[grad, pc1, pc2, match],
|
||||
cuda_src=matchcost_grad1_gpu_src,
|
||||
)
|
||||
|
||||
grad_pc2 = jt.code(
|
||||
shape=pc2.shape,
|
||||
dtype=pc2.dtype,
|
||||
inputs=[grad, pc1, pc2, match],
|
||||
cuda_src=matchcost_grad2_gpu_src,
|
||||
)
|
||||
|
||||
return grad_pc1, grad_pc2
|
||||
|
||||
|
||||
def earth_mover_distance(pc1, pc2, reduction='mean', dims='BNC'):
|
||||
''' Earth Mover's distance from pc1 to pc2. Only supports GPU.
|
||||
|
||||
:param pc1: input point cloud
|
||||
:type pc1: jittor array
|
||||
|
||||
:param pc2: input point cloud
|
||||
:type pc2: jittor array
|
||||
|
||||
:param reduction: reduction method in batches, can be 'mean', 'sum', or None. Default: 'mean'.
|
||||
:type reduction: str, optional
|
||||
|
||||
:param dims: a string that represents each dimension, can be
|
||||
'[BNC]' ([batch, number of points, xyz]), or
|
||||
'[BCN]' ([batch, xyz, number of points]). Default: 'BNC'.
|
||||
:type dims: str, optional
|
||||
|
||||
|
||||
Example:
|
||||
|
||||
>>> import jittor as jt
|
||||
>>> from jittor.loss3d import earth_mover_distance
|
||||
>>> jt.flags.use_cuda = True
|
||||
>>> pc1 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> pc2 = jt.rand([10, 100, 3], dtype=jt.float32)
|
||||
>>> emd = earth_mover_distance(pc1, pc2, dims='BNC')
|
||||
>>> print('EMD =', emd.item())
|
||||
'''
|
||||
return EarthMoverDistance.apply(pc1, pc2, reduction, dims)
|
|
@ -116,6 +116,65 @@ class CosineAnnealingLR(object):
|
|||
if param_group.get("lr") != None:
|
||||
param_group["lr"] = self.get_lr(self.base_lr_pg[i], param_group["lr"])
|
||||
|
||||
|
||||
class ExponentialLR(object):
|
||||
""" learning rate is multiplied by gamma in each step.
|
||||
"""
|
||||
def __init__(self, optimizer, gamma, last_epoch=-1):
|
||||
self.optimizer = optimizer
|
||||
self.gamma = gamma
|
||||
self.last_epoch = last_epoch
|
||||
self.base_lr = optimizer.lr
|
||||
self.base_lr_pg = [pg.get("lr") for pg in optimizer.param_groups]
|
||||
|
||||
def get_lr(self, base_lr, now_lr):
|
||||
if self.last_epoch == 0:
|
||||
return base_lr
|
||||
return base_lr * self.gamma ** self.last_epoch
|
||||
|
||||
def step(self):
|
||||
self.last_epoch += 1
|
||||
self.update_lr()
|
||||
|
||||
def update_lr(self):
|
||||
self.optimizer.lr = self.get_lr(self.base_lr, self.optimizer.lr)
|
||||
for i, param_group in enumerate(self.optimizer.param_groups):
|
||||
if param_group.get("lr") != None:
|
||||
param_group["lr"] = self.get_lr(self.base_lr_pg[i], param_group["lr"])
|
||||
|
||||
|
||||
class StepLR(object):
|
||||
def __init__(self, optimizer, step_size, gamma=0.1, last_epoch=-1):
|
||||
self.optimizer = optimizer
|
||||
self.step_size = step_size
|
||||
self.gamma = gamma
|
||||
self.last_epoch = last_epoch
|
||||
self.cur_epoch = 0
|
||||
|
||||
def get_gamma(self):
|
||||
if self.last_epoch < 0:
|
||||
if (self.cur_epoch != 0 and (self.cur_epoch + 1) % self.step_size == 0):
|
||||
return self.gamma
|
||||
else:
|
||||
if (self.cur_epoch + 1 + self.last_epoch) % self.step_size == 0:
|
||||
return self.gamma
|
||||
return 1.
|
||||
|
||||
def get_lr(self):
|
||||
return self.optimizer.lr
|
||||
|
||||
def step(self):
|
||||
self.update_lr()
|
||||
self.cur_epoch += 1
|
||||
|
||||
def update_lr(self):
|
||||
gamma = self.get_gamma()
|
||||
if gamma != 1.:
|
||||
self.optimizer.lr = self.optimizer.lr * gamma
|
||||
for i, param_group in enumerate(self.optimizer.param_groups):
|
||||
if param_group.get("lr") != None:
|
||||
param_group["lr"] = param_group["lr"] * gamma
|
||||
|
||||
class MultiStepLR(object):
|
||||
def __init__(self, optimizer, milestones=[], gamma=0.1, last_epoch=-1):
|
||||
self.optimizer = optimizer
|
||||
|
|
|
@ -182,7 +182,28 @@ def chunk(x, chunks, dim=0):
|
|||
jt.Var.chunk = chunk
|
||||
|
||||
|
||||
def expand(x, shape):
|
||||
def expand(x, *shape):
|
||||
''' Expand and broadcast this array, -1 represents this dimension is not changed.
|
||||
|
||||
Example::
|
||||
|
||||
a = jt.zeros((3,1))
|
||||
b = a.expand(3, 4)
|
||||
assert b.shape == (3,4)
|
||||
b = a.expand(-1, 4)
|
||||
assert b.shape == (3,4)
|
||||
b = a.expand((3, 4))
|
||||
assert b.shape == (3,4)
|
||||
b = a.expand((-1, 4))
|
||||
assert b.shape == (3,4)
|
||||
|
||||
'''
|
||||
if len(shape) == 1 and isinstance(shape[0], (tuple,list,jt.NanoVector)):
|
||||
shape = shape[0]
|
||||
shape = list(shape)
|
||||
for i in range(len(shape)):
|
||||
if shape[i] == -1:
|
||||
shape[i] = x.shape[i]
|
||||
return x.broadcast(shape)
|
||||
jt.Var.expand = expand
|
||||
|
||||
|
@ -634,23 +655,6 @@ def kthvalue(input, k, dim=None, keepdim=False):
|
|||
|
||||
jt.Var.kthvalue = kthvalue
|
||||
|
||||
|
||||
def gather(x,dim,index):
|
||||
if dim<0:
|
||||
dim+=index.ndim
|
||||
x_shape = list(x.shape )
|
||||
i_shape = list(index.shape)
|
||||
assert i_shape[dim]>0
|
||||
assert x.ndim == index.ndim
|
||||
i_shape[dim]=x_shape[dim]
|
||||
assert i_shape == x_shape
|
||||
ins = []
|
||||
for i in range(index.ndim):
|
||||
ins.append(jt.index(index.shape,dim=i))
|
||||
ins[dim]=index
|
||||
return x.reindex(ins)
|
||||
jt.Var.gather = gather
|
||||
|
||||
def _prod(x,dim=0):
|
||||
x = jt.log(x)
|
||||
x = x.sum(dim=dim)
|
||||
|
@ -708,7 +712,6 @@ def nms(dets,thresh):
|
|||
return order[selected]
|
||||
|
||||
|
||||
jt.Var.expand = jt.Var.broadcast
|
||||
jt.Var.expand_as = jt.Var.broadcast_var
|
||||
|
||||
|
||||
|
@ -1061,6 +1064,13 @@ def randperm(n, dtype="int32"):
|
|||
return index.cast(dtype)
|
||||
|
||||
def set_global_seed(seed):
|
||||
''' Sets the seeds of the random number generators of Python, numpy and jittor,
|
||||
simultaneously.
|
||||
|
||||
.. note::
|
||||
Jittor also gurantees each worker of jittor.dataset.Dataset to hold a different seed,
|
||||
which is global_seed ^ worker_id ^ 1234.
|
||||
'''
|
||||
import random
|
||||
random.seed(seed)
|
||||
jt.set_seed(seed)
|
||||
|
@ -1197,7 +1207,7 @@ def gather(x, dim, index):
|
|||
|
||||
Parameters::
|
||||
|
||||
* input (jt.Var) – the source array
|
||||
* x (jt.Var) – the source array
|
||||
* dim (int) – the axis along which to index
|
||||
* index (jt.Var) – the indices of elements to gather
|
||||
|
||||
|
@ -1216,3 +1226,46 @@ Example::
|
|||
return x.getitem(tuple(indexes))
|
||||
|
||||
jt.Var.gather = gather
|
||||
|
||||
def roll(x, shifts, dims=None):
|
||||
'''Roll the tensor along the given dimension(s).
|
||||
|
||||
Parameters::
|
||||
|
||||
* x (jt.Var) – the source array
|
||||
* shifts (int or tuple) – shift offset of dims
|
||||
* dims (int or tuple) – shift dims
|
||||
|
||||
Examples::
|
||||
|
||||
x = jt.array([1, 2, 3, 4, 5, 6, 7, 8]).view(4, 2)
|
||||
y = x.roll(1, 0)
|
||||
assert (y.numpy() == [[7,8],[1,2],[3,4],[5,6]]).all()
|
||||
y = x.roll(-1, 0)
|
||||
assert (y.numpy() == [[3,4],[5,6],[7,8],[1,2]]).all()
|
||||
y = x.roll(shifts=(2, 1), dims=(0, 1))
|
||||
assert (y.numpy() == [[6,5],[8,7],[2,1],[4,3]]).all()
|
||||
|
||||
'''
|
||||
if isinstance(shifts, int):
|
||||
shifts = (shifts,)
|
||||
if dims is None:
|
||||
dims = tuple(range(len(shifts)))
|
||||
elif isinstance(dims, int):
|
||||
dims = (dims,)
|
||||
assert len(dims) == len(shifts)
|
||||
ids = [ f'i{i}' for i in range(x.ndim) ]
|
||||
for i in range(len(dims)):
|
||||
shift = shifts[i]
|
||||
d = dims[i]
|
||||
size = x.shape[d]
|
||||
shift = shift % size
|
||||
if shift<0: shift += size
|
||||
ids[d] = f'(i{d}<{shift}?i{d}+{size-shift}:(i{d}-{shift}))'
|
||||
return x.reindex(x.shape, ids)
|
||||
|
||||
jt.Var.roll = roll
|
||||
|
||||
def safe_log(x):
|
||||
return jt.safe_clip(x, 1e-30, 1e30).log()
|
||||
jt.Var.safe_log = safe_log
|
||||
|
|
|
@ -20,19 +20,20 @@ import math
|
|||
from collections import OrderedDict
|
||||
from jittor.pool import *
|
||||
from jittor.optim import *
|
||||
from jittor.misc import _pair
|
||||
from jittor.misc import _pair, _triple
|
||||
from jittor_utils import LOG
|
||||
|
||||
|
||||
def matmul_transpose(a, b):
|
||||
'''
|
||||
returns a * b^T
|
||||
'''
|
||||
assert len(a.shape) >= 2 and len(b.shape) == 2
|
||||
assert a.shape[-1] == b.shape[-1], (a.shape, b.shape)
|
||||
if len(a.shape)>2:
|
||||
if len(a.shape) != 2:
|
||||
aa = a.reshape((-1, a.shape[-1]))
|
||||
cc = matmul_transpose(aa, b)
|
||||
return cc.reshape(a.shape[:-1]+(-1,))
|
||||
assert len(a.shape) == 2 and len(b.shape) == 2
|
||||
|
||||
shape = list(a.shape)[:-1] + list(b.shape)
|
||||
a = a.broadcast(shape, [len(shape)-2])
|
||||
|
@ -44,7 +45,7 @@ def bmm_transpose(a, b):
|
|||
'''
|
||||
returns a * b^T
|
||||
'''
|
||||
if jt.flags.use_cuda:
|
||||
if jt.flags.use_cuda and jt.compile_extern.cublas_ops:
|
||||
return jt.compile_extern.cublas_ops.cublas_batched_matmul(a, b, 0, 1)
|
||||
t = list(range(b.ndim))
|
||||
t[-1], t[-2] = t[-2], t[-1]
|
||||
|
@ -117,7 +118,7 @@ Example::
|
|||
if len_a>=3 and len_a==len_b:
|
||||
# bmm
|
||||
# a: [..., n, m], b: [..., m, k], c:[..., n, k]
|
||||
if jt.flags.use_cuda:
|
||||
if jt.flags.use_cuda and jt.compile_extern.cublas_ops:
|
||||
return jt.compile_extern.cublas_ops.cublas_batched_matmul(a, b, 0, 0)
|
||||
shape = []
|
||||
len_c = max(len_a, len_b)
|
||||
|
@ -178,36 +179,45 @@ class ELU(Module):
|
|||
class PReLU(Module):
|
||||
def __init__(self, num_parameters=1, init_=0.25):
|
||||
self.num_parameters = num_parameters
|
||||
self.a = init.constant((num_parameters,), "float32", init_)
|
||||
self.weight = init.constant((num_parameters,), "float32", init_)
|
||||
|
||||
def execute(self, x):
|
||||
if self.num_parameters != 1:
|
||||
assert self.num_parameters == x.size(1), f"num_parameters does not match input channels in PReLU"
|
||||
return jt.maximum(0, x) + self.a.broadcast(x, [0,2,3]) * jt.minimum(0, x)
|
||||
return jt.maximum(0, x) + self.weight.broadcast(x, [0,2,3]) * jt.minimum(0, x)
|
||||
else:
|
||||
return jt.maximum(0, x) + self.a * jt.minimum(0, x)
|
||||
return jt.maximum(0, x) + self.weight * jt.minimum(0, x)
|
||||
|
||||
#TODO dims is 4 will cause slowly execution
|
||||
def cross_entropy_loss(output, target, ignore_index=None):
|
||||
def cross_entropy_loss(output, target, weight=None, ignore_index=None,reduction='sum'):
|
||||
if len(output.shape) == 4:
|
||||
c_dim = output.shape[1]
|
||||
output = output.transpose((0, 2, 3, 1))
|
||||
output = output.reshape((-1, c_dim))
|
||||
if ignore_index is not None:
|
||||
target = jt.ternary(target==ignore_index,
|
||||
jt.array(-1).broadcast(target), target)
|
||||
mask = jt.logical_and(target >= 0, target < output.shape[1])
|
||||
|
||||
target = target.reshape((-1, ))
|
||||
target_weight = jt.ones(target.shape[0], dtype='float32')
|
||||
if weight is not None:
|
||||
target_weight = weight[target]
|
||||
if ignore_index is not None:
|
||||
target_weight = jt.ternary(
|
||||
target==ignore_index,
|
||||
jt.array(0).broadcast(target_weight),
|
||||
target_weight
|
||||
)
|
||||
|
||||
target = target.broadcast(output, [1])
|
||||
target = target.index(1) == target
|
||||
|
||||
output = output - output.max([1], keepdims=True)
|
||||
loss = output.exp().sum(1).log()
|
||||
loss = loss - (output*target).sum(1)
|
||||
if ignore_index is None:
|
||||
return loss.mean()
|
||||
logsum = output.exp().sum(1).log()
|
||||
loss = (logsum - (output*target).sum(1)) * target_weight
|
||||
if reduction == 'sum':
|
||||
return loss.sum() / target_weight.sum()
|
||||
elif reduction == 'mean':
|
||||
return loss.mean() / target_weight.mean()
|
||||
else:
|
||||
return loss.sum() / jt.maximum(mask.int().sum(), 1)
|
||||
return loss / target_weight
|
||||
|
||||
def mse_loss(output, target):
|
||||
return (output-target).sqr().mean()
|
||||
|
@ -273,11 +283,12 @@ def nll_loss(output,target,weight=None,ignore_index=-100,reduction='mean'):
|
|||
raise ValueError(f'not support {reduction}')
|
||||
|
||||
class CrossEntropyLoss(Module):
|
||||
def __init__(self,ignore_index=None):
|
||||
def __init__(self, weight=None, ignore_index=None):
|
||||
self.weight = weight
|
||||
self.ignore_index = ignore_index
|
||||
|
||||
def execute(self, output, target):
|
||||
return cross_entropy_loss(output, target,self.ignore_index)
|
||||
return cross_entropy_loss(output, target, self.weight, self.ignore_index)
|
||||
|
||||
class MSELoss(Module):
|
||||
def __init__(self):
|
||||
|
@ -423,7 +434,16 @@ class BatchNorm(Module):
|
|||
norm_x = x * w.broadcast(x, dims) + b.broadcast(x, dims)
|
||||
return norm_x
|
||||
|
||||
BatchNorm2d = BatchNorm1d = BatchNorm
|
||||
BatchNorm3d = BatchNorm2d = BatchNorm1d = BatchNorm
|
||||
|
||||
def batch_norm(x, running_mean, running_var, weight=1, bias=0, training=False, momentum=0.1, eps=1e-05):
|
||||
dims = [0]+list(range(2,x.ndim))
|
||||
assert not training
|
||||
w = weight / jt.sqrt(running_var+eps)
|
||||
b = bias - running_mean * w
|
||||
norm_x = x * w.broadcast(x, dims) + b.broadcast(x, dims)
|
||||
return norm_x
|
||||
|
||||
|
||||
class InstanceNorm(Module):
|
||||
def __init__(self, num_features, eps=1e-05, momentum=0.1, affine=True, is_train=True, sync=True):
|
||||
|
@ -447,7 +467,23 @@ class InstanceNorm(Module):
|
|||
b = self.bias - xmean * w
|
||||
return x * w.broadcast(x, dims) + b.broadcast(x, dims)
|
||||
|
||||
InstanceNorm2d = InstanceNorm1d = InstanceNorm
|
||||
InstanceNorm3d = InstanceNorm2d = InstanceNorm1d = InstanceNorm
|
||||
|
||||
def instance_norm(x,
|
||||
running_mean = None,
|
||||
running_var = None,
|
||||
weight = 1,
|
||||
bias = 0,
|
||||
momentum = 0.1,
|
||||
eps = 1e-5):
|
||||
dims = list(range(2,x.ndim))
|
||||
xmean = jt.mean(x, dims=dims)
|
||||
x2mean = jt.mean(x*x, dims=dims)
|
||||
|
||||
xvar = (x2mean-xmean*xmean).maximum(0.0)
|
||||
w = weight / jt.sqrt(xvar+eps)
|
||||
b = bias - xmean * w
|
||||
return x * w.broadcast(x, dims) + b.broadcast(x, dims)
|
||||
|
||||
class LayerNorm(Module):
|
||||
def __init__(self, normalized_shape, eps: float = 1e-5, elementwise_affine: bool = True) -> None:
|
||||
|
@ -470,7 +506,22 @@ class LayerNorm(Module):
|
|||
return x * w + b
|
||||
|
||||
|
||||
LayerNorm2d = LayerNorm1d = LayerNorm
|
||||
LayerNorm3d = LayerNorm2d = LayerNorm1d = LayerNorm
|
||||
|
||||
def layer_norm(x,
|
||||
normalized_shape,
|
||||
weight = 1,
|
||||
bias = 0,
|
||||
eps: float = 1e-5,
|
||||
elementwise_affine: bool = True):
|
||||
dims = [-i for i in range(len(normalized_shape), 0, -1)]
|
||||
xmean = jt.mean(x, dims=dims, keepdims=1)
|
||||
x2mean = jt.mean(x*x, dims=dims, keepdims=1)
|
||||
|
||||
xvar = (x2mean-xmean*xmean).maximum(0.0)
|
||||
w = weight / jt.sqrt(xvar+eps)
|
||||
b = bias - xmean * w
|
||||
return x * w + b
|
||||
|
||||
class GroupNorm(Module):
|
||||
def __init__(self, num_groups, num_channels, eps=1e-05, affine=True, is_train=True):
|
||||
|
@ -506,6 +557,33 @@ class GroupNorm(Module):
|
|||
x = x * w.broadcast(x, [3]) + b.broadcast(x, [3])
|
||||
return x.reshape(output_shape)
|
||||
|
||||
def group_norm(x,
|
||||
num_groups,
|
||||
weight = 1,
|
||||
bias = 0,
|
||||
eps=1e-05):
|
||||
N = x.shape[0]
|
||||
C = x.shape[1]
|
||||
output_shape = (N,-1)
|
||||
# TODO: 3d group norm
|
||||
if x.ndim==4:
|
||||
output_shape = x.shape
|
||||
assert C % num_groups == 0
|
||||
x = x.reshape((N, num_groups, C//num_groups, -1))
|
||||
xmean = jt.mean(x, dims=[2,3]).reshape((N, num_groups, 1))
|
||||
x2mean = jt.mean(x*x, dims=[2,3]).reshape((N, num_groups, 1))
|
||||
xvar = (x2mean-xmean*xmean).maximum(0.0)
|
||||
|
||||
if isinstance(weight, jt.Var):
|
||||
weight = weight.reshape((1, num_groups, -1))
|
||||
if isinstance(bias, jt.Var):
|
||||
bias = bias.reshape((1, num_groups, -1))
|
||||
weight = weight / jt.sqrt(xvar+eps)
|
||||
bias = bias - xmean * weight
|
||||
x = x * weight.broadcast(x, [3]) + bias.broadcast(x, [3])
|
||||
return x.reshape(output_shape)
|
||||
|
||||
|
||||
Relu = jt.make_module(relu)
|
||||
ReLU = Relu
|
||||
Leaky_relu = jt.make_module(leaky_relu, 2)
|
||||
|
@ -637,6 +715,34 @@ class Conv1d(Module):
|
|||
y = x.squeeze(-1)
|
||||
return y
|
||||
|
||||
class Conv3d(Module):
|
||||
def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True):
|
||||
self.in_channels = in_channels
|
||||
self.out_channels = out_channels
|
||||
self.kernel_size = kernel_size if isinstance(kernel_size, tuple) else (kernel_size, kernel_size, kernel_size)
|
||||
self.stride = stride if isinstance(stride, tuple) else (stride, stride, stride)
|
||||
self.padding = padding if isinstance(padding, tuple) else (padding, padding, padding)
|
||||
self.dilation = dilation if isinstance(dilation, tuple) else (dilation, dilation, dilation)
|
||||
self.groups = groups
|
||||
assert in_channels % groups == 0, 'in_channels must be divisible by groups'
|
||||
assert out_channels % groups == 0, 'out_channels must be divisible by groups'
|
||||
Kh, Kw, Kd = self.kernel_size
|
||||
self.groups = groups
|
||||
assert in_channels % groups == 0, 'in_channels must be divisible by groups'
|
||||
assert out_channels % groups == 0, 'out_channels must be divisible by groups'
|
||||
|
||||
self.weight = init.invariant_uniform([out_channels, in_channels//groups, Kh, Kw, Kd], dtype="float")
|
||||
if bias:
|
||||
fan=1
|
||||
for i in self.weight.shape[1:]:
|
||||
fan *= i
|
||||
bound = 1 / math.sqrt(fan)
|
||||
self.bias = init.uniform([out_channels], dtype="float", low=-bound, high=bound)
|
||||
else:
|
||||
self.bias = None
|
||||
|
||||
def execute(self, x):
|
||||
return conv3d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
|
||||
|
||||
def conv2d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
|
||||
padding = _pair(padding)
|
||||
|
@ -694,7 +800,70 @@ def conv2d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
|
|||
if bias is not None:
|
||||
b = bias.broadcast(y.shape, [0,2,3])
|
||||
y = y + b
|
||||
return y
|
||||
return y
|
||||
|
||||
def conv3d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
|
||||
padding = _triple(padding)
|
||||
stride = _triple(stride)
|
||||
dilation = _triple(dilation)
|
||||
out_channels = weight.shape[0]
|
||||
|
||||
if jt.flags.use_cuda and jt.cudnn:
|
||||
y = jt.cudnn.ops.cudnn_conv3d(x, weight, *stride, *padding, *dilation, groups)
|
||||
elif groups == 1:
|
||||
N,C,D,H,W = x.shape
|
||||
Kd, Kh, Kw = weight.shape[-3:]
|
||||
od = (D+padding[0]*2-Kd*dilation[0]+dilation[0]-1)//stride[0]+1
|
||||
oh = (H+padding[1]*2-Kh*dilation[1]+dilation[1]-1)//stride[1]+1
|
||||
ow = (W+padding[2]*2-Kw*dilation[2]+dilation[2]-1)//stride[2]+1
|
||||
xx = x.reindex([N,out_channels,C,od,oh,ow,Kd,Kh,Kw], [
|
||||
'i0', # Nid
|
||||
'i2', # Cid
|
||||
f'i3*{stride[0]}-{padding[0]}+i6*{dilation[0]}', # Hid+Khid
|
||||
f'i4*{stride[1]}-{padding[1]}+i7*{dilation[1]}', # Wid+KWid
|
||||
f'i5*{stride[2]}-{padding[2]}+i8*{dilation[2]}', # Did+KDid
|
||||
])
|
||||
ww = weight.broadcast(xx.shape, [0,3,4,5])
|
||||
yy = xx*ww
|
||||
y = yy.sum([2,6,7,8]) # Kc, Kh, Kw,Kd
|
||||
else:
|
||||
N,C,D,H,W = x.shape
|
||||
Kd, Kh, Kw = weight.shape[-3:]
|
||||
G = groups
|
||||
CpG = C // G # channels per group
|
||||
oc = out_channels
|
||||
od = (D+padding[0]*2-Kd*dilation[0]+dilation[0]-1)//stride[0]+1
|
||||
oh = (H+padding[1]*2-Kh*dilation[1]+dilation[1]-1)//stride[1]+1
|
||||
ow = (W+padding[2]*2-Kw*dilation[2]+dilation[2]-1)//stride[2]+1
|
||||
xx = x.reindex([N,G,oc//G,CpG,od,oh,ow,Kd,Kh,Kw], [
|
||||
'i0', # Nid
|
||||
f'i1*{CpG}+i3', # Gid
|
||||
f'i4*{stride[0]}-{padding[0]}+i7*{dilation[0]}', # Hid+Khid
|
||||
f'i5*{stride[1]}-{padding[1]}+i8*{dilation[1]}', # Wid+KWid
|
||||
f'i6*{stride[2]}-{padding[2]}+i9*{dilation[2]}', # Did+KDid
|
||||
])
|
||||
xx.compile_options = {"G":G}
|
||||
# w: [oc, CpG, Kh, Kw, Kd]
|
||||
ww = weight.reindex([N, G, oc//G, CpG, oh, ow, od, Kh, Kw, Kd], [
|
||||
f'i1*{oc//G}+i2',
|
||||
'i3',
|
||||
'i7',
|
||||
'i8',
|
||||
'i9'
|
||||
])
|
||||
yy = xx*ww
|
||||
y = yy.reindex_reduce('add', [N, oc, od, oh, ow], [
|
||||
'i0',
|
||||
f'i1*{oc//G}+i2',
|
||||
'i4',
|
||||
'i5',
|
||||
'i6'
|
||||
])
|
||||
|
||||
if bias is not None:
|
||||
b = bias.broadcast(y.shape, [0,2,3,4])
|
||||
y = y + b
|
||||
return y
|
||||
|
||||
class ConvTranspose(Module):
|
||||
def __init__(self, in_channels, out_channels, kernel_size, stride=1, \
|
||||
|
@ -754,6 +923,45 @@ class ConvTranspose(Module):
|
|||
y = y + b
|
||||
return y
|
||||
|
||||
class ConvTranspose3d(Module):
|
||||
def __init__(self, in_channels, out_channels, kernel_size, stride=1, \
|
||||
padding=0, output_padding=0, groups=1, bias=True, dilation=1):
|
||||
self.in_channels = in_channels
|
||||
self.out_channels = out_channels
|
||||
|
||||
# added
|
||||
self.dilation = dilation
|
||||
self.group = groups
|
||||
assert groups==1, "Group conv not supported yet."
|
||||
|
||||
self.kernel_size = kernel_size if isinstance(kernel_size, tuple) else (kernel_size, kernel_size, kernel_size)
|
||||
self.stride = stride if isinstance(stride, tuple) else (stride, stride, stride)
|
||||
self.dilation = dilation if isinstance(dilation, tuple) else (dilation, dilation, dilation)
|
||||
# added
|
||||
self.padding = padding if isinstance(padding, tuple) else (padding, padding, padding)
|
||||
self.real_padding = (
|
||||
self.dilation[0] * (self.kernel_size[0] - 1) - self.padding[0],
|
||||
self.dilation[1] * (self.kernel_size[1] - 1) - self.padding[1],
|
||||
self.dilation[2] * (self.kernel_size[2] - 1) - self.padding[2])
|
||||
self.output_padding = output_padding if isinstance (output_padding, tuple) else (output_padding, output_padding, output_padding)
|
||||
assert self.output_padding[0] < max(self.stride[0], self.dilation[0]) and \
|
||||
self.output_padding[1] < max(self.stride[1], self.dilation[1]) and \
|
||||
self.output_padding[2] < max(self.stride[2], self.dilation[2]), \
|
||||
"output padding must be smaller than max(stride, dilation)"
|
||||
|
||||
self.weight = init.invariant_uniform((in_channels, out_channels) + self.kernel_size, dtype="float")
|
||||
if bias:
|
||||
fan=1
|
||||
for i in self.weight.shape[1:]:
|
||||
fan *= i
|
||||
bound = 1 / math.sqrt(fan)
|
||||
self.bias = init.uniform([out_channels], dtype="float", low=-bound, high=bound)
|
||||
else:
|
||||
self.bias = None
|
||||
|
||||
def execute(self, x):
|
||||
return conv_transpose3d(x, self.weight, self.bias, self.stride, self.padding, self.output_padding, self.group, self.dilation)
|
||||
|
||||
def conv_transpose(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1):
|
||||
x = input
|
||||
N,C,H,W = x.shape
|
||||
|
@ -792,6 +1000,49 @@ def conv_transpose(input, weight, bias=None, stride=1, padding=0, output_padding
|
|||
assert not bias, "Bias should be none or jittor var"
|
||||
return y
|
||||
|
||||
def conv_transpose3d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1):
|
||||
x = input
|
||||
N,C,D,H,W = x.shape
|
||||
i,o,d,h,w = weight.shape
|
||||
assert C==i
|
||||
assert groups==1, "Group conv not supported yet."
|
||||
stride = stride if isinstance(stride, tuple) else (stride, stride, stride)
|
||||
dilation = dilation if isinstance(dilation, tuple) else (dilation, dilation, dilation)
|
||||
# added
|
||||
padding = padding if isinstance(padding, tuple) else (padding, padding, padding)
|
||||
output_padding = output_padding if isinstance (output_padding, tuple) else (output_padding, output_padding, output_padding)
|
||||
assert output_padding[0] < max(stride[0], dilation[0]) and \
|
||||
output_padding[1] < max(stride[1], dilation[1]) and \
|
||||
output_padding[2] < max(stride[2], dilation[2]), \
|
||||
"output padding must be smaller than max(stride, dilation)"
|
||||
|
||||
stride_d, stride_h, stride_w = stride
|
||||
padding_d, padding_h, padding_w = padding
|
||||
dilation_d, dilation_h, dilation_w = dilation
|
||||
|
||||
d_out = (D-1) * stride_d + output_padding[0] - 2*padding_d + 1 + (d-1)*dilation_d
|
||||
h_out = (H-1) * stride_h + output_padding[1] - 2*padding_h + 1 + (h-1)*dilation_h
|
||||
w_out = (W-1) * stride_w + output_padding[2] - 2*padding_w + 1 + (w-1)*dilation_w
|
||||
out_shape = (N, o, d_out, h_out, w_out)
|
||||
if jt.flags.use_cuda and jt.cudnn:
|
||||
return jt.cudnn.ops.cudnn_conv3d_backward_x(weight, x, *out_shape[2:], *stride, *padding, *dilation, groups)
|
||||
shape = (N, i, o, D, H, W, d, h, w)
|
||||
xx = x.broadcast(shape, (2, 6, 7, 8)) # i,h,w
|
||||
ww = weight.broadcast(shape, (0, 3, 4, 5)) # N,H,W
|
||||
y = (ww*xx).reindex_reduce("add", out_shape, [
|
||||
'i0', # N
|
||||
'i2', # o
|
||||
f'i3*{stride_d}-{padding_d}+i6*{dilation_d}', # Did+Kdid
|
||||
f'i4*{stride_h}-{padding_h}+i7*{dilation_h}', # Hid+Khid
|
||||
f'i5*{stride_w}-{padding_w}+i8*{dilation_w}', # Wid+KWid
|
||||
])
|
||||
if isinstance(bias, jt.Var):
|
||||
b = bias.broadcast(y.shape, [0,2,3,4])
|
||||
y = y + b
|
||||
else:
|
||||
assert not bias, "Bias should be none or jittor var"
|
||||
return y
|
||||
|
||||
conv_transpose2d = conv_transpose
|
||||
|
||||
def pad(x,padding, mode='constant', value=0):
|
||||
|
@ -997,9 +1248,9 @@ def _bicubic(x, a, func):
|
|||
|
||||
def _interpolate(img, x, y, ids, mode):
|
||||
if mode == "nearest":
|
||||
return img.reindex([*ids, x.floor(), y.floor()])
|
||||
return img.reindex([*ids, x.floor_int(), y.floor_int()])
|
||||
if mode == "bilinear":
|
||||
fx, fy = x.floor(), y.floor()
|
||||
fx, fy = x.floor_int(), y.floor_int()
|
||||
cx, cy = fx + 1, fy + 1
|
||||
dx, dy = x - fx, y - fy
|
||||
a = img.reindex_var([*ids, fx, fy])
|
||||
|
@ -1013,7 +1264,7 @@ def _interpolate(img, x, y, ids, mode):
|
|||
return o
|
||||
if mode=="bicubic": # ugly ver.
|
||||
n,c,h,w = img.shape
|
||||
fx, fy = x.floor(), y.floor()
|
||||
fx, fy = x.floor_int(), y.floor_int()
|
||||
dix, diy = x - fx, y - fy
|
||||
ax, ay = _bicubic(dix+1,-0.75,2), _bicubic(diy+1,-0.75,2)
|
||||
bx, by = _bicubic(dix,-0.75,1), _bicubic(diy,-0.75,1)
|
||||
|
@ -1031,23 +1282,8 @@ def _interpolate(img, x, y, ids, mode):
|
|||
return o
|
||||
raise (f"Not support interpolation mode: {mode}")
|
||||
|
||||
|
||||
def resize(img, size, mode="nearest", align_corners=False):
|
||||
n, c, h, w = img.shape
|
||||
H, W = size
|
||||
nid, cid, hid, wid = jt.index((n, c, H, W))
|
||||
if align_corners:
|
||||
x = hid * ((h - 1) / max(1, H - 1))
|
||||
y = wid * ((w - 1) / max(1, W - 1))
|
||||
else:
|
||||
x = hid * (h / H) + (h / H * 0.5 - 0.5)
|
||||
if H > h: x = x.clamp(0, h - 1)
|
||||
y = wid * (w / W) + (w / W * 0.5 - 0.5)
|
||||
if W > w: y = y.clamp(0, w - 1)
|
||||
return _interpolate(img, x, y, (nid, cid), mode)
|
||||
|
||||
|
||||
def upsample(img, size, mode="nearest", align_corners=False):
|
||||
# TODO: tf_mode to another function
|
||||
def resize(img, size, mode="nearest", align_corners=False, tf_mode=False):
|
||||
n, c, h, w = img.shape
|
||||
H, W = size
|
||||
nid, cid, hid, wid = jt.index((n, c, H, W))
|
||||
|
@ -1061,22 +1297,30 @@ def upsample(img, size, mode="nearest", align_corners=False):
|
|||
x = hid * (h / H)
|
||||
y = wid * (w / W)
|
||||
else:
|
||||
x = hid * (h / H) + (h / H * 0.5 - 0.5)
|
||||
if H > h: x = x.clamp(0, h - 1)
|
||||
y = wid * (w / W) + (w / W * 0.5 - 0.5)
|
||||
if W > w: y = y.clamp(0, w - 1)
|
||||
if (tf_mode):
|
||||
x = hid * (h / H)
|
||||
if H > h: x = x.clamp(0, h - 1)
|
||||
y = wid * (w / W)
|
||||
if W > w: y = y.clamp(0, w - 1)
|
||||
else:
|
||||
x = hid * (h / H) + (h / H * 0.5 - 0.5)
|
||||
if H > h: x = x.clamp(0, h - 1)
|
||||
y = wid * (w / W) + (w / W * 0.5 - 0.5)
|
||||
if W > w: y = y.clamp(0, w - 1)
|
||||
return _interpolate(img, x, y, (nid, cid), mode)
|
||||
|
||||
upsample = resize
|
||||
|
||||
def interpolate(X, size=None, scale_factor=None, mode='bilinear', align_corners=False):
|
||||
|
||||
def interpolate(X, size=None, scale_factor=None, mode='bilinear', align_corners=False, tf_mode=False):
|
||||
if scale_factor is not None:
|
||||
size = [X.shape[-2] * scale_factor, X.shape[-1] * scale_factor]
|
||||
if isinstance(size, int):
|
||||
size = (size, size)
|
||||
if scale_factor is not None and scale_factor > 1:
|
||||
return upsample(X, size, mode, align_corners)
|
||||
return upsample(X, size, mode, align_corners, tf_mode)
|
||||
else:
|
||||
return resize(X, size, mode, align_corners)
|
||||
return resize(X, size, mode, align_corners, tf_mode)
|
||||
|
||||
|
||||
def grid_sample_v0(input, grid, mode='bilinear', padding_mode='zeros'):
|
||||
|
@ -1134,7 +1378,7 @@ def linspace_from_neg_one(grid,num_steps,align_corners):
|
|||
return jt.array(ra,dtype=grid.dtype)
|
||||
|
||||
def make_base_grid_4D(theta,N,C,H,W,align_corners):
|
||||
base_grid = jt.zeros((N, H, W, 3), dtype=theta.dtype);
|
||||
base_grid = jt.zeros((N, H, W, 3), dtype=theta.dtype)
|
||||
base_grid[...,0] = linspace_from_neg_one(theta, W, align_corners)
|
||||
base_grid[...,1] = jt.unsqueeze(linspace_from_neg_one(theta, H, align_corners),-1)
|
||||
base_grid[...,-1] = 1
|
||||
|
@ -1190,7 +1434,7 @@ def reflect_coordinates(x,twice_low,twice_high):
|
|||
x = (x - m).abs()
|
||||
#`fmod` returns same sign as `in`, which is positive after the `fabs` above.
|
||||
extra = x.mod(span)
|
||||
flips = (x / span).floor()
|
||||
flips = (x / span).floor_int()
|
||||
result1 = extra+m
|
||||
result2 = span-extra+m
|
||||
con = flips%2==0
|
||||
|
@ -1242,9 +1486,9 @@ def grid_sampler_3d(X,grid,mode,padding_mode,align_corners):
|
|||
zid = z.reindex(shape,['i0','i2','i3','i4'])
|
||||
|
||||
if mode=='nearest':
|
||||
return X.reindex([nid,cid,zid.round(),yid.round(),xid.round()])
|
||||
return X.reindex([nid,cid,zid.round_int(),yid.round_int(),xid.round_int()])
|
||||
elif mode=='bilinear':
|
||||
fx,fy,fz = xid.floor(),yid.floor(),zid.floor()
|
||||
fx,fy,fz = xid.floor_int(),yid.floor_int(),zid.floor_int()
|
||||
cx,cy,cz = fx+1,fy+1,fz+1
|
||||
dx,dy,dz = xid-fx,yid-fy,zid-fz
|
||||
dnx,dny,dnz = cx-xid,cy-yid,cz-zid
|
||||
|
@ -1279,10 +1523,10 @@ def grid_sampler_2d(X,grid,mode,padding_mode,align_corners):
|
|||
yid = y.reindex(shape,['i0','i2','i3'])
|
||||
|
||||
if mode=='nearest':
|
||||
return X.reindex([nid,cid,yid.round(),xid.round()])
|
||||
return X.reindex([nid,cid,yid.round_int(),xid.round_int()])
|
||||
elif mode=='bilinear':
|
||||
#xid,yid = (xid+0.00001),(yid+0.00001)
|
||||
fx,fy = (xid).floor(),(yid).floor()
|
||||
fx,fy = (xid).floor_int(),(yid).floor_int()
|
||||
cx,cy = fx+1,fy+1
|
||||
dx,dy = xid-fx,yid-fy
|
||||
dnx,dny = cx-xid,cy-yid
|
||||
|
@ -1385,6 +1629,87 @@ class Sequential(Module):
|
|||
return len(self.layers)
|
||||
|
||||
|
||||
class ParameterList(Module):
|
||||
def __init__(self, *args):
|
||||
self.params = collections.OrderedDict()
|
||||
for var in args:
|
||||
if isinstance(var, (collections.OrderedDict, dict)):
|
||||
for k, v in var.items():
|
||||
self.add_param(k, v)
|
||||
elif isinstance(var, list):
|
||||
for v in var:
|
||||
self.append(v)
|
||||
else:
|
||||
self.append(var)
|
||||
def __getitem__(self, idx):
|
||||
if idx not in self.params:
|
||||
return list(self.params.values())[idx]
|
||||
|
||||
return self.params[idx]
|
||||
def __iter__(self):
|
||||
return self.params.values().__iter__()
|
||||
def keys(self):
|
||||
return self.params.keys()
|
||||
def values(self):
|
||||
return self.params.values()
|
||||
def items(self):
|
||||
return self.params.items()
|
||||
def execute(self, x):
|
||||
raise NotImplementedError("Parameters is not executable")
|
||||
def append(self, var):
|
||||
assert isinstance(var, jt.Var), f"argument <{type(var)}> is not jittor var"
|
||||
self.params[len(self.params)] = var
|
||||
def add_param(self, name, var):
|
||||
assert isinstance(var, jt.Var), f"argument <{type(var)}> is not jittor var"
|
||||
self.params[name]=var
|
||||
def __setitem__(self, name, var):
|
||||
self.add_param(name, var)
|
||||
|
||||
def __len__(self):
|
||||
return len(self.params)
|
||||
|
||||
ParameterDict = ParameterList
|
||||
|
||||
def Parameter(data, requires_grad=True):
|
||||
''' The `Parameter` interface isn't needed in Jittor, this interface
|
||||
doesn't nothings and it is just used for compatible.
|
||||
|
||||
A Jittor Var is a Parameter
|
||||
when it is a member of Module, if you don't want a Jittor
|
||||
Var menber is treated as a Parameter, just name it startswith
|
||||
underscore `_`.
|
||||
'''
|
||||
LOG.w(Parameter.__doc__)
|
||||
data = data.clone()
|
||||
data.requires_grad = requires_grad
|
||||
return data
|
||||
|
||||
def backward(v, *args, **kw):
|
||||
''' The `backward` variable interface doesn't exist in Jittor.
|
||||
please use `optimizer.backward(loss)` or
|
||||
`optimizer.step(loss)` instead.
|
||||
For example, if your code looks like this::
|
||||
|
||||
optimizer.zero_grad()
|
||||
loss.backward()
|
||||
optimizer.step()
|
||||
|
||||
It can be changed to this::
|
||||
|
||||
optimizer.zero_grad()
|
||||
optimizer.backward(loss)
|
||||
optimizer.step()
|
||||
|
||||
Or more concise::
|
||||
|
||||
optimizer.step(loss)
|
||||
|
||||
The step function will automatically zero grad and backward.
|
||||
'''
|
||||
LOG.f(backward.__doc__)
|
||||
|
||||
jt.Var.backward = backward
|
||||
|
||||
def unfold(X, kernel_size, dilation=1, padding=0, stride=1):
|
||||
assert X.ndim == 4
|
||||
if not isinstance(kernel_size, tuple):
|
||||
|
@ -1900,3 +2225,36 @@ class GRU(RNNBase):
|
|||
h = (1 - z) * n + z * hidden
|
||||
|
||||
return h, h
|
||||
|
||||
def bilinear(in1, in2, weight, bias):
|
||||
w = weight.transpose((1,0,2))
|
||||
w = w.reshape((w.shape[0], -1))
|
||||
x = jt.matmul(in1, w)
|
||||
x = x.reshape(x.shape[:-1]+[weight.shape[0], weight.shape[2]])
|
||||
y = in2.broadcast(x, (-2,))
|
||||
z = (x*y).sum(-1)
|
||||
if bias is not None:
|
||||
z += bias
|
||||
return z
|
||||
|
||||
|
||||
class Bilinear(Module):
|
||||
''' bilinear transformation $out = in1^T W in2 + bias$, Example::
|
||||
|
||||
m = nn.Bilinear(20, 30, 40)
|
||||
input1 = jt.randn(128, 20)
|
||||
input2 = jt.randn(128, 30)
|
||||
output = m(input1, input2)
|
||||
print(output.shape)
|
||||
# [128, 40]
|
||||
|
||||
'''
|
||||
def __init__(self, in1_features, in2_features, out_features, bias=True, dtype="float32"):
|
||||
bound = 1 / math.sqrt(in1_features)
|
||||
self.weight = jt.init.uniform([out_features, in1_features, in2_features], dtype, -bound, bound)
|
||||
self.bias = bias
|
||||
if bias:
|
||||
self.bias = jt.init.uniform([out_features], dtype, -bound, bound)
|
||||
|
||||
def execute(self, in1, in2):
|
||||
return bilinear(in1, in2, self.weight, self.bias)
|
||||
|
|
|
@ -0,0 +1,11 @@
|
|||
# 计图零基础入门教程(60分钟)
|
||||
|
||||
```
|
||||
git clone https://github.com/Jittor/LearnJittorBasicIn60Min.git
|
||||
cd LearnJittorBasicIn60Min
|
||||
jupyter notebook
|
||||
```
|
||||
|
||||
在线浏览地址:<https://nbviewer.jupyter.org/github/Jittor/LearnJittorBasicIn60Min/tree/master/>
|
||||
|
||||
特别感谢教程作者:llt
|
Binary file not shown.
After Width: | Height: | Size: 237 KiB |
|
@ -0,0 +1,138 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 计图入门教程 0 --- 介绍与安装"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"**计图 (Jittor)** 是一个以 Python 为前端语言的深度学习框架,它 \n",
|
||||
"* 效率高:可作为 NumPy,PyTorch 的替代品,可以使用 GPU 等其他加速器进行高效的数据运算。除此之外,计图还拥有多个创新点,旨在大幅提升其运算效率;\n",
|
||||
"* 易使用:是一个用于实现神经网络的自动微分库,并集成了大量有关深度学习的函数库,方便您快速开展开发任务。\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"**通过本教程,您将**\n",
|
||||
"* 学习并理解计图中基本类型的一般操作;\n",
|
||||
"* 了解神经网络的一些基本概念,并学会如何利用计图进行神经网络的训练;\n",
|
||||
"* 解决一个机器学习的经典实战问题。\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**本教程的适用群体:** \n",
|
||||
"我们的目标是,只要您会 Python 编程,即可通过本教程学习并掌握如何使用计图进行深度学习的开发。不用担心,本教程几乎对所有的关键代码都加以注释说明。只要您耐心跟着本教程一步步学习,便一定能有所斩获。"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"现在,请您开启计图快速入门之旅。"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## 安装\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Jittor框架对环境要求如下:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"* 操作系统: **Ubuntu** >= 16.04 或 **Windows Subsystem of Linux(WSL)**\n",
|
||||
"* Python:版本 >= 3.7\n",
|
||||
"* C++编译器 (需要下列至少一个)\n",
|
||||
" - g++ (>=5.4.0)\n",
|
||||
" - clang (>=8.0)\n",
|
||||
"* GPU 编译器(可选):nvcc >=10.0\n",
|
||||
"* GPU 加速库(可选):cudnn-dev (cudnn开发版, 推荐使用tar安装方法,[参考链接](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar))\n",
|
||||
"\n",
|
||||
"如果您不希望手动配置环境,我们推荐使用 Docker 进行安装。\n",
|
||||
"除此之外,您还可以使用 pip 安装和手动安装。\n",
|
||||
"\n",
|
||||
"注意:目前Jittor通过WSL的方式在Windows操作系统上运行,WSL的安装方法请参考[微软官网](https://docs.microsoft.com/en-us/windows/wsl/install-win10),WSL版本目前尚不支持CUDA。\n",
|
||||
"\n",
|
||||
"Jittor 提供了三种安装方法:docker,pip和手动安装:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Docker 安装\n",
|
||||
"\n",
|
||||
"我们提供了Docker安装方式,免去您配置环境,Docker安装方法如下:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"# CPU only(Linux)\n",
|
||||
"docker run -it --network host jittor/jittor\n",
|
||||
"# CPU and CUDA(Linux)\n",
|
||||
"docker run -it --network host --gpus all jittor/jittor-cuda\n",
|
||||
"# CPU only(Mac and Windows)\n",
|
||||
"docker run -it -p 8888:8888 jittor/jittor\n",
|
||||
"# Upgrade jittor docker image\n",
|
||||
"docker pull jittor/jittor\n",
|
||||
"docker pull jittor/jittor-cuda\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"关于Docker安装的详细教程,可以参考[Windows/Mac/Linux通过Docker安装计图](https://cg.cs.tsinghua.edu.cn/jittor/tutorial/2020-5-15-00-00-docker/)\n",
|
||||
"\n",
|
||||
"## Pip 安装\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"如果您没有准备好环境,或者使用的不是Ubuntu操作系统, 推荐使用**docker安装**, 如果您已经装好编译器和对应版本的Python,我们强烈推荐您使用这种方法\n",
|
||||
"(如果无法访问github, 可以通过jittor主页下载):\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"sudo apt install python3.7-dev libomp-dev\n",
|
||||
"python3.7 -m pip install jittor\n",
|
||||
"# or install from github(latest version)\n",
|
||||
"# python3.7 -m pip install git+https://github.com/Jittor/jittor.git\n",
|
||||
"python3.7 -m jittor.test.test_example\n",
|
||||
"\n",
|
||||
"# Upgrade jittor from pip\n",
|
||||
"python3.7 -m pip install jittor -U\n",
|
||||
"# Upgrade jittor from github\n",
|
||||
"python3.7 -m pip install git+https://github.com/Jittor/jittor.git -U\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"如果测试运行通过,恭喜你已经安装完成.\n",
|
||||
"jittor会自动在路径中寻找合适的编译器, 如果您希望手动指定编译器, 请使用环境变量 `cc_path` 和 `nvcc_path`(可选).\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
File diff suppressed because it is too large
Load Diff
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
|
@ -37,6 +37,7 @@ class Optimizer(object):
|
|||
# __zero_grad is a value for fast determ the grad is zero or not
|
||||
# so we can omit 0+x
|
||||
self.__zero_grad = True
|
||||
self._grad_map = {}
|
||||
|
||||
def add_param_group(self, group):
|
||||
self.param_groups.append(group)
|
||||
|
@ -77,7 +78,7 @@ class Optimizer(object):
|
|||
for pg in self.param_groups:
|
||||
for p, g in zip(pg["params"], pg["grads"]):
|
||||
if p.is_stop_grad(): continue
|
||||
g *= clip_coef
|
||||
g.update(g*clip_coef)
|
||||
|
||||
|
||||
@property
|
||||
|
@ -86,6 +87,14 @@ class Optimizer(object):
|
|||
return { k:v for k, v in self.__dict__.items()
|
||||
if k[0] != '_' and k not in exclude and not callable(v) }
|
||||
|
||||
def state_dict(self):
|
||||
state = {"defaults": self.defaults}
|
||||
return state
|
||||
|
||||
def load_state_dict(self, state):
|
||||
for k,v in state["defaults"].items():
|
||||
setattr(self, k, v)
|
||||
|
||||
def zero_grad(self):
|
||||
self.__zero_grad = True
|
||||
|
||||
|
@ -181,6 +190,35 @@ class Optimizer(object):
|
|||
p.update(p - g * lr)
|
||||
self.zero_grad()
|
||||
|
||||
def _build_grad_map(self):
|
||||
_grad_map = {}
|
||||
for pg in self.param_groups:
|
||||
for p, g in zip(pg["params"], pg["grads"]):
|
||||
_grad_map[id(p)] = g
|
||||
self._grad_map = _grad_map
|
||||
|
||||
def find_grad(self, v:jt.Var) -> jt.Var:
|
||||
if id(v) not in self._grad_map:
|
||||
self._build_grad_map()
|
||||
if id(v) not in self._grad_map:
|
||||
raise RuntimeError("This variable is not managed by this optimizer")
|
||||
return self._grad_map[id(v)]
|
||||
|
||||
def opt_grad(v:jt.Var, opt:Optimizer):
|
||||
''' Get grad of certain variable in optimizer, Example::
|
||||
|
||||
|
||||
model = Model()
|
||||
optimizer = SGD(model.parameters())
|
||||
...
|
||||
optimizer.backward(loss)
|
||||
|
||||
for p in model.parameters():
|
||||
grad = p.opt_grad(optimizer)
|
||||
'''
|
||||
return opt.find_grad(v)
|
||||
|
||||
jt.Var.opt_grad = opt_grad
|
||||
|
||||
class SGD(Optimizer):
|
||||
""" SGD Optimizer.
|
||||
|
@ -325,6 +363,60 @@ class Adam(Optimizer):
|
|||
self.zero_grad()
|
||||
|
||||
|
||||
class AdamW(Optimizer):
|
||||
""" AdamW Optimizer.
|
||||
|
||||
Example::
|
||||
|
||||
optimizer = nn.AdamW(model.parameters(), lr, eps=1e-8, betas=(0.9, 0.999))
|
||||
optimizer.step(loss)
|
||||
"""
|
||||
def __init__(self, params, lr, eps=1e-8, betas=(0.9, 0.999), weight_decay=0):
|
||||
super().__init__(params, lr)
|
||||
self.eps = eps
|
||||
self.betas = betas
|
||||
self.weight_decay = weight_decay
|
||||
# assert weight_decay==0, "weight_decay is not supported yet"
|
||||
|
||||
# initialize required arguments for each param_groups
|
||||
for pg in self.param_groups:
|
||||
values = pg["values"] = []
|
||||
m = pg["m"] = []
|
||||
for p in pg["params"]:
|
||||
values.append(jt.zeros(p.shape, p.dtype).stop_grad())
|
||||
m.append(jt.zeros(p.shape, p.dtype).stop_grad())
|
||||
|
||||
def add_param_group(self, group):
|
||||
values = group["values"] = []
|
||||
m = group["m"] = []
|
||||
for p in group["params"]:
|
||||
values.append(jt.zeros(p.shape, p.dtype).stop_grad())
|
||||
m.append(jt.zeros(p.shape, p.dtype).stop_grad())
|
||||
self.param_groups.append(group)
|
||||
|
||||
def step(self, loss=None):
|
||||
if loss is not None:
|
||||
self.pre_step(loss)
|
||||
n = float(self.n_step)
|
||||
for pg in self.param_groups:
|
||||
# get arguments from each param_groups
|
||||
lr = pg.get("lr", self.lr)
|
||||
eps = pg.get("eps", self.eps)
|
||||
weight_decay = pg.get("weight_decay", self.weight_decay)
|
||||
b0, b1 = pg.get("betas", self.betas)
|
||||
for p, g, v, m in zip(pg["params"], pg["grads"], pg["values"], pg["m"]):
|
||||
if p.is_stop_grad(): continue
|
||||
p.update(p * (1 - lr * weight_decay))
|
||||
bias_correction1 = 1 - b0 ** n
|
||||
bias_correction2 = 1 - b1 ** n
|
||||
m.update(b0 * m + (1-b0) * g) #exp_avg
|
||||
v.update(b1 * v + (1-b1) * g * g) #exp_avg_sq
|
||||
denom = jt.sqrt(v) / jt.sqrt(bias_correction2) + eps
|
||||
step_size = lr / bias_correction1
|
||||
p.update(p - step_size * m / denom)
|
||||
self.zero_grad()
|
||||
|
||||
|
||||
class LRScheduler:
|
||||
def __init__(self,optimizer, last_epoch=-1):
|
||||
assert isinstance(optimizer,Optimizer)
|
||||
|
|
|
@ -73,7 +73,7 @@ class Pool(Module):
|
|||
for (int q = k3; q < k3_; ++q)
|
||||
if (out_value < @in0(i0, i1, p, q)) {{
|
||||
out_value = @in0(i0, i1, p, q);
|
||||
out_index = (p - k2) * {self.kernel_size[0]} + (q - k3);
|
||||
out_index = p * in0_shape3 + q;
|
||||
}}
|
||||
@out(i0, i1, i2, i3) = out_value;
|
||||
@out1(i0, i1, i2, i3) = out_index;
|
||||
|
@ -99,7 +99,7 @@ class Pool(Module):
|
|||
'''
|
||||
if self.return_indices:
|
||||
return_shapes = [[N,C,h,w]] * 2
|
||||
return_dtypes = [x.dtype, 'uint8']
|
||||
return_dtypes = [x.dtype, 'int32']
|
||||
else:
|
||||
return_shapes = [N,C,h,w]
|
||||
return_dtypes = x.dtype
|
||||
|
@ -184,6 +184,204 @@ class Pool(Module):
|
|||
])
|
||||
return xx.reduce(self.op, [4,5])
|
||||
|
||||
def _triple(x):
|
||||
if isinstance(x, tuple):
|
||||
assert len(x) == 3
|
||||
return x
|
||||
else:
|
||||
return (x,x,x)
|
||||
|
||||
|
||||
class Pool3d(Module):
|
||||
def __init__(self, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False, count_include_pad=True, op="maximum"):
|
||||
assert dilation == None
|
||||
assert return_indices == None or op == "maximum"
|
||||
self.return_indices = return_indices
|
||||
self.kernel_size = _triple(kernel_size)
|
||||
self.op = op
|
||||
stride = stride if stride else kernel_size
|
||||
self.stride = _triple(stride)
|
||||
self.padding = _triple(padding)
|
||||
self.ceil_mode = ceil_mode
|
||||
self.count_include_pad = count_include_pad and padding != 0
|
||||
|
||||
def execute(self, x):
|
||||
N,C,D,H,W = x.shape
|
||||
if self.ceil_mode == False:
|
||||
d = (D+self.padding[0]*2-self.kernel_size[0])//self.stride[0]+1
|
||||
h = (H+self.padding[1]*2-self.kernel_size[1])//self.stride[1]+1
|
||||
w = (W+self.padding[2]*2-self.kernel_size[2])//self.stride[2]+1
|
||||
use_code_op = self.op in ['maximum', 'minimum']
|
||||
# some second order avg_pool is require, so we don't use code op here
|
||||
else:
|
||||
d = (D+self.padding[0]*2-self.kernel_size[0] + self.stride[0] - 1)//self.stride[0]+1
|
||||
h = (H+self.padding[1]*2-self.kernel_size[1] + self.stride[1] - 1)//self.stride[1]+1
|
||||
w = (W+self.padding[2]*2-self.kernel_size[2] + self.stride[2] - 1)//self.stride[2]+1
|
||||
use_code_op = self.op in ['maximum', 'minimum', 'mean']
|
||||
|
||||
if use_code_op:
|
||||
if self.op == 'mean':
|
||||
if self.count_include_pad:
|
||||
count = f"int count = {self.kernel_size[0]*self.kernel_size[1]*self.kernel_size[2]};"
|
||||
else:
|
||||
count = "int count = (k2_ - k2) * (k3_ - k3) * (k4_ - k4);"
|
||||
count += "float32 rcount = 1.0f / count;"
|
||||
else:
|
||||
count = ""
|
||||
forward_body = f'''
|
||||
int k4 = i4*{self.stride[2]}-{self.padding[2]};
|
||||
int k3 = i3*{self.stride[1]}-{self.padding[1]};
|
||||
int k2 = i2*{self.stride[0]}-{self.padding[0]};
|
||||
int k4_ = min(k4 + {self.kernel_size[2]}, in0_shape4);
|
||||
int k3_ = min(k3 + {self.kernel_size[1]}, in0_shape3);
|
||||
int k2_ = min(k2 + {self.kernel_size[0]}, in0_shape2);
|
||||
k4 = max(0, k4);
|
||||
k3 = max(0, k3);
|
||||
k2 = max(0, k2);
|
||||
{count}
|
||||
'''
|
||||
if not self.return_indices:
|
||||
forward_body += f'''
|
||||
@out(i0, i1, i2, i3, i4) = init_{self.op}(out_type);
|
||||
for (int p = k2; p < k2_; ++p)
|
||||
for (int q = k3; q < k3_; ++q)
|
||||
for (int r = k4; r < k4_; ++r)
|
||||
@out(i0, i1, i2, i3, i4) = {self.op}(out_type, @out(i0, i1, i2, i3, i4), @in0(i0, i1, p, q, r));
|
||||
'''
|
||||
else:
|
||||
forward_body += f'''
|
||||
auto out_value = init_{self.op}(out_type);
|
||||
int out_index = -1;
|
||||
for (int p = k2; p < k2_; ++p)
|
||||
for (int q = k3; q < k3_; ++q)
|
||||
for (int r = k4; q < k4_; ++r)
|
||||
if (out_value < @in0(i0, i1, p, q, r)) {{
|
||||
out_value = @in0(i0, i1, p, q, r);
|
||||
out_index = p * in0_shape3 * in0_shape4 + q * in0_shape4 + r;
|
||||
}}
|
||||
@out(i0, i1, i2, i3, i4) = out_value;
|
||||
@out1(i0, i1, i2, i3, i4) = out_index;
|
||||
'''
|
||||
backward_body = f'''
|
||||
int k4 = i4*{self.stride[2]}-{self.padding[2]};
|
||||
int k3 = i3*{self.stride[1]}-{self.padding[1]};
|
||||
int k2 = i2*{self.stride[0]}-{self.padding[0]};
|
||||
int k4_ = min(k4 + {self.kernel_size[2]}, in0_shape4);
|
||||
int k3_ = min(k3 + {self.kernel_size[1]}, in0_shape3);
|
||||
int k2_ = min(k2 + {self.kernel_size[0]}, in0_shape2);
|
||||
k4 = max(0, k4);
|
||||
k3 = max(0, k3);
|
||||
k2 = max(0, k2);
|
||||
{count}
|
||||
int bo=1;
|
||||
for (int p = k2; p < k2_ && bo; ++p)
|
||||
for (int q = k3; q < k3_ && bo; ++q)
|
||||
for (int r = k4; r < k4_ && bo; ++r) {{
|
||||
{"atomicAdd(&@out(i0,i1,p,q,r), @dout(i0,i1,i2,i3,i4)/count);"
|
||||
if self.op == "mean" else
|
||||
f"""if (@pout(i0,i1,i2,i3,i4) == @in0(i0,i1,p,q,r)) {{
|
||||
atomicAdd(&@out(i0,i1,p,q,r), @dout(i0,i1,i2,i3,i4)),
|
||||
bo=0;
|
||||
}}"""}
|
||||
}}
|
||||
'''
|
||||
if self.return_indices:
|
||||
return_shapes = [[N,C,d,h,w]] * 2
|
||||
return_dtypes = [x.dtype, 'int32']
|
||||
else:
|
||||
return_shapes = [N,C,d,h,w]
|
||||
return_dtypes = x.dtype
|
||||
out = jt.code(return_shapes, return_dtypes, [x],
|
||||
cuda_header="""
|
||||
#include <ops/binary_op_defs.h>
|
||||
#include <misc/cuda_limits.h>
|
||||
""",
|
||||
cuda_src=f'''
|
||||
__global__ static void kernel1(@ARGS_DEF) {{
|
||||
@PRECALC
|
||||
int p4 = threadIdx.x;
|
||||
int s4 = blockDim.x;
|
||||
int p3 = threadIdx.y;
|
||||
int s3 = blockDim.y;
|
||||
int p2 = threadIdx.z + blockIdx.x * blockDim.z;
|
||||
int s2 = blockDim.z * gridDim.x;
|
||||
int i1 = blockIdx.y;
|
||||
int i0 = blockIdx.z;
|
||||
for (int i4 = p4; i4 < out_shape4; i4 += s4)
|
||||
for (int i3 = p3; i3 < out_shape3; i3 += s3)
|
||||
for (int i2 = p2; i2 < out_shape2; i2 += s2)
|
||||
{{ {forward_body} }}
|
||||
}}
|
||||
int tx = min(1024, out_shape4);
|
||||
int ty = min(1024 / tx, out_shape3);
|
||||
int tz = min(1024 / tx / ty, out_shape2);
|
||||
int bx = (out_shape2 - 1) / tz + 1;
|
||||
int by = out_shape1;
|
||||
int bz = out_shape0;
|
||||
dim3 s1(bx, by, bz);
|
||||
dim3 s2(tx, ty, tz);
|
||||
kernel1<<<s1, s2>>>(@ARGS);
|
||||
''',
|
||||
cuda_grad_src=[f'''
|
||||
__global__ static void kernel3(@ARGS_DEF) {{
|
||||
@PRECALC
|
||||
int p4 = threadIdx.x;
|
||||
int s4 = blockDim.x;
|
||||
int p3 = threadIdx.y;
|
||||
int s3 = blockDim.y;
|
||||
int p2 = threadIdx.z + blockIdx.x * blockDim.z;
|
||||
int s2 = blockDim.z * gridDim.x;
|
||||
int i1 = blockIdx.y;
|
||||
int i0 = blockIdx.z;
|
||||
for (int i4 = p4; i4 < out_shape4; i4 += s4)
|
||||
for (int i3 = p3; i3 < out_shape3; i3 += s3)
|
||||
for (int i2 = p2; i2 < out_shape2; i2 += s2)
|
||||
{{ {backward_body} }}
|
||||
}}
|
||||
cudaMemsetAsync(out_p, 0, out->size);
|
||||
int tx = min(1024, pout_shape4);
|
||||
int ty = min(1024 / tx, pout_shape3);
|
||||
int tz = min(1024 / tx / ty, pout_shape2);
|
||||
int bx = (pout_shape2 - 1) / tz + 1;
|
||||
int by = pout_shape1;
|
||||
int bz = pout_shape0;
|
||||
dim3 s1(bx, by, bz);
|
||||
dim3 s2(tx, ty, tz);
|
||||
kernel3<<<s1, s2>>>(@ARGS);
|
||||
'''],
|
||||
cpu_header='#include <ops/binary_op_defs.h>',
|
||||
cpu_src=f'''
|
||||
using namespace std;
|
||||
for (int i0=0; i0<out_shape0; i0++)
|
||||
for (int i1=0; i1<out_shape1; i1++)
|
||||
for (int i2=0; i2<out_shape2; i2++)
|
||||
for (int i3=0; i3<out_shape3; i3++)
|
||||
for (int i4=0; i4<out_shape4; i4++)
|
||||
{{ {forward_body} }}
|
||||
''',
|
||||
cpu_grad_src = [f'''
|
||||
using namespace std;
|
||||
std::memset(out_p, 0, out->size);
|
||||
#define atomicAdd(a,b) (*a) += b
|
||||
|
||||
for (int i0=0; i0<pout_shape0; i0++)
|
||||
for (int i1=0; i1<pout_shape1; i1++)
|
||||
for (int i2=0; i2<pout_shape2; i2++)
|
||||
for (int i3=0; i3<pout_shape3; i3++)
|
||||
for (int i4=0; i4<pout_shape4; i4++)
|
||||
{{ {backward_body} }}
|
||||
'''])
|
||||
return out
|
||||
else:
|
||||
# TODO: backward
|
||||
xx = x.reindex([N,C,d,h,w,self.kernel_size[0],self.kernel_size[1],self.kernel_size[2]], [
|
||||
"i0", # Nid
|
||||
"i1", # Cid
|
||||
f"i2*{self.stride[0]}-{self.padding[0]}+i5", # Did
|
||||
f"i3*{self.stride[1]}-{self.padding[1]}+i6", # Hid
|
||||
f"i4*{self.stride[2]}-{self.padding[2]}+i7", # Hid
|
||||
])
|
||||
return xx.reduce(self.op, [5,6,7])
|
||||
|
||||
class AdaptiveAvgPool2d(Module):
|
||||
def __init__(self, output_size):
|
||||
|
@ -216,8 +414,9 @@ class AdaptiveAvgPool2d(Module):
|
|||
return xx.reduce("mean", [4,5])
|
||||
|
||||
class AdaptiveMaxPool2d(Module):
|
||||
def __init__(self, output_size):
|
||||
def __init__(self, output_size, return_indices=False):
|
||||
self.output_size = output_size
|
||||
self.return_indices = return_indices
|
||||
|
||||
def execute(self, x):
|
||||
if isinstance(self.output_size, int):
|
||||
|
@ -235,6 +434,10 @@ class AdaptiveMaxPool2d(Module):
|
|||
self.sw = math.floor(W / ow)
|
||||
self.ksh = H - (oh - 1) * self.sh
|
||||
self.ksw = W - (ow - 1) * self.sw
|
||||
if self.return_indices:
|
||||
return MaxPool2d(
|
||||
kernel_size=(self.ksh, self.ksw),
|
||||
stride=(self.sh, self.sw), return_indices=True)(x)
|
||||
h = (H-self.ksh)//self.sh+1
|
||||
w = (W-self.ksw)//self.sw+1
|
||||
xx = x.reindex([N,C,h,w,self.ksh,self.ksw], [
|
||||
|
@ -245,9 +448,74 @@ class AdaptiveMaxPool2d(Module):
|
|||
])
|
||||
return xx.reduce("maximum", [4,5])
|
||||
|
||||
|
||||
class AdaptiveAvgPool3d(Module):
|
||||
def __init__(self, output_size):
|
||||
self.output_size = _triple(output_size)
|
||||
|
||||
def execute(self, x):
|
||||
od, oh, ow = self.output_size
|
||||
if od == 1 and oh == 1 and ow == 1:
|
||||
return x.reduce("mean", [2,3,4], keepdims=True)
|
||||
N,C,D,H,W = x.shape
|
||||
self.sd = math.floor(D / od)
|
||||
self.sh = math.floor(H / oh)
|
||||
self.sw = math.floor(W / ow)
|
||||
self.ksd = D - (od - 1) * self.sd
|
||||
self.ksh = H - (oh - 1) * self.sh
|
||||
self.ksw = W - (ow - 1) * self.sw
|
||||
d = (D-self.ksd)//self.sd+1
|
||||
h = (H-self.ksh)//self.sh+1
|
||||
w = (W-self.ksw)//self.sw+1
|
||||
xx = x.reindex([N,C,d,h,w,self.ksd,self.ksh,self.ksw], [
|
||||
"i0", # Nid
|
||||
"i1", # Cid
|
||||
f"i2*{self.sd}+i5", # Did
|
||||
f"i3*{self.sh}+i6", # Hid
|
||||
f"i4*{self.sw}+i7", # Wid
|
||||
])
|
||||
return xx.reduce("mean", [5,6,7])
|
||||
|
||||
class AdaptiveMaxPool3d(Module):
|
||||
def __init__(self, output_size, return_indices=False):
|
||||
self.output_size = _triple(output_size)
|
||||
self.return_indices = return_indices
|
||||
|
||||
def execute(self, x):
|
||||
od, oh, ow = self.output_size
|
||||
if od == 1 and oh == 1 and ow == 1 and not self.return_indices:
|
||||
return x.reduce("maximum", [2,3,4], keepdims=True)
|
||||
N,C,D,H,W = x.shape
|
||||
self.sd = math.floor(D / od)
|
||||
self.sh = math.floor(H / oh)
|
||||
self.sw = math.floor(W / ow)
|
||||
self.ksd = D - (od - 1) * self.sd
|
||||
self.ksh = H - (oh - 1) * self.sh
|
||||
self.ksw = W - (ow - 1) * self.sw
|
||||
if self.return_indices:
|
||||
return MaxPool3d(
|
||||
kernel_size=(self.ksd, self.ksh, self.ksw),
|
||||
stride=(self.sd, self.sh, self.sw), return_indices=True)(x)
|
||||
d = (D-self.ksd)//self.sd+1
|
||||
h = (H-self.ksh)//self.sh+1
|
||||
w = (W-self.ksw)//self.sw+1
|
||||
xx = x.reindex([N,C,d,h,w,self.ksd,self.ksh,self.ksw], [
|
||||
"i0", # Nid
|
||||
"i1", # Cid
|
||||
f"i2*{self.sd}+i5", # Did
|
||||
f"i3*{self.sh}+i6", # Hid
|
||||
f"i4*{self.sw}+i7", # Wid
|
||||
])
|
||||
return xx.reduce("maximun", [5,6,7])
|
||||
|
||||
def pool(x, kernel_size, op, padding=0, stride=None):
|
||||
return Pool(kernel_size, stride, padding, op=op)(x)
|
||||
|
||||
pool2d = pool
|
||||
|
||||
def pool3d(x, kernel_size, op, padding=0, stride=None):
|
||||
return Pool3d(kernel_size, stride, padding, op=op)(x)
|
||||
|
||||
class AvgPool2d(Module):
|
||||
def __init__(self, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True):
|
||||
self.layer = Pool(kernel_size=kernel_size, stride=stride, padding=padding, ceil_mode=ceil_mode, count_include_pad=count_include_pad, op="mean")
|
||||
|
@ -255,6 +523,13 @@ class AvgPool2d(Module):
|
|||
def execute(self, x):
|
||||
return self.layer(x)
|
||||
|
||||
class AvgPool3d(Module):
|
||||
def __init__(self, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True):
|
||||
self.layer = Pool3d(kernel_size=kernel_size, stride=stride, padding=padding, ceil_mode=ceil_mode, count_include_pad=count_include_pad, op="mean")
|
||||
|
||||
def execute(self, x):
|
||||
return self.layer(x)
|
||||
|
||||
def avg_pool2d(x, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True):
|
||||
return AvgPool2d(kernel_size, stride, padding, ceil_mode, count_include_pad)(x)
|
||||
|
||||
|
@ -265,54 +540,115 @@ class MaxPool2d(Module):
|
|||
def execute(self, x):
|
||||
return self._layer(x)
|
||||
|
||||
class MaxPool3d(Module):
|
||||
def __init__(self, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False):
|
||||
self._layer = Pool3d(kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, return_indices=return_indices, ceil_mode=ceil_mode, op="maximum")
|
||||
|
||||
def execute(self, x):
|
||||
return self._layer(x)
|
||||
|
||||
def max_pool2d(x, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False):
|
||||
return MaxPool2d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)(x)
|
||||
|
||||
|
||||
def max_pool3d(x, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False):
|
||||
return MaxPool3d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)(x)
|
||||
|
||||
class MaxUnpool2d(Module):
|
||||
''' MaxUnpool2d is the invert version of MaxPool2d with indices.
|
||||
It takes the output index of MaxPool2d as input.
|
||||
The element will be zero if it is not the max pooled value.
|
||||
|
||||
Example::
|
||||
|
||||
>>> import jittor as jt
|
||||
>>> from jittor import nn
|
||||
|
||||
>>> pool = nn.MaxPool2d(2, stride=2, return_indices=True)
|
||||
>>> unpool = nn.MaxUnpool2d(2, stride=2)
|
||||
>>> input = jt.array([[[[ 1., 2, 3, 4,0],
|
||||
[ 5, 6, 7, 8,0],
|
||||
[ 9, 10, 11, 12,0],
|
||||
[13, 14, 15, 16,0],
|
||||
[0, 0, 0, 0, 0]]]])
|
||||
>>> output, indices = pool(input)
|
||||
>>> unpool(output, indices, output_size=input.shape)
|
||||
jt.array([[[[ 0., 0., 0., 0., 0.],
|
||||
[ 0., 6., 0., 8., 0.],
|
||||
[ 0., 0., 0., 0., 0.],
|
||||
[ 0., 14., 0., 16., 0.],
|
||||
[ 0., 0., 0., 0., 0.]]]])
|
||||
'''
|
||||
def __init__(self, kernel_size, stride=None):
|
||||
''' MaxUnpool2d is the invert version of MaxPool2d with indices.
|
||||
It takes the output index of MaxPool2d as input.
|
||||
The element will be zero if it is not the max pooled value.
|
||||
|
||||
Example::
|
||||
|
||||
>>> import jittor as jt
|
||||
>>> from jittor import nn
|
||||
|
||||
>>> pool = nn.MaxPool2d(2, stride=2, return_indices=True)
|
||||
>>> unpool = nn.MaxUnpool2d(2, stride=2)
|
||||
>>> input = jt.array([[[[ 1., 2, 3, 4,0],
|
||||
[ 5, 6, 7, 8,0],
|
||||
[ 9, 10, 11, 12,0],
|
||||
[13, 14, 15, 16,0],
|
||||
[0, 0, 0, 0, 0]]]])
|
||||
>>> output, indices = pool(input)
|
||||
>>> unpool(output, indices, output_size=input.shape)
|
||||
jt.array([[[[ 0., 0., 0., 0., 0.],
|
||||
[ 0., 6., 0., 8., 0.],
|
||||
[ 0., 0., 0., 0., 0.],
|
||||
[ 0., 14., 0., 16., 0.],
|
||||
[ 0., 0., 0., 0., 0.]]]])
|
||||
'''
|
||||
if isinstance(kernel_size, int):
|
||||
kernel_size = (kernel_size, kernel_size)
|
||||
if isinstance(stride, int):
|
||||
stride = (stride, stride)
|
||||
if stride is None: stride = kernel_size
|
||||
assert stride == kernel_size, "Different stride and kernel is not supported yet."
|
||||
self.kernel_size = kernel_size
|
||||
self.stride = stride
|
||||
|
||||
def execute(self, x, id, output_size=None):
|
||||
b, c, ph, pw = x.shape
|
||||
kh, kw = self.kernel_size
|
||||
sh, sw = self.stride
|
||||
if output_size:
|
||||
h, w = output_size[-2:]
|
||||
else:
|
||||
h, w = ph * kh, pw * kw
|
||||
x = x.reindex(shape=[b, c, h, w],
|
||||
indexes=['i0', 'i1', f'i2/{kh}', f'i3/{kw}'],
|
||||
extras=[id],
|
||||
overflow_conditions=[
|
||||
f'((i2%{kh})*{kw}+i3%{kw}) != @e0(i0,i1,i2/{kh},i3/{kw})'],
|
||||
overflow_value=0)
|
||||
return x
|
||||
h, w = ph * sh, pw * sw
|
||||
if self.stride == self.kernel_size:
|
||||
x = x.reindex(shape=[b, c, h, w],
|
||||
indexes=['i0', 'i1', f'i2/{kh}', f'i3/{kw}'],
|
||||
extras=[id],
|
||||
overflow_conditions=[
|
||||
f'(i2*yshape3+i3) != @e0(i0,i1,i2/{kh},i3/{kw})'],
|
||||
overflow_value=0)
|
||||
else:
|
||||
x = x.reindex_reduce(
|
||||
op="add",
|
||||
shape=[b, c, h, w],
|
||||
indexes=['i0', 'i1',
|
||||
f'@e0(i0,i1,i2,i3)/xshape3',
|
||||
f'@e0(i0,i1,i2,i3)%xshape3'],
|
||||
extras=[id],
|
||||
)
|
||||
return x
|
||||
|
||||
class MaxUnpool3d(Module):
|
||||
''' MaxUnpool3d is the invert version of MaxPool3d with indices.
|
||||
It takes the output index of MaxPool3d as input.
|
||||
The element will be zero if it is not the max pooled value.
|
||||
'''
|
||||
def __init__(self, kernel_size, stride=None):
|
||||
if stride is None: stride = kernel_size
|
||||
kernel_size = _triple(kernel_size)
|
||||
stride = _triple(stride)
|
||||
self.kernel_size = kernel_size
|
||||
self.stride = stride
|
||||
|
||||
def execute(self, x, id, output_size=None):
|
||||
b, c, pd, ph, pw = x.shape
|
||||
kd, kh, kw = self.kernel_size
|
||||
sd, sh, sw = self.stride
|
||||
if output_size:
|
||||
d, h, w = output_size[-3:]
|
||||
else:
|
||||
d, h, w = pd * sd, ph * sh, pw * sw
|
||||
if self.stride == self.kernel_size:
|
||||
x = x.reindex(shape=[b, c, d, h, w],
|
||||
indexes=['i0', 'i1', f'i2/{kd}', f'i3/{kh}', f'i4/{kw}'],
|
||||
extras=[id],
|
||||
overflow_conditions=[
|
||||
f'(i2*yshape3*yshape4+i3*yshape4+i4) != @e0(i0,i1,i2/{kd},i3/{kh},i4/{kw})'],
|
||||
overflow_value=0)
|
||||
else:
|
||||
x = x.reindex_reduce(
|
||||
op="add",
|
||||
shape=[b, c, d, h, w],
|
||||
indexes=['i0', 'i1',
|
||||
f'@e0(i0,i1,i2,i3,i4)/(xshape4*xshape3)',
|
||||
f'@e0(i0,i1,i2,i3,i4)/xshape4%xshape3',
|
||||
f'@e0(i0,i1,i2,i3,i4)%xshape4'],
|
||||
extras=[id],
|
||||
)
|
||||
return x
|
||||
|
|
|
@ -668,7 +668,7 @@ def compile_src(src, h, basename):
|
|||
arr_func_return.append(f"return ({func_call},0)")
|
||||
func_return_failed = "return -1"
|
||||
else:
|
||||
assert "-> void" in func_head
|
||||
assert "-> void" in func_head, func_head
|
||||
arr_func_return.append(f"{func_call};{before_return}return")
|
||||
func_return_failed = "return"
|
||||
# generate error msg when not a valid call
|
||||
|
|
|
@ -0,0 +1,31 @@
|
|||
# wget https://github.com/oneapi-src/oneDNN/archive/refs/tags/v2.2.zip
|
||||
# extract zip
|
||||
# cd to root folder
|
||||
|
||||
mkdir -p build
|
||||
cd build
|
||||
make clean
|
||||
export CC=aarch64-linux-gnu-gcc-8
|
||||
export CXX=aarch64-linux-gnu-g++-8
|
||||
cmake .. \
|
||||
-DCMAKE_SYSTEM_NAME=Linux \
|
||||
-DCMAKE_SYSTEM_PROCESSOR=AARCH64 \
|
||||
-DCMAKE_LIBRARY_PATH=/usr/aarch64-linux-gnu/lib \
|
||||
-DCMAKE_BUILD_TYPE=Release
|
||||
# -DCMAKE_SHARED_LINKER_FLAGS=' -lm ' \
|
||||
make -j8
|
||||
|
||||
name=dnnl_lnx_2.2.0_cpu_gomp_aarch64
|
||||
mkdir -p $name
|
||||
cp -r ../include ./$name/
|
||||
mkdir -p ./$name/lib
|
||||
cp ./src/libmkldnn.so ./$name/lib/libmkldnn.so
|
||||
cp -r ../examples ./$name/
|
||||
cp ./include/oneapi/dnnl/* ./$name/include/oneapi/dnnl/
|
||||
|
||||
tar -acvf $name.tgz ./$name/
|
||||
|
||||
rsync -avPu $name.tgz jittor-web:Documents/jittor-blog/assets/
|
||||
ssh jittor-web Documents/jittor-blog.git/hooks/post-update
|
||||
echo "https://cg.cs.tsinghua.edu.cn/jittor/assets/$name.tgz"
|
||||
md5sum $name.tgz
|
|
@ -10,7 +10,7 @@
|
|||
#include <functional>
|
||||
#include "utils/log.h"
|
||||
|
||||
#define JIT_TEST(name) extern "C" void jit_test_ ## name ()
|
||||
#define JIT_TEST(name) extern void jit_test_ ## name ()
|
||||
void expect_error(std::function<void()> func);
|
||||
|
||||
#define VAR_MEMBER_NAME_AND_OFFSET(name, op) { #name , offsetof(struct op, name) }
|
||||
|
|
|
@ -34,6 +34,7 @@ namespace jittor {
|
|||
Executor exe;
|
||||
extern MemoryProfiler memory_profiler;
|
||||
DECLARE_FLAG(int, profile_memory_enable);
|
||||
DEFINE_FLAG(int, gopt_disable, 0, "Disable graph optimizer.");
|
||||
|
||||
// from fetch_op.cc
|
||||
extern list<VarPtr> fetcher_to_free;
|
||||
|
@ -145,7 +146,7 @@ void Executor::run_sync(vector<Var*> vars, bool device_sync) {
|
|||
}
|
||||
}
|
||||
}
|
||||
if (!need_opt) break;
|
||||
if (!need_opt || gopt_disable) break;
|
||||
for (Node* n : bfs_q) {
|
||||
if (n->flags.get(NodeFlags::_has_gopt)) {
|
||||
n->op()->graph_optimize();
|
||||
|
@ -486,9 +487,14 @@ void Executor::run_sync(vector<Var*> vars, bool device_sync) {
|
|||
if (use_cuda)
|
||||
checkCudaErrors(cudaDeviceSynchronize());
|
||||
#endif
|
||||
for (Var* var : op->outputs())
|
||||
check_nan(var);
|
||||
}
|
||||
#ifdef JT_CHECK_NAN
|
||||
for (Var* var : op->outputs())
|
||||
check_nan(var);
|
||||
#endif
|
||||
#ifdef JT_SYNC
|
||||
checkCudaErrors(cudaDeviceSynchronize());
|
||||
#endif
|
||||
LOGvvv << "Finished Op(" >> op->name() << rid >>
|
||||
"/" >> queue.size() >> ") output:" << op->outputs();
|
||||
if (is_fused_op) {
|
||||
|
|
|
@ -37,6 +37,14 @@ inline static void assign_attrs(Var* a, Var* b) {
|
|||
a->flags.set(NodeFlags::_stop_fuse);
|
||||
}
|
||||
|
||||
map<string,int> grad_breaks;
|
||||
|
||||
void warn_grad_break(int i, Var* v) {
|
||||
if (grad_breaks.count(v->name.c_str())) return;
|
||||
grad_breaks[v->name.c_str()] = 1;
|
||||
LOGw << "grads[">>i>>"] '">> v->name>>"' doesn't have gradient. It will be set to zero:" << v;
|
||||
}
|
||||
|
||||
vector<VarPtr> grad(Var* loss, vector<Var*> targets) {
|
||||
LOGvv << "loss:" >> loss << "targets:" >> targets;
|
||||
CHECK(loss->is_float()) << "Loss should be float";
|
||||
|
@ -212,7 +220,7 @@ vector<VarPtr> grad(Var* loss, vector<Var*> targets) {
|
|||
grad = move(grads[id]);
|
||||
if (!grad) {
|
||||
// TODO: better warning message
|
||||
LOGw << "grads[">>i>>"] '">> var->name>>"' doesn't have gradient. It will be set to zero:" << var;
|
||||
warn_grad_break(i, var);
|
||||
grad = make_number(0.f, var);
|
||||
assign_attrs(grad.ptr, var);
|
||||
}
|
||||
|
|
|
@ -13,14 +13,27 @@ namespace jittor {
|
|||
typedef void (*set_seed_callback)(int);
|
||||
|
||||
void init();
|
||||
|
||||
/**
|
||||
Sets the seed of jittor random number generator. Also see @jittor.set_global_seed.
|
||||
|
||||
----------------
|
||||
|
||||
* [in] seed: a python number.
|
||||
|
||||
*/
|
||||
// @pyjt(set_seed, seed)
|
||||
void set_seed(int seed);
|
||||
|
||||
/**
|
||||
Returns the seed of jittor random number generator.
|
||||
*/
|
||||
// @pyjt(get_seed)
|
||||
int get_seed();
|
||||
|
||||
void add_set_seed_callback(set_seed_callback callback);
|
||||
|
||||
extern "C"
|
||||
extern
|
||||
std::default_random_engine* get_random_engine();
|
||||
|
||||
// things need to be clean before python exit
|
||||
|
|
|
@ -12,6 +12,7 @@
|
|||
#else
|
||||
#include <dlfcn.h>
|
||||
#endif
|
||||
#include <mutex>
|
||||
|
||||
#include "jit_compiler.h"
|
||||
#include "op.h"
|
||||
|
@ -33,14 +34,19 @@ DEFINE_FLAG(int, rewrite_op, 1, "Rewrite source file of jit operator or not");
|
|||
|
||||
namespace jit_compiler {
|
||||
|
||||
std::mutex dl_open_mutex;
|
||||
|
||||
jit_op_entry_t load_jit_lib(string name, string symbol_name="jit_entry") {
|
||||
const char* msg = "";
|
||||
LOGvv << "Opening jit lib:" << name;
|
||||
#ifdef _WIN32
|
||||
void* handle = (void*)LoadLibrary(name.c_str());
|
||||
#else
|
||||
#elif defined(__linux__)
|
||||
void* handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_DEEPBIND | RTLD_LOCAL);
|
||||
msg = dlerror();
|
||||
#else
|
||||
void *handle = dlopen(name.c_str(), RTLD_NOW | RTLD_LOCAL);
|
||||
msg = dlerror();
|
||||
#endif
|
||||
|
||||
CHECK(handle) << "Cannot open library" << name << ":" << msg;
|
||||
|
@ -99,10 +105,10 @@ jit_op_entry_t compile(const string& jit_key, const string& src, const bool is_c
|
|||
+ " \"" + jit_src_path + "\"" + other_src
|
||||
+ cc_flags + extra_flags
|
||||
+ " -o \"" + jit_lib_path + "\"";
|
||||
#ifndef _WIN32
|
||||
#ifdef __linux__
|
||||
cmd = python_path+" "+jittor_path+"/utils/asm_tuner.py "
|
||||
"--cc_path=" + cmd;
|
||||
#endif
|
||||
#endif
|
||||
}
|
||||
cache_compile(cmd, cache_path, jittor_path);
|
||||
auto symbol_name = get_symbol_name(jit_key);
|
||||
|
|
|
@ -8,34 +8,33 @@
|
|||
#include <sys/mman.h>
|
||||
#endif
|
||||
#include <sstream>
|
||||
#include <unistd.h>
|
||||
#include "jit_key.h"
|
||||
#include "utils/str_utils.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
const int page_size = 4*1024;
|
||||
|
||||
extern thread_local size_t protected_page;
|
||||
|
||||
#ifndef _WIN32
|
||||
static size_t get_buffer_end_page(size_t buffer_end) {
|
||||
// get the last complete page in buffer
|
||||
// 4k align :
|
||||
// | | | | |
|
||||
// buffer: xxxxxxxxxxxxxxxxxxxxxxxx
|
||||
// ^ buffer_end_page
|
||||
size_t buffer_end_page = buffer_end - buffer_end % page_size;
|
||||
if (buffer_end_page + page_size-1 > buffer_end)
|
||||
buffer_end_page -= page_size;
|
||||
size_t buffer_end_page = buffer_end - buffer_end % getpagesize();
|
||||
if (buffer_end_page + getpagesize()-1 > buffer_end)
|
||||
buffer_end_page -= getpagesize();
|
||||
return buffer_end_page;
|
||||
}
|
||||
#endif
|
||||
|
||||
JitKey::JitKey() {
|
||||
(void)get_buffer_end_page;
|
||||
#ifndef _WIN32
|
||||
auto buffer_end_page = get_buffer_end_page((size_t)&buffer[buffer_size-1]);
|
||||
LOGvv << "protect page" << (void*)buffer_end_page;
|
||||
ASSERT(0==mprotect((void*)buffer_end_page, page_size, PROT_NONE));
|
||||
// windows assign extern thread_local var cause fault, FIX IT
|
||||
ASSERT(0==mprotect((void*)buffer_end_page, getpagesize(), PROT_NONE));
|
||||
protected_page = buffer_end_page;
|
||||
#endif
|
||||
}
|
||||
|
@ -44,7 +43,7 @@ JitKey::~JitKey() {
|
|||
#ifndef _WIN32
|
||||
auto buffer_end_page = get_buffer_end_page((size_t)&buffer[buffer_size-1]);
|
||||
LOGvv << "un-protect page" << (void*)buffer_end_page;
|
||||
mprotect((void*)buffer_end_page, page_size, PROT_READ|PROT_WRITE|PROT_EXEC);
|
||||
mprotect((void*)buffer_end_page, getpagesize(), PROT_READ|PROT_WRITE|PROT_EXEC);
|
||||
protected_page = 0;
|
||||
#endif
|
||||
}
|
||||
|
|
|
@ -165,8 +165,8 @@ inline JK& operator<<(JK& jk, int64 c) {
|
|||
}
|
||||
return jk << JK::hex(c);
|
||||
}
|
||||
#ifndef _WIN32
|
||||
// win32 cause redefinition error
|
||||
|
||||
#ifdef __linux__
|
||||
inline JK& operator<<(JK& jk, long long int c) {
|
||||
return jk << (int64)c;
|
||||
}
|
||||
|
|
|
@ -15,6 +15,9 @@ const char* AlignedAllocator::name() const {return "aligned";}
|
|||
|
||||
void* AlignedAllocator::alloc(size_t size, size_t& allocation) {
|
||||
#ifndef _WIN32
|
||||
#ifdef __APPLE__
|
||||
size += 32-size%32;
|
||||
#endif
|
||||
return aligned_alloc(alignment, size);
|
||||
#else
|
||||
return _aligned_malloc(size, alignment);
|
||||
|
|
|
@ -13,6 +13,7 @@
|
|||
namespace jittor {
|
||||
|
||||
DEFINE_FLAG(int, use_temp_allocator, 1, "Enable temp allocator");
|
||||
vector<TempAllocator*> TempAllocator::temp_allocators;
|
||||
|
||||
TempAllocator::~TempAllocator() {
|
||||
while (!cached_blocks.empty()) {
|
||||
|
|
|
@ -24,6 +24,7 @@ struct TempCachingBlock {
|
|||
struct TempAllocator : Allocator {
|
||||
static const size_t ALIGN_SIZE = 512;
|
||||
static const size_t ID_LIMIT = 1 << 18;
|
||||
static vector<TempAllocator*> temp_allocators;
|
||||
Allocator* underlying;
|
||||
size_t cache_blocks_limit, used_memory, unused_memory;
|
||||
std::map<unsigned long long, TempCachingBlock*> cached_blocks;
|
||||
|
@ -33,6 +34,7 @@ struct TempAllocator : Allocator {
|
|||
|
||||
|
||||
inline TempAllocator(size_t cache_blocks_limit=2) : cache_blocks_limit(cache_blocks_limit), used_memory(0), unused_memory(0), tot_block_id(0), occupied_id_mapper(new TempCachingBlock*[ID_LIMIT]) {
|
||||
temp_allocators.push_back(this);
|
||||
}
|
||||
inline TempAllocator(Allocator* underlying, size_t cache_blocks_limit=2) : TempAllocator(cache_blocks_limit) {
|
||||
setup(underlying);
|
||||
|
|
|
@ -6,11 +6,17 @@
|
|||
// ***************************************************************
|
||||
#include <iomanip>
|
||||
#include <algorithm>
|
||||
#ifndef _WIN32
|
||||
#if defined(__linux__)
|
||||
#include <sys/sysinfo.h>
|
||||
#else
|
||||
#elif defined(__APPLE__)
|
||||
#include <sys/sysctl.h>
|
||||
#include <mach/host_info.h>
|
||||
#include <mach/mach_init.h>
|
||||
#include <mach/mach_host.h>
|
||||
#elif defined(_WIN32)
|
||||
#include <windows.h>
|
||||
#endif
|
||||
#include <unistd.h>
|
||||
|
||||
#include "var.h"
|
||||
#include "op.h"
|
||||
|
@ -104,16 +110,41 @@ void display_memory_info(const char* fileline, bool dump_var, bool red_color) {
|
|||
>> "(" >> std::setprecision(p) >> a->unused_memory*100.0 / total >> "%)"
|
||||
<< "total:" << FloatOutput{(double)total, " KMG", 1024, "B"} >> "\n";
|
||||
}
|
||||
if (use_temp_allocator && exe.temp_allocator) {
|
||||
for (auto& a : TempAllocator::temp_allocators) {
|
||||
auto total = a->used_memory + a->unused_memory;
|
||||
all_total += total;
|
||||
a->is_cuda() ? gpu_total += total : cpu_total += total;
|
||||
log << "name:" << a->name() << "is_cuda:" << a->is_cuda()
|
||||
<< "used:" << FloatOutput{(double)a->used_memory, " KMG", 1024, "B"}
|
||||
>> "(" >> std::setprecision(p) >> a->used_memory*100.0 / total >> "%)"
|
||||
<< "unused:" << FloatOutput{(double)a->unused_memory, " KMG", 1024, "B"}
|
||||
>> "(" >> std::setprecision(p) >> a->unused_memory*100.0 / total >> "%)"
|
||||
<< "total:" << FloatOutput{(double)total, " KMG", 1024, "B"} >> "\n";
|
||||
}
|
||||
}
|
||||
log << "cpu&gpu:" << FloatOutput{(double)all_total, " KMG", 1024, "B"}
|
||||
<< "gpu:" << FloatOutput{(double)gpu_total, " KMG", 1024, "B"}
|
||||
<< "cpu:" << FloatOutput{(double)cpu_total, " KMG", 1024, "B"} >> '\n';
|
||||
if (use_temp_allocator) {
|
||||
TempAllocator* temp_allocator = (TempAllocator*)exe.temp_allocator;
|
||||
log << "\nname:" << temp_allocator->name() << "\n";
|
||||
log << "used_memory:" << FloatOutput{(double)temp_allocator->used_memory, " KMG", 1024, "B"} << "\n";
|
||||
log << "unused_memory:" << FloatOutput{(double)temp_allocator->unused_memory, " KMG", 1024, "B"} << "\n";
|
||||
|
||||
size_t cpu_free = 0;
|
||||
#if defined(__linux__)
|
||||
cpu_free = get_avphys_pages() * sysconf(_SC_PAGESIZE);
|
||||
#elif defined(__APPLE__)
|
||||
{
|
||||
mach_msg_type_number_t count = HOST_VM_INFO_COUNT;
|
||||
vm_statistics_data_t vmstat;
|
||||
if (KERN_SUCCESS == host_statistics(mach_host_self(), HOST_VM_INFO, (host_info_t)&vmstat, &count)) {
|
||||
cpu_free = vmstat.free_count * sysconf(_SC_PAGESIZE);
|
||||
}
|
||||
}
|
||||
#endif
|
||||
size_t gpu_free = 0, _gpu_total = 0;
|
||||
(void)gpu_free; (void)_gpu_total;
|
||||
#ifdef HAS_CUDA
|
||||
cudaMemGetInfo(&gpu_free, &_gpu_total);
|
||||
#endif
|
||||
log << "free: cpu(">>FloatOutput{(double)cpu_free, " KMG", 1024, "B"}
|
||||
>> ") gpu(">>FloatOutput{(double)gpu_free, " KMG", 1024, "B"} >> ")\n";
|
||||
if (dump_var) {
|
||||
vector<Node*> queue;
|
||||
unordered_set<Node*> visited;
|
||||
|
@ -155,22 +186,35 @@ void display_memory_info(const char* fileline, bool dump_var, bool red_color) {
|
|||
log.end();
|
||||
}
|
||||
|
||||
extern vector<void(*)()> sigquit_callback;
|
||||
|
||||
void meminfo_callback() {
|
||||
display_memory_info();
|
||||
}
|
||||
|
||||
MemInfo::MemInfo() {
|
||||
#ifndef _WIN32
|
||||
|
||||
#if defined(__linux__)
|
||||
struct sysinfo info = {0};
|
||||
sysinfo(&info);
|
||||
total_cpu_ram = info.totalram;
|
||||
#else
|
||||
#elif defined(__APPLE__)
|
||||
int mib[] = {CTL_HW, HW_MEMSIZE};
|
||||
size_t len=sizeof(total_cpu_ram);
|
||||
sysctl(mib, 2, &total_cpu_ram, &len, NULL, 0);
|
||||
#elif defined(_WIN32)
|
||||
MEMORYSTATUSEX statex;
|
||||
GlobalMemoryStatusEx (&statex);
|
||||
total_cpu_ram = statex.ullTotalPhys;
|
||||
#endif
|
||||
|
||||
total_cuda_ram = 0;
|
||||
#ifdef HAS_CUDA
|
||||
cudaDeviceProp prop;
|
||||
cudaGetDeviceProperties(&prop, 0);
|
||||
total_cuda_ram = prop.totalGlobalMem;
|
||||
#endif
|
||||
sigquit_callback.push_back(&meminfo_callback);
|
||||
}
|
||||
|
||||
MemInfo mem_info;
|
||||
|
|
|
@ -10,11 +10,11 @@
|
|||
namespace jittor {
|
||||
|
||||
__device__ inline static int floatToOrderedInt(float floatVal) {
|
||||
int intVal = __float_as_int( floatVal );
|
||||
return (intVal >= 0 ) ? intVal : intVal ^ 0x7FFFFFFF;
|
||||
int intVal = __float_as_int( floatVal );
|
||||
return (intVal >= 0 ) ? intVal : intVal ^ 0x7FFFFFFF;
|
||||
}
|
||||
__device__ inline static float orderedIntToFloat(int intVal) {
|
||||
return __int_as_float((intVal >= 0) ? intVal : intVal ^ 0x7FFFFFFF);
|
||||
return __int_as_float((intVal >= 0) ? intVal : intVal ^ 0x7FFFFFFF);
|
||||
}
|
||||
|
||||
__global__ inline static void fix_float_kernel(float* x, int num) {
|
||||
|
@ -24,7 +24,24 @@ __global__ inline static void fix_float_kernel(float* x, int num) {
|
|||
x[i] = orderedIntToFloat(__float_as_int(x[i]));
|
||||
}
|
||||
|
||||
inline static void fix_float(float* x, int num) {
|
||||
|
||||
__device__ inline static long long floatToOrderedInt(double floatVal) {
|
||||
long long intVal = __double_as_longlong( floatVal );
|
||||
return (intVal >= 0 ) ? intVal : intVal ^ 0x7FFFFFFFFFFFFFFF;
|
||||
}
|
||||
__device__ inline static double orderedIntToFloat(long long intVal) {
|
||||
return __longlong_as_double((intVal >= 0) ? intVal : intVal ^ 0x7FFFFFFFFFFFFFFF);
|
||||
}
|
||||
|
||||
__global__ inline static void fix_float_kernel(double* x, int num) {
|
||||
int tid = threadIdx.x + blockIdx.x * blockDim.x;
|
||||
int tnum = gridDim.x * blockDim.x;
|
||||
for (int i=tid; i<num; i+=tnum)
|
||||
x[i] = orderedIntToFloat(__double_as_longlong(x[i]));
|
||||
}
|
||||
|
||||
template<class T>
|
||||
inline static void fix_float(T* x, int num) {
|
||||
fix_float_kernel<<<std::min((num-1)/1024+1,256), 1024>>>(x, num);
|
||||
}
|
||||
|
||||
|
@ -38,6 +55,11 @@ inline float cuda_atomic_max(float* a, float b) {
|
|||
return orderedIntToFloat(atomicMax((int *)a, floatToOrderedInt(b)));
|
||||
}
|
||||
|
||||
template<> __device__
|
||||
inline double cuda_atomic_max(double* a, double b) {
|
||||
return orderedIntToFloat(atomicMax((long long *)a, floatToOrderedInt(b)));
|
||||
}
|
||||
|
||||
template<class T> __device__
|
||||
T cuda_atomic_min(T* a, T b) {
|
||||
return atomicMin(a, b);
|
||||
|
@ -48,6 +70,11 @@ inline float cuda_atomic_min(float* a, float b) {
|
|||
return orderedIntToFloat(atomicMin((int *)a, floatToOrderedInt(b)));
|
||||
}
|
||||
|
||||
template<> __device__
|
||||
inline double cuda_atomic_min(double* a, double b) {
|
||||
return orderedIntToFloat(atomicMin((long long *)a, floatToOrderedInt(b)));
|
||||
}
|
||||
|
||||
template <class T> struct int_mapper {
|
||||
typedef T src;
|
||||
typedef T target;
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#include <bits/stdc++.h>
|
||||
#include <cstring>
|
||||
#include "misc/nano_string.h"
|
||||
|
||||
namespace jittor {
|
||||
|
@ -59,6 +59,9 @@ static unordered_set<string> unary_ops = {
|
|||
"round",
|
||||
"floor",
|
||||
"ceil",
|
||||
"round_int",
|
||||
"floor_int",
|
||||
"ceil_int",
|
||||
"cast",
|
||||
"sin",
|
||||
"asin",
|
||||
|
@ -82,9 +85,9 @@ static unordered_set<string> unary_float_ops = {
|
|||
"sqrt",
|
||||
};
|
||||
static unordered_set<string> unary_int_ops = {
|
||||
"round",
|
||||
"floor",
|
||||
"ceil",
|
||||
"round_int",
|
||||
"floor_int",
|
||||
"ceil_int",
|
||||
};
|
||||
|
||||
static unordered_set<string> binary_ops = {
|
||||
|
|
|
@ -62,6 +62,9 @@ constexpr int ns_max_len = 16;
|
|||
m(round) \
|
||||
m(floor) \
|
||||
m(ceil) \
|
||||
m(round_int) \
|
||||
m(floor_int) \
|
||||
m(ceil_int) \
|
||||
m(cast) \
|
||||
\
|
||||
m(sin) \
|
||||
|
|
|
@ -159,13 +159,8 @@ struct NanoVector {
|
|||
for (auto a : v) push_back_check_overflow(a);
|
||||
}
|
||||
|
||||
inline static NanoVector make(const int64* v, int n) {
|
||||
NanoVector nv;
|
||||
for (int i=0; i<n; i++) nv.push_back_check_overflow(v[i]);
|
||||
return nv;
|
||||
}
|
||||
|
||||
inline static NanoVector make(const int32* v, int n) {
|
||||
template<typename TMakeV>
|
||||
inline static NanoVector make(const TMakeV* v, int n) {
|
||||
NanoVector nv;
|
||||
for (int i=0; i<n; i++) nv.push_back_check_overflow(v[i]);
|
||||
return nv;
|
||||
|
@ -238,6 +233,21 @@ struct NanoVector {
|
|||
v[i] = at(i);
|
||||
return v;
|
||||
}
|
||||
|
||||
inline void _unpack(int i) {
|
||||
return;
|
||||
}
|
||||
|
||||
template<class... Args>
|
||||
void _unpack(int i, int& x, Args&&... args) {
|
||||
x = this->operator[](i);
|
||||
_unpack(i+1, std::forward<Args>(args)...);
|
||||
}
|
||||
|
||||
template<class... Args>
|
||||
void unpack(Args&&... args) {
|
||||
_unpack(0, std::forward<Args>(args)...);
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
|
|
|
@ -47,13 +47,11 @@ struct RingBuffer {
|
|||
}
|
||||
|
||||
inline ~Cond() {
|
||||
#ifndef _WIN32
|
||||
// a dirty hack
|
||||
// ref: https://stackoverflow.com/questions/20439404/pthread-conditions-and-process-termination
|
||||
// cv.__data.__wrefs = 0;
|
||||
#ifdef __linux__
|
||||
cv.__data = {0};
|
||||
#else
|
||||
// cv.__data = 0;
|
||||
#endif
|
||||
pthread_cond_destroy(&cv);
|
||||
}
|
||||
|
|
|
@ -5,12 +5,22 @@
|
|||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#pragma once
|
||||
|
||||
#if defined(__clang__)
|
||||
#include <string_view>
|
||||
#elif defined(__GNUC__)
|
||||
#include <experimental/string_view>
|
||||
#endif
|
||||
|
||||
#include "common.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
#if defined(__clang__)
|
||||
using std::string_view;
|
||||
#elif defined(__GNUC__)
|
||||
using std::experimental::string_view;
|
||||
#endif
|
||||
|
||||
template<class T>
|
||||
struct string_view_map {
|
||||
|
|
|
@ -104,6 +104,8 @@ int OpCompiler::total_member_count() {
|
|||
// array need a extra local var
|
||||
if (op->ops[i]->name()==string("array"))
|
||||
member_count += 1;
|
||||
if (op->ops[i]->name()==string("safe_clip"))
|
||||
member_count += 2;
|
||||
member_count += v.size();
|
||||
i += 1;
|
||||
}
|
||||
|
@ -826,11 +828,15 @@ string OpCompiler::__get_fused_src(
|
|||
const unordered_set<string> members = {
|
||||
"x", "y", "z", "cond", "output", "extras"
|
||||
};
|
||||
const unordered_set<string> scalar_members = {
|
||||
"left", "right"
|
||||
};
|
||||
const unordered_set<string> unchanged = {
|
||||
"for", "const", "auto", "get_random_engine",
|
||||
"int", "float", "bool", "CHECK", "STRINGIZE",
|
||||
"void", "__restrict__", "if", "true", "false",
|
||||
"Op", "Var", "Node", "itof", "assert", "ASSERT"
|
||||
"Op", "Var", "Node", "itof", "assert", "ASSERT",
|
||||
"float64"
|
||||
};
|
||||
auto not_change = [&](const string& s) -> bool {
|
||||
if (unchanged.count(s)) return true;
|
||||
|
@ -941,7 +947,8 @@ string OpCompiler::__get_fused_src(
|
|||
while (l<src.size() && isvar(src[l])) l++;
|
||||
auto var = src.substr(j, l-j);
|
||||
if (var[0] == ':' || isdigit(var[0]) || not_change(var) || src[j-1]=='.' || src[j-1]=='>') {} else
|
||||
if (members.count(var)) {
|
||||
if (members.count(var) || scalar_members.count(var)) {
|
||||
bool is_member = members.count(var);
|
||||
string arg_name = "op" + S(oi) + "_" + var;
|
||||
if (l<src.size() && src[l]=='[') {
|
||||
// handle extras[...]
|
||||
|
@ -964,7 +971,8 @@ string OpCompiler::__get_fused_src(
|
|||
" = (("+name3+"Op*)(ops[" + S(oi) + "]))->" + var;
|
||||
fused_kernel_args += ";\n";
|
||||
kernel_args.insert(arg_name);
|
||||
op_members[oi].push_back(arg_name);
|
||||
if (is_member)
|
||||
op_members[oi].push_back(arg_name);
|
||||
}
|
||||
fused_kernel += arg_name;
|
||||
j = l-1;
|
||||
|
|
|
@ -476,6 +476,14 @@ VarPtr BinaryOp::grad(Var* out, Var* dout, Var* v, int v_index) {
|
|||
return make_binary(ndzx, y2, ns_divide);
|
||||
}
|
||||
}
|
||||
if (ns == ns_mod) {
|
||||
if (v_index == 0)
|
||||
return dout;
|
||||
else {
|
||||
auto a = make_unary(make_binary(x, y, ns_divide), ns_floor);
|
||||
return make_unary(a, ns_negative);
|
||||
}
|
||||
}
|
||||
if (ns == ns_maximum || ns == ns_minimum) {
|
||||
auto zeros = make_number(0, dout);
|
||||
auto cond = make_binary(y, z, ns_equal);
|
||||
|
|
|
@ -13,12 +13,12 @@ namespace jittor {
|
|||
#define pow(T,a,b) ::pow(a,b)
|
||||
#define maximum(T,a,b) ::max(T(a), T(b))
|
||||
#define minimum(T,a,b) ::min(T(a), T(b))
|
||||
#define mod(T,a,b) @if(@strcmp(@Tx,float32)==0,::fmodf(T(a),T(b)),@if(@strcmp(@Tx,float64)==0,::fmod(T(a),T(b)),((a)%(b))))
|
||||
#define mod(T,a,b) @if(@strcmp(@T,float32)==0,(a-::floorf((a)/(b))*(b)),@if(@strcmp(@Tx,float64)==0,(a-::floor((a)/(b))*(b)),(a%b)))
|
||||
#else // JIT_cpu
|
||||
#define pow(T,a,b) std::pow(a,b)
|
||||
#define maximum(T,a,b) std::max(T(a), T(b))
|
||||
#define minimum(T,a,b) std::min(T(a), T(b))
|
||||
#define mod(T,a,b) @if(@strcmp(@Tx,float32)==0 || @strcmp(@Tx,float64)==0,std::fmod((T)a,(T)b),((a)%(b)))
|
||||
#define mod(T,a,b) @if(@strcmp(@T,float32)==0,(a-std::floor((a)/(b))*(b)),@if(@strcmp(@Tx,float64)==0,(a-std::floor((a)/(b))*(b)),(a%b)))
|
||||
#endif
|
||||
#define add(T,a,b) ((a)+(b))
|
||||
#define subtract(T,a,b) ((a)-(b))
|
||||
|
|
|
@ -0,0 +1,114 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers: Dun Liang <randonlang@gmail.com>.
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#include "ops/fuse_transpose_op.h"
|
||||
#include "var.h"
|
||||
#include "ops/op_register.h"
|
||||
#include "misc/cuda_flags.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
#ifndef JIT
|
||||
static auto make_transpose = get_op_info("fuse_transpose")
|
||||
.get_constructor<VarPtr, Var*, NanoVector>();
|
||||
|
||||
static inline NanoVector get_reverse(NanoVector axes) {
|
||||
NanoVector reverse;
|
||||
reverse.reserve(axes.size(), axes.size());
|
||||
for (uint i=0; i<axes.size(); i++)
|
||||
reverse.set_data(axes[i], i);
|
||||
return reverse;
|
||||
}
|
||||
|
||||
FuseTransposeOp::FuseTransposeOp(Var* x, NanoVector axes_) : x(x), axes(axes_) {
|
||||
OpType tp = OpType::broadcast;
|
||||
if (!x->is_finished()) {
|
||||
auto type = x->input()->type();
|
||||
if (type==OpType::broadcast || type==OpType::element)
|
||||
tp = OpType::reduce;
|
||||
}
|
||||
flags.set(NodeFlags::_cpu);
|
||||
flags.set(NodeFlags::_cuda);
|
||||
set_type(tp);
|
||||
int i=0;
|
||||
for (; i<axes.size(); i++)
|
||||
if (i!=axes[i]) break;
|
||||
if (i==axes.size() && axes.size()) {
|
||||
forward(x);
|
||||
return;
|
||||
}
|
||||
auto xdim = x->shape.size();
|
||||
if (!axes.size()) {
|
||||
for (int i=0; i<(int)xdim; i++)
|
||||
axes.push_back(xdim-1-i);
|
||||
}
|
||||
y = create_output(nullptr, x->dtype());
|
||||
}
|
||||
|
||||
void FuseTransposeOp::infer_shape() {
|
||||
auto xdim = x->shape.size();
|
||||
CHECK(xdim);
|
||||
if (!axes.size()) {
|
||||
for (int i=0; i<(int)xdim; i++)
|
||||
axes.push_back(xdim-1-i);
|
||||
} else {
|
||||
CHECKop(axes.size(),==,xdim);
|
||||
int64_t mask=0;
|
||||
for (auto i : axes) mask |= 1<<i;
|
||||
CHECK(mask==((1ll<<xdim)-1)) << "Invalid axes" << axes;
|
||||
}
|
||||
NanoVector shape;
|
||||
for (uint i=0; i<xdim; i++)
|
||||
shape.push_back(x->shape[axes[i]]);
|
||||
y->set_shape(shape);
|
||||
}
|
||||
|
||||
VarPtr FuseTransposeOp::grad(Var* out, Var* dout, Var* v, int v_index) {
|
||||
return make_transpose(dout, get_reverse(axes));
|
||||
}
|
||||
|
||||
void FuseTransposeOp::jit_prepare(JK& jk) {
|
||||
auto bc = type()==OpType::broadcast;
|
||||
auto ax = bc ? axes : get_reverse(axes);
|
||||
jk << _CS("[Tx:") << x->dtype();
|
||||
jk << _CS("][DIM=") << JK::hex1(axes.size());
|
||||
jk << _CS("][BC:") << JK::hex1(bc);
|
||||
for (uint i=0; i<ax.size(); i++)
|
||||
jk << _CS("][AXES") << JK::hex1(ax[i]) << '=' << JK::hex1(i);
|
||||
jk << ']';
|
||||
}
|
||||
|
||||
#else // JIT
|
||||
void FuseTransposeOp::jit_run() {
|
||||
auto* __restrict__ xp = x->ptr<Tx>();
|
||||
auto* __restrict__ yp = y->ptr<Tx>();
|
||||
|
||||
@for(i, 0, DIM, index_t yshape@i = y->shape[@i];)
|
||||
@for(i, 0, DIM, index_t xshape@i = yshape@{AXES@i};)
|
||||
index_t xstride@{DIM-1} = 1;
|
||||
@for(i, DIM-2, -1, -1, auto xstride@i = xstride@{i+1} * xshape@{i+1};)
|
||||
index_t ystride@{DIM-1} = 1;
|
||||
@for(i, DIM-2, -1, -1, auto ystride@i = ystride@{i+1} * yshape@{i+1};)
|
||||
@if(BC,
|
||||
@for(d, 0, DIM, for (index_t i@d=0; i@d < yshape@d; i@d++)) {
|
||||
auto yid = @for(d, 0, DIM, + i@d * ystride@d);
|
||||
@for(d, 0, DIM, auto xid@d = i@{AXES@d};)
|
||||
auto xid = @for(d, 0, DIM, + xid@d * xstride@d);
|
||||
yp[yid] = xp[xid];
|
||||
},
|
||||
@for(d, 0, DIM, for (index_t i@d=0; i@d < xshape@d; i@d++)) {
|
||||
auto xid = @for(d, 0, DIM, + i@d * xstride@d);
|
||||
@for(d, 0, DIM, auto yid@d = i@{AXES@d};)
|
||||
auto yid = @for(d, 0, DIM, + yid@d * ystride@d);
|
||||
yp[yid] = xp[xid];
|
||||
}
|
||||
)
|
||||
// unused var
|
||||
(void)xshape0;
|
||||
}
|
||||
#endif // JIT
|
||||
|
||||
} // jittor
|
|
@ -0,0 +1,23 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers: Dun Liang <randonlang@gmail.com>.
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#pragma once
|
||||
#include "op.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
struct FuseTransposeOp : Op {
|
||||
Var* x, * y;
|
||||
NanoVector axes;
|
||||
FuseTransposeOp(Var* x, NanoVector axes=NanoVector());
|
||||
|
||||
const char* name() const override { return "fuse_transpose"; }
|
||||
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
|
||||
void infer_shape() override;
|
||||
DECLARE_jit_run;
|
||||
};
|
||||
|
||||
} // jittor
|
|
@ -93,6 +93,9 @@ void GetitemOp::infer_slices(
|
|||
} else
|
||||
if (s.is_ellipsis()) {
|
||||
auto remain_slice = vs.n-vid-1;
|
||||
for (int i=vid+1; i<vs.n; i++)
|
||||
if (vs.slices[i].is_none())
|
||||
remain_slice--;
|
||||
auto remain_idims = nin-i;
|
||||
auto ellipsis_size = remain_idims - remain_slice;
|
||||
ASSERT(ellipsis_size>=0) << "NDims not match";
|
||||
|
|
|
@ -0,0 +1,47 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers: Dun Liang <randonlang@gmail.com>.
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#include <cmath>
|
||||
#include "var.h"
|
||||
#include "ops/safe_clip_op.h"
|
||||
#include "ops/op_register.h"
|
||||
|
||||
namespace jittor {
|
||||
|
||||
#ifndef JIT
|
||||
|
||||
SafeClipOp::SafeClipOp(Var* x, float64 left, float64 right) : x(x), left(left), right(right) {
|
||||
flags.set(NodeFlags::_cpu);
|
||||
flags.set(NodeFlags::_cuda);
|
||||
set_type(OpType::element);
|
||||
y = create_output(nullptr, x->dtype());
|
||||
}
|
||||
|
||||
VarPtr SafeClipOp::grad(Var* out, Var* dout, Var* v, int v_index) {
|
||||
return dout;
|
||||
}
|
||||
|
||||
void SafeClipOp::infer_shape() {
|
||||
y->set_shape(x->shape);
|
||||
}
|
||||
|
||||
void SafeClipOp::jit_prepare(JK& jk) {
|
||||
jk << _CS("[Tx:") << x->dtype() <<']';
|
||||
}
|
||||
|
||||
#else // JIT
|
||||
void SafeClipOp::jit_run() {
|
||||
auto* __restrict__ xp = x->ptr<Tx>();
|
||||
Tx left_value = (Tx)std::max((float64)std::numeric_limits<Tx>::lowest(), left);
|
||||
Tx right_value = (Tx)std::min((float64)std::numeric_limits<Tx>::max(), right);
|
||||
auto* __restrict__ yp = y->ptr<Tx>();
|
||||
index_t num = y->num;
|
||||
for (index_t i=0; i<num; i++)
|
||||
yp[i] = xp[i] < left_value ? left_value : (xp[i] > right_value ? right_value : xp[i]);
|
||||
}
|
||||
#endif // JIT
|
||||
|
||||
} // jittor
|
|
@ -0,0 +1,33 @@
|
|||
// ***************************************************************
|
||||
// Copyright (c) 2021 Jittor. All Rights Reserved.
|
||||
// Maintainers: Dun Liang <randonlang@gmail.com>.
|
||||
// This file is subject to the terms and conditions defined in
|
||||
// file 'LICENSE.txt', which is part of this source code package.
|
||||
// ***************************************************************
|
||||
#pragma once
|
||||
#include "op.h"
|
||||
|
||||
|
||||
namespace jittor {
|
||||
|
||||
struct SafeClipOp : Op {
|
||||
Var* x, * y;
|
||||
float64 left, right;
|
||||
/** Safe clip value to a range, and keep
|
||||
the gradient pass thought.
|
||||
|
||||
* [in] x: input value
|
||||
* [in] left: float64 clip min value.
|
||||
* [in] right: float64 clip max value.
|
||||
|
||||
*/
|
||||
// @pybind(safe_clip)
|
||||
SafeClipOp(Var* x, float64 left, float64 right);
|
||||
|
||||
const char* name() const override { return "safe_clip"; }
|
||||
VarPtr grad(Var* out, Var* dout, Var* v, int v_index) override;
|
||||
void infer_shape() override;
|
||||
DECLARE_jit_run;
|
||||
};
|
||||
|
||||
} // jittor
|
|
@ -316,8 +316,11 @@ void SetitemOp::jit_run() {
|
|||
checkCudaErrors(cudaMemcpyAsync(op, ip, out->size, cudaMemcpyDefault, 0));
|
||||
#endif
|
||||
|
||||
if (data->allocation == in->allocation &&
|
||||
data->allocator == in->allocator)
|
||||
if (flags.get((NodeFlags::Flags(SetitemOp::_data_inplaced))) &&
|
||||
// array op may move the data allocation, double check
|
||||
// affect test_contrib.pu
|
||||
in->allocator == data->allocator &&
|
||||
in->allocation == data->allocation)
|
||||
return;
|
||||
|
||||
@for(d, 0, ODIM, for (index_t i@d=0; i@d < oshape@d; i@d++)) {
|
||||
|
|
|
@ -11,6 +11,7 @@
|
|||
namespace jittor {
|
||||
|
||||
struct SetitemOp : Op {
|
||||
static constexpr int _data_inplaced = NodeFlags::_has_vary_input + 1;
|
||||
VarSlices vs;
|
||||
// map i to related var slice
|
||||
NanoVector i_to_vs;
|
||||
|
|
|
@ -168,7 +168,7 @@ static unordered_set<string> unary_ops = {
|
|||
>>> a
|
||||
jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32)
|
||||
>>> jt.round(a)
|
||||
jt.Var([ 2 0 0 -1], dtype=int32)
|
||||
jt.Var([ 2.0 0.0 0.0 -1.0], dtype=float32)
|
||||
*/
|
||||
"round",
|
||||
|
||||
|
@ -185,7 +185,7 @@ static unordered_set<string> unary_ops = {
|
|||
>>> a
|
||||
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
|
||||
>>> jt.floor(a)
|
||||
jt.Var([-2 -1 -1 -1], dtype=int32)
|
||||
jt.Var([-2.0 -1.0 -1.0 -1.0], dtype=float32)
|
||||
*/
|
||||
"floor",
|
||||
|
||||
|
@ -203,10 +203,63 @@ static unordered_set<string> unary_ops = {
|
|||
>>> a
|
||||
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
|
||||
>>> jt.ceil(a)
|
||||
jt.Var([-1 0 0 0], dtype=int32)
|
||||
jt.Var([-1.0 0.0 0.0 0.0], dtype=float32)
|
||||
*/
|
||||
"ceil",
|
||||
|
||||
/**
|
||||
Returns the closest integer of the input ``x``.
|
||||
|
||||
----------------
|
||||
|
||||
* [in] x: the input jt.Var.
|
||||
|
||||
----------------
|
||||
|
||||
Example-1::
|
||||
>>> a = jt.randn(4)
|
||||
>>> a
|
||||
jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32)
|
||||
>>> jt.round_int(a)
|
||||
jt.Var([ 2 0 0 -1], dtype=int32)
|
||||
*/
|
||||
"round_int",
|
||||
|
||||
/**
|
||||
Returns the largest integer less than or equal to the input ``x``.
|
||||
|
||||
----------------
|
||||
|
||||
* [in] x: the input jt.Var.
|
||||
|
||||
----------------
|
||||
Example-1::
|
||||
>>> a = jt.randn(4)
|
||||
>>> a
|
||||
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
|
||||
>>> jt.floor_int(a)
|
||||
jt.Var([-2 -1 -1 -1], dtype=int32)
|
||||
*/
|
||||
"floor_int",
|
||||
|
||||
/**
|
||||
Returns the smallest integer greater than or equal to the input ``x``.
|
||||
|
||||
----------------
|
||||
|
||||
* [in] x: the input jt.Var.
|
||||
|
||||
----------------
|
||||
|
||||
Example-1::
|
||||
>>> a = jt.randn(4)
|
||||
>>> a
|
||||
jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32)
|
||||
>>> jt.ceil_int(a)
|
||||
jt.Var([-1 0 0 0], dtype=int32)
|
||||
*/
|
||||
"ceil_int",
|
||||
|
||||
/**
|
||||
Returns the sine of the input ``x``.
|
||||
|
||||
|
|
|
@ -21,6 +21,9 @@ namespace jittor {
|
|||
#define round(T,x) ((T) ::roundf((x)))
|
||||
#define floor(T,x) ((T) ::floorf((x)))
|
||||
#define ceil(T,x) ((T) ::ceilf((x)))
|
||||
#define round_int(T,x) ((T) ::roundf((x)))
|
||||
#define floor_int(T,x) ((T) ::floorf((x)))
|
||||
#define ceil_int(T,x) ((T) ::ceilf((x)))
|
||||
|
||||
#define sin(T,x) ((T) ::sinf((x)))
|
||||
#define asin(T,x) ((T) ::asinf((x)))
|
||||
|
@ -49,6 +52,9 @@ namespace jittor {
|
|||
#define round(T,x) ((T)std::round((x)))
|
||||
#define floor(T,x) ((T)std::floor((x)))
|
||||
#define ceil(T,x) ((T)std::ceil((x)))
|
||||
#define round_int(T,x) ((T)std::round((x)))
|
||||
#define floor_int(T,x) ((T)std::floor((x)))
|
||||
#define ceil_int(T,x) ((T)std::ceil((x)))
|
||||
|
||||
#define sin(T,x) ((T) std::sin((x)))
|
||||
#define asin(T,x) ((T) std::asin((x)))
|
||||
|
|
|
@ -47,6 +47,8 @@ static void setitem_inplace(SetitemOp* op) {
|
|||
return;
|
||||
}
|
||||
auto output = op->outputs().front();
|
||||
// return if output is all ready shared
|
||||
if (output->allocator) return;
|
||||
output->share_with(input);
|
||||
|
||||
auto data = op->input(1);
|
||||
|
@ -78,13 +80,13 @@ static void setitem_inplace(SetitemOp* op) {
|
|||
VarSlice s = vs.slices[i];
|
||||
if (!(s.is_slice())) return;
|
||||
Slice ss = s.slice;
|
||||
if (!(ss.start == 0 && ss.stop >= in_shape[i] && ss.step == 1))
|
||||
if (!(ss.start == 0 && (ss.mask&2) && ss.step == 1))
|
||||
return;
|
||||
inplace_size *= in_shape[i];
|
||||
}
|
||||
|
||||
VarSlice s = vs.slices[0];
|
||||
if (s.is_var()) return;
|
||||
if (s.is_var() || s.is_str()) return;
|
||||
|
||||
auto size = 0;
|
||||
if (s.is_int())
|
||||
|
@ -105,6 +107,7 @@ static void setitem_inplace(SetitemOp* op) {
|
|||
}
|
||||
add_dependency(data->input(), {input->node()});
|
||||
data->share_with(input, size);
|
||||
op->flags.set((NodeFlags::Flags(SetitemOp::_data_inplaced)));
|
||||
}
|
||||
|
||||
struct BBox {
|
||||
|
@ -174,7 +177,10 @@ static void getitem_inplace(GetitemOp* op) {
|
|||
|
||||
auto in = op->inputs().front();
|
||||
auto ou = op->outputs().front();
|
||||
|
||||
|
||||
// return if out is all ready inplaced
|
||||
if (ou->allocator)
|
||||
return;
|
||||
// return if input or output's shape is variable
|
||||
if (in->num <= 0 || ou->num <= 0)
|
||||
return;
|
||||
|
@ -186,12 +192,12 @@ static void getitem_inplace(GetitemOp* op) {
|
|||
VarSlice s = vs.slices[i];
|
||||
if (!(s.is_slice())) return;
|
||||
Slice ss = s.slice;
|
||||
if (!(ss.start == 0 && ss.stop >= in_shape[i] && ss.step == 1))
|
||||
return;
|
||||
if (!(ss.start == 0 && (ss.mask&2) && ss.step == 1))
|
||||
return;
|
||||
}
|
||||
|
||||
VarSlice s = vs.slices[0];
|
||||
if (s.is_var()) return;
|
||||
if (s.is_var() || s.is_str()) return;
|
||||
|
||||
auto size = 0;
|
||||
if (s.is_int())
|
||||
|
@ -213,7 +219,7 @@ void SetitemOp::graph_optimize() {
|
|||
void GetitemOp::graph_optimize() {
|
||||
// This optimize is still WIP
|
||||
// LOGir << "hello getitem graph_optimize";
|
||||
setitem_grad_opt(this);
|
||||
// setitem_grad_opt(this);
|
||||
(void)setitem_grad_opt;
|
||||
// (void)getitem_inplace;
|
||||
getitem_inplace(this);
|
||||
|
|
|
@ -22,7 +22,7 @@ void FloatAtomicFixPass::run() {
|
|||
if (!choice) return;
|
||||
|
||||
unordered_map<string,int> fixed;
|
||||
auto fix_float_atomic = [&](string name) {
|
||||
auto fix_float_atomic = [&](string name, Var* v) {
|
||||
if (fixed.count(name)) return;
|
||||
fixed[name] = 1;
|
||||
string namep = name+"p";
|
||||
|
@ -38,9 +38,15 @@ void FloatAtomicFixPass::run() {
|
|||
return;
|
||||
// fix code a[b] = c -->
|
||||
// a[b] = __int_as_float(floatToOrderedInt(c))
|
||||
string new_code = namep+'['+results.at(0)->to_string(true)+
|
||||
"] = __int_as_float(floatToOrderedInt(" +
|
||||
results.at(1)->to_string(true) + "));";
|
||||
string new_code;
|
||||
if (v->dtype() == ns_float32)
|
||||
new_code = namep+'['+results.at(0)->to_string(true)+
|
||||
"] = __int_as_float(floatToOrderedInt(" +
|
||||
results.at(1)->to_string(true) + "));";
|
||||
else
|
||||
new_code = namep+'['+results.at(0)->to_string(true)+
|
||||
"] = __longlong_as_double(floatToOrderedInt(" +
|
||||
results.at(1)->to_string(true) + "));";
|
||||
LOGvvvv << "prev code" << code >> "\nreplace:" << new_code;
|
||||
code = new_code;
|
||||
});
|
||||
|
@ -74,7 +80,7 @@ void FloatAtomicFixPass::run() {
|
|||
}
|
||||
if (!var->dtype().is_float()) return;
|
||||
LOGvvvv << "find var" << var << "op" << op;
|
||||
fix_float_atomic(s);
|
||||
fix_float_atomic(s, var);
|
||||
});
|
||||
}
|
||||
|
||||
|
|
|
@ -67,6 +67,10 @@ void LoopToFuncPass::run() {
|
|||
args.push_back(d.get());
|
||||
continue;
|
||||
}
|
||||
if (endswith(d->attrs["lvalue"], "_value")) {
|
||||
args.push_back(d.get());
|
||||
continue;
|
||||
}
|
||||
}
|
||||
}
|
||||
func->push_back(d->clone());
|
||||
|
@ -99,7 +103,9 @@ void LoopToFuncPass::run() {
|
|||
auto& fc = ir->children[i];
|
||||
fc->attrs["loop_func"] = func->attrs["lvalue"];
|
||||
}
|
||||
// ir->remove_all_unused();
|
||||
#ifdef __APPLE__
|
||||
ir->remove_all_unused();
|
||||
#endif
|
||||
}
|
||||
|
||||
} // jittor
|
|
@ -13,7 +13,7 @@
|
|||
|
||||
namespace jittor {
|
||||
|
||||
DECLARE_FLAG(int, para_opt_level);
|
||||
DEFINE_FLAG(int, para_opt_level, 3, "para_opt_level");
|
||||
|
||||
void LoopVarAnalyzePass::run() {
|
||||
// loop_vars: opi_xx->shape[j]
|
||||
|
@ -130,6 +130,7 @@ void LoopVarAnalyzePass::run() {
|
|||
}
|
||||
loop_vars.reserve(loop_var->shape.size());
|
||||
string vname = pm->oc->get_name_by_op_var(op, loop_var);
|
||||
ASSERT(vname!="__fill__");
|
||||
for (uint j=0; j<loop_var->shape.size(); j++)
|
||||
loop_vars.emplace_back(vname+"->shape["+S(j)+"]");
|
||||
break;
|
||||
|
|
|
@ -65,9 +65,12 @@ unique_ptr<MemoryChecker>* load_memory_checker(string name) {
|
|||
LOGvv << "Opening jit lib:" << name;
|
||||
#ifdef _WIN32
|
||||
void* handle = (void*)LoadLibrary(name.c_str());
|
||||
#else
|
||||
#elif defined(__linux__)
|
||||
void* handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_DEEPBIND | RTLD_LOCAL);
|
||||
msg = dlerror();
|
||||
#else
|
||||
void* handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_LOCAL);
|
||||
msg = dlerror();
|
||||
#endif
|
||||
|
||||
CHECK(handle) << "Cannot open library" << name << ":" << msg;
|
||||
|
|
|
@ -20,6 +20,7 @@
|
|||
namespace jittor {
|
||||
|
||||
DEFINE_FLAG(int, trace_py_var, 0, "Trace py stack max depth for debug.");
|
||||
DEFINE_FLAG(int, trace_var_data, 0, "Trace py stack max depth for debug.");
|
||||
Op* trace_grad_op = nullptr;
|
||||
|
||||
TraceData trace_data;
|
||||
|
@ -185,6 +186,44 @@ static vector<Stack> get_stack_info() {
|
|||
return stacks;
|
||||
}
|
||||
|
||||
template<class T>
|
||||
string get_str(T* t, int64 num) {
|
||||
string s = "";
|
||||
for (int64 i=0; i<num; i++) {
|
||||
s += S(t[i]);
|
||||
if (i != num-1)
|
||||
s += ',';
|
||||
}
|
||||
return s;
|
||||
}
|
||||
|
||||
static inline string get_var_data_str(Var* v) {
|
||||
if (v->dtype() == ns_int8)
|
||||
return get_str(v->ptr<int8>(), v->num);
|
||||
if (v->dtype() == ns_int16)
|
||||
return get_str(v->ptr<int16>(), v->num);
|
||||
if (v->dtype() == ns_int32)
|
||||
return get_str(v->ptr<int32>(), v->num);
|
||||
if (v->dtype() == ns_int64)
|
||||
return get_str(v->ptr<int64>(), v->num);
|
||||
|
||||
|
||||
if (v->dtype() == ns_uint8)
|
||||
return get_str(v->ptr<uint8>(), v->num);
|
||||
if (v->dtype() == ns_uint16)
|
||||
return get_str(v->ptr<uint16>(), v->num);
|
||||
if (v->dtype() == ns_uint32)
|
||||
return get_str(v->ptr<uint32>(), v->num);
|
||||
if (v->dtype() == ns_uint64)
|
||||
return get_str(v->ptr<uint64>(), v->num);
|
||||
|
||||
if (v->dtype() == ns_float32)
|
||||
return get_str(v->ptr<float32>(), v->num);
|
||||
if (v->dtype() == ns_float64)
|
||||
return get_str(v->ptr<float64>(), v->num);
|
||||
return "";
|
||||
}
|
||||
|
||||
void TraceData::record_node(Node* node, bool record_stack) {
|
||||
if (thread_name.size()) return;
|
||||
NodeData data;
|
||||
|
@ -255,6 +294,8 @@ void TraceData::record_exe_node(Node* node) {
|
|||
data.attrs["dsize"] = S(v->dtype().dsize());
|
||||
data.attrs["name"] = v->name.c_str();
|
||||
data.attrs["is_var"] = "1";
|
||||
if (trace_var_data && v->mem_ptr)
|
||||
data.attrs["data"] = get_var_data_str(v);
|
||||
} else {
|
||||
auto op = node->op();
|
||||
data.attrs["name"] = op->name_ex();
|
||||
|
|
|
@ -86,15 +86,19 @@ ArrayOp::ArrayOp(PyObject* obj) {
|
|||
// use 32-bit by default
|
||||
if ((auto_convert_64_to_32 || holder.obj)
|
||||
&& args.dtype.dsize() == 8 && args.ptr) {
|
||||
auto num = PyArray_Size(arr)/8;
|
||||
auto size = PyArray_Size(arr);
|
||||
args.buffer.reset(new char[size]);
|
||||
auto pre_data = args.ptr;
|
||||
args.ptr = args.buffer.get();
|
||||
auto num = size/8;
|
||||
if (args.dtype.is_int()) {
|
||||
auto* __restrict__ i64 = (int64*)args.ptr;
|
||||
auto* __restrict__ i64 = (int64*)pre_data;
|
||||
auto* __restrict__ i32 = (int32*)args.ptr;
|
||||
for (int i=0; i<num; i++)
|
||||
i32[i] = (int32)i64[i];
|
||||
args.dtype = ns_int32;
|
||||
} else if (args.dtype.is_float()) {
|
||||
auto* __restrict__ f64 = (float64*)args.ptr;
|
||||
auto* __restrict__ f64 = (float64*)pre_data;
|
||||
auto* __restrict__ f32 = (float32*)args.ptr;
|
||||
for (int i=0; i<num; i++)
|
||||
f32[i] = (float32)f64[i];
|
||||
|
@ -136,7 +140,11 @@ ArrayOp::ArrayOp(PyObject* obj) {
|
|||
std::memcpy(host_ptr, args.ptr, size);
|
||||
} else {
|
||||
// this is non-continue numpy array
|
||||
#if defined(__linux__) || defined(_WIN32)
|
||||
int64 dims[args.shape.size()];
|
||||
#elif defined(__APPLE__)
|
||||
long dims[args.shape.size()];
|
||||
#endif
|
||||
for (int i=0; i<args.shape.size(); i++)
|
||||
dims[i] = args.shape[i];
|
||||
holder.assign(PyArray_New(
|
||||
|
|
|
@ -266,7 +266,11 @@ DEF_IS(ArrayArgs, bool) is_type(PyObject* obj) {
|
|||
}
|
||||
|
||||
DEF_IS(ArrayArgs, PyObject*) to_py_object(const T& a) {
|
||||
#if defined(__linux__) || defined(_WIN32)
|
||||
int64 dims[a.shape.size()];
|
||||
#elif defined(__APPLE__)
|
||||
long dims[a.shape.size()];
|
||||
#endif
|
||||
for (int i=0; i<a.shape.size(); i++)
|
||||
dims[i] = a.shape[i];
|
||||
PyObjHolder obj(PyArray_SimpleNew(
|
||||
|
@ -378,7 +382,11 @@ DEF_IS(VarHolder*, T) from_py_object(PyObject* obj, unique_ptr<VarHolder>& holde
|
|||
|
||||
struct DataView;
|
||||
DEF_IS(DataView, PyObject*) to_py_object(T a) {
|
||||
#if defined(__linux__) || defined(_WIN32)
|
||||
int64 dims[a.shape.size()];
|
||||
#elif defined(__APPLE__)
|
||||
long dims[a.shape.size()];
|
||||
#endif
|
||||
for (int i=0; i<a.shape.size(); i++)
|
||||
dims[i] = a.shape[i];
|
||||
PyObjHolder oh(PyArray_New(
|
||||
|
|
|
@ -109,7 +109,11 @@ static void push_py_object(RingBuffer* rb, PyObject* obj, uint64& __restrict__ o
|
|||
rb->push_t<NanoString>(args.dtype, offset);
|
||||
rb->push(size, offset);
|
||||
args.ptr = rb->get_ptr(size, offset);
|
||||
#if defined(__linux__) || defined(_WIN32)
|
||||
int64 dims[args.shape.size()];
|
||||
#elif defined(__APPLE__)
|
||||
long dims[args.shape.size()];
|
||||
#endif
|
||||
for (int i=0; i<args.shape.size(); i++)
|
||||
dims[i] = args.shape[i];
|
||||
PyObjHolder oh(PyArray_New(
|
||||
|
|
|
@ -415,8 +415,19 @@ inline Console() {
|
|||
#endif
|
||||
|
||||
run("import jittor as jt");
|
||||
make_pyjt_array = (PyObject* (*)(const vector<int64>& shape, const string& dtype, const void* data))dlsym(RTLD_DEFAULT, "_ZN6jittor15make_pyjt_arrayERKSt6vectorIlSaIlEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKv");
|
||||
get_pyjt_array = (void (*)(PyObject* obj, vector<int64>& shape, string& dtype, void*& data))dlsym(RTLD_DEFAULT, "_ZN6jittor14get_pyjt_arrayEP7_objectRSt6vectorIlSaIlEERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERPv");
|
||||
#ifdef __APPLE__
|
||||
auto symbol_make_pyjt_array = "__ZN6jittor15make_pyjt_arrayERKNSt3__16vectorIxNS0_9allocatorIxEEEERKNS0_12basic_stringIcNS0_11char_traitsIcEENS2_IcEEEEPKv";
|
||||
auto symbol_gen_pyjt_array = "__ZN6jittor14get_pyjt_arrayEP7_objectRNSt3__16vectorIxNS2_9allocatorIxEEEERNS2_12basic_stringIcNS2_11char_traitsIcEENS4_IcEEEERPv";
|
||||
#else
|
||||
auto symbol_make_pyjt_array = "_ZN6jittor15make_pyjt_arrayERKSt6vectorIlSaIlEERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKv";
|
||||
auto symbol_gen_pyjt_array = "_ZN6jittor14get_pyjt_arrayEP7_objectRSt6vectorIlSaIlEERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERPv";
|
||||
#endif
|
||||
make_pyjt_array = (PyObject* (*)(const vector<int64>& shape, const string& dtype, const void* data))dlsym(RTLD_DEFAULT, symbol_make_pyjt_array);
|
||||
get_pyjt_array = (void (*)(PyObject* obj, vector<int64>& shape, string& dtype, void*& data))dlsym(RTLD_DEFAULT, symbol_gen_pyjt_array);
|
||||
if (!make_pyjt_array || !get_pyjt_array) {
|
||||
std::cerr << "get symbol failed." << std::endl;
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
inline ~Console() {
|
||||
|
|
|
@ -141,7 +141,7 @@ size_t skip_comments(const string& src, size_t i) {
|
|||
return i;
|
||||
}
|
||||
|
||||
void process(string src, vector<string>& input_names) {
|
||||
void process(string src, vector<string>& input_names, string& cmd) {
|
||||
for (size_t i=0; i<src.size(); i++) {
|
||||
i = skip_comments(src, i);
|
||||
if (i>=src.size()) break;
|
||||
|
@ -163,6 +163,20 @@ void process(string src, vector<string>& input_names) {
|
|||
input_names.push_back(inc);
|
||||
}
|
||||
}
|
||||
if (l-k>2 && src[k] == 'J' && src[k+1] == 'T' && j-i==6 && src.substr(i,j-i) == "#ifdef") {
|
||||
auto inc = src.substr(k, l-k);
|
||||
auto env = getenv(inc.c_str());
|
||||
if (env && string(env)!="0") {
|
||||
string dflag = " -D"+inc+"="+string(env)+" -o ";
|
||||
if (cmd.find(dflag) == string::npos) {
|
||||
// -D flags should insert before -o flag
|
||||
auto cmds = split(cmd, " -o ", 2);
|
||||
if (cmds.size() == 2) {
|
||||
cmd = cmds[0] + dflag + cmds[1];
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
i=l;
|
||||
}
|
||||
}
|
||||
|
@ -191,7 +205,7 @@ static inline bool is_full_path(const string& name) {
|
|||
#endif
|
||||
}
|
||||
|
||||
bool cache_compile(const string& cmd, const string& cache_path, const string& jittor_path) {
|
||||
bool cache_compile(string cmd, const string& cache_path, const string& jittor_path) {
|
||||
vector<string> input_names;
|
||||
map<string,vector<string>> extra;
|
||||
string output_name;
|
||||
|
@ -199,29 +213,21 @@ bool cache_compile(const string& cmd, const string& cache_path, const string& ji
|
|||
string output_cache_key;
|
||||
bool ran = false;
|
||||
output_cache_key = read_all(output_name+".key");
|
||||
// string cd_cmd = cache_path.size() ? "cd " + cache_path + " && " + cmd : cmd;
|
||||
string cd_cmd = cmd;
|
||||
if (output_cache_key.size() == 0) {
|
||||
LOGvv << "Cache key of" << output_name << "not found.";
|
||||
LOGvvv << "Run cmd:" << cmd;
|
||||
check_win_file(output_name);
|
||||
system_with_check(cd_cmd.c_str());
|
||||
ran = true;
|
||||
}
|
||||
string cache_key = cmd;
|
||||
cache_key += "\n";
|
||||
string cache_key;
|
||||
unordered_set<string> processed;
|
||||
auto src_path = join(jittor_path, "src");
|
||||
const auto& extra_include = extra["I"];
|
||||
for (size_t i=0; i<input_names.size(); i++) {
|
||||
if (processed.count(input_names[i]) != 0)
|
||||
continue;
|
||||
if (input_names[i] == "dynamic_lookup")
|
||||
continue;
|
||||
processed.insert(input_names[i]);
|
||||
auto src = read_all(input_names[i]);
|
||||
ASSERT(src.size()) << "Source read failed:" << input_names[i];
|
||||
auto hash = S(hash64(src));
|
||||
vector<string> new_names;
|
||||
process(src, new_names);
|
||||
process(src, new_names, cmd);
|
||||
for (auto& name : new_names) {
|
||||
string full_name;
|
||||
if (name.substr(0, 4) == "jit/" || name.substr(0, 4) == "gen/")
|
||||
|
@ -251,11 +257,18 @@ bool cache_compile(const string& cmd, const string& cache_path, const string& ji
|
|||
cache_key += hash;
|
||||
cache_key += "\n";
|
||||
}
|
||||
cache_key = cmd + "\n" + cache_key;
|
||||
if (output_cache_key.size() == 0) {
|
||||
LOGvv << "Cache key of" << output_name << "not found.";
|
||||
LOGvvv << "Run cmd:" << cmd;
|
||||
system_with_check(cmd.c_str());
|
||||
ran = true;
|
||||
}
|
||||
if (output_cache_key.size() != 0 && output_cache_key != cache_key) {
|
||||
LOGvv << "Cache key of" << output_name << "changed.";
|
||||
LOGvvv << "Run cmd:" << cmd;
|
||||
check_win_file(output_name);
|
||||
system_with_check(cd_cmd.c_str());
|
||||
system_with_check(cmd.c_str());
|
||||
ran = true;
|
||||
}
|
||||
if (output_cache_key != cache_key) {
|
||||
|
@ -326,7 +339,8 @@ void test_find_nams_error(string cmd) {
|
|||
|
||||
void test_process(string src, vector<string> files) {
|
||||
vector<string> ifiles;
|
||||
jittor::jit_compiler::process(src, ifiles);
|
||||
string cmd;
|
||||
jittor::jit_compiler::process(src, ifiles, cmd);
|
||||
CHECK(files.size() == ifiles.size());
|
||||
for (size_t i=0; i<files.size(); i++)
|
||||
CHECKop(files[i],==,ifiles[i]);
|
||||
|
|
|
@ -14,7 +14,7 @@ string read_all(const string& fname);
|
|||
void write(const string& fname, const string& src);
|
||||
bool file_exist(const string& fname);
|
||||
string join(string a, string b);
|
||||
bool cache_compile(const string& cmd, const string& cache_path="", const string& jittor_path="");
|
||||
bool cache_compile(string cmd, const string& cache_path="", const string& jittor_path="");
|
||||
|
||||
} // jit_compiler
|
||||
} // jittor
|
|
@ -8,11 +8,11 @@
|
|||
#include "pyjt/py_converter.h"
|
||||
#include "pyjt/py_arg_printer.h"
|
||||
#ifdef __clang__
|
||||
#pragma clang diagnostic ignored "-Wdefaulted-function-deleted"
|
||||
// #pragma clang diagnostic ignored "-Wdefaulted-function-deleted"
|
||||
#endif
|
||||
#ifdef __GNUC__
|
||||
#endif
|
||||
#ifndef _WIN32
|
||||
#ifdef __linux__
|
||||
#include <sys/prctl.h>
|
||||
#endif
|
||||
#include <signal.h>
|
||||
|
@ -23,7 +23,7 @@
|
|||
namespace jittor {
|
||||
|
||||
void init_subprocess() {
|
||||
#ifndef _WIN32
|
||||
#ifdef __linux__
|
||||
prctl(PR_SET_PDEATHSIG, SIGKILL);
|
||||
#endif
|
||||
}
|
||||
|
|
|
@ -206,6 +206,8 @@ int segfault_happen = 0;
|
|||
string thread_local thread_name;
|
||||
static int _pid = getpid();
|
||||
vector<void(*)()> cleanup_callback;
|
||||
vector<void(*)()> sigquit_callback;
|
||||
int64 last_q_time;
|
||||
|
||||
#ifdef _WIN32
|
||||
void handle_signal(int signal) {
|
||||
|
@ -214,13 +216,49 @@ void handle_signal(int signal) {
|
|||
abort();
|
||||
}
|
||||
#else
|
||||
static inline void do_exit() {
|
||||
#ifdef __APPLE__
|
||||
_Exit(1);
|
||||
#else
|
||||
std::quick_exit(1);
|
||||
#endif
|
||||
}
|
||||
|
||||
void segfault_sigaction(int signal, siginfo_t *si, void *arg) {
|
||||
if (signal == SIGQUIT) {
|
||||
if (_pid == getpid()) {
|
||||
std::cerr << "Caught SIGQUIT" << std::endl;
|
||||
int64 now = clock();
|
||||
if (now > last_q_time && last_q_time+CLOCKS_PER_SEC/10 > now) {
|
||||
last_q_time = now;
|
||||
std::cerr << "GDB attach..." << std::endl;
|
||||
breakpoint();
|
||||
} else {
|
||||
last_q_time = now;
|
||||
for (auto f : sigquit_callback)
|
||||
f();
|
||||
}
|
||||
}
|
||||
return;
|
||||
}
|
||||
if (signal == SIGCHLD) {
|
||||
if (si->si_code != CLD_EXITED && si->si_status != SIGTERM) {
|
||||
LOGe << "Caught SIGCHLD"
|
||||
<< "si_errno:" << si->si_errno
|
||||
<< "si_code:" << si->si_code
|
||||
<< "si_status:" << si->si_status
|
||||
<< ", quick exit";
|
||||
exited = true;
|
||||
do_exit();
|
||||
}
|
||||
return;
|
||||
}
|
||||
if (signal == SIGINT) {
|
||||
if (_pid == getpid()) {
|
||||
LOGe << "Caught SIGINT, quick exit";
|
||||
}
|
||||
exited = true;
|
||||
std::quick_exit(1);
|
||||
do_exit();
|
||||
}
|
||||
std::cerr << "Caught segfault at address " << si->si_addr << ", "
|
||||
<< "thread_name: '" << thread_name << "', flush log..." << std::endl;
|
||||
|
@ -266,6 +304,7 @@ int register_sigaction() {
|
|||
sigaction(SIGSTOP, &sa, NULL);
|
||||
sigaction(SIGFPE, &sa, NULL);
|
||||
sigaction(SIGINT, &sa, NULL);
|
||||
sigaction(SIGCHLD, &sa, NULL);
|
||||
sigaction(SIGILL, &sa, NULL);
|
||||
sigaction(SIGBUS, &sa, NULL);
|
||||
sigaction(SIGQUIT, &sa, NULL);
|
||||
|
@ -364,6 +403,34 @@ but you can hot fix it by this command:
|
|||
)";
|
||||
}
|
||||
|
||||
static inline void check_cuda_gcc_version(const string& output) {
|
||||
/* if such error occur:
|
||||
error: identifier "__is_assignable" is undefined
|
||||
this means your gcc version is not match with nvcc,
|
||||
for example, nvcc 10 support gcc<=7, nvcc 11 support gcc<=9,
|
||||
|
||||
https://gist.github.com/ax3l/9489132
|
||||
*/
|
||||
string pat = "__is_assignable";
|
||||
auto id = output.find(pat);
|
||||
if (id == string::npos) return;
|
||||
LOGf << output << R"(
|
||||
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
|
||||
Dear user, your nvcc and gcc version are still not match
|
||||
after dirty hack, your should install the correct version of g++
|
||||
or nvcc, for example, nvcc 10 support g++<=7, nvcc 11 support g++<=9,
|
||||
here is the NVCC Compatibility Matrix:
|
||||
https://gist.github.com/ax3l/9489132
|
||||
Please install correct version of gcc, for example:
|
||||
>>> sudo apt install g++-7
|
||||
After your g++ is installed, using enviroment variable `cc_path` to
|
||||
tell jittor use the correct version of g++, for example:
|
||||
>>> cc_path='g++-7' python3.7 -m jittor.test.test_core
|
||||
If you still have problems, please contact us:
|
||||
https://github.com/Jittor/jittor/issues
|
||||
)";
|
||||
}
|
||||
|
||||
#ifdef _WIN32
|
||||
int system_popen(const char *cmd) {
|
||||
HANDLE g_hChildStd_OUT_Rd = NULL;
|
||||
|
@ -444,6 +511,7 @@ int system_popen(const char *cmd) {
|
|||
|
||||
if (ec) {
|
||||
check_cuda_unsupport_version(output);
|
||||
check_cuda_gcc_version(output);
|
||||
}
|
||||
return ec;
|
||||
}
|
||||
|
@ -468,6 +536,7 @@ int system_popen(const char* cmd) {
|
|||
}
|
||||
if (ret) {
|
||||
check_cuda_unsupport_version(output);
|
||||
check_cuda_gcc_version(output);
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
|
|
@ -34,9 +34,9 @@ constexpr int32_t basename_index(const char * const path, const int32_t index =
|
|||
|
||||
#define PREDICT_BRANCH_NOT_TAKEN(x) (__builtin_expect(x, 0))
|
||||
|
||||
extern "C" uint32_t get_tid();
|
||||
extern "C" bool g_supports_color;
|
||||
extern "C" void print_prefix(std::ostream* out);
|
||||
extern uint32_t get_tid();
|
||||
extern bool g_supports_color;
|
||||
extern void print_prefix(std::ostream* out);
|
||||
|
||||
#ifdef _WIN32
|
||||
constexpr char green[] = "\x1b[1;32m";
|
||||
|
@ -83,10 +83,10 @@ static void get_color(char level, int verbose, const char*& color_begin, const c
|
|||
|
||||
#endif
|
||||
|
||||
extern "C" void send_log(std::ostringstream&& out);
|
||||
extern "C" void flush_log();
|
||||
extern "C" void log_capture_start();
|
||||
extern "C" void log_capture_stop();
|
||||
extern void send_log(std::ostringstream&& out);
|
||||
extern void flush_log();
|
||||
extern void log_capture_start();
|
||||
extern void log_capture_stop();
|
||||
extern std::vector<std::map<string,string>> log_capture_read();
|
||||
extern string thread_local thread_name;
|
||||
|
||||
|
@ -170,9 +170,10 @@ template<class T> T get_from_env(const char* name,const T& _default) {
|
|||
template<> std::string get_from_env(const char* name, const std::string& _default);
|
||||
|
||||
#define DECLARE_FLAG(type, name) \
|
||||
extern "C" type name; \
|
||||
extern "C" std::string doc_ ## name; \
|
||||
extern "C" void set_ ## name (const type&);
|
||||
extern type name; \
|
||||
extern std::string doc_ ## name; \
|
||||
extern void set_ ## name (const type&);
|
||||
|
||||
|
||||
#ifdef JIT
|
||||
|
||||
|
|
|
@ -25,15 +25,15 @@ bool endswith(const string& a, const string& b) {
|
|||
|
||||
vector<string> split(const string& s, const string& sep, int max_split) {
|
||||
vector<string> ret;
|
||||
int pos = -1, pos_next;
|
||||
int pos = 0, pos_next;
|
||||
while (1) {
|
||||
pos_next = s.find(sep, pos+1);
|
||||
pos_next = s.find(sep, pos);
|
||||
if (pos_next == (int)string::npos || (int)ret.size() == max_split-1) {
|
||||
ret.push_back(s.substr(pos+sep.size()));
|
||||
ret.push_back(s.substr(pos));
|
||||
return ret;
|
||||
}
|
||||
ret.push_back(s.substr(pos+sep.size(), pos_next-pos-sep.size()));
|
||||
pos = pos_next;
|
||||
ret.push_back(s.substr(pos, pos_next-pos));
|
||||
pos = pos_next + sep.size();
|
||||
}
|
||||
ASSERT(max_split==0);
|
||||
return ret;
|
||||
|
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue