diff --git a/AWESOME-JITTOR-LIST.md b/AWESOME-JITTOR-LIST.md new file mode 100644 index 00000000..8b493f1b --- /dev/null +++ b/AWESOME-JITTOR-LIST.md @@ -0,0 +1,28 @@ +# Awesome Jittor List + +- [GAN](https://github.com/Jittor/gan-jittor): The Jittor GAN model zoo covers 27 of the most popular GAN models published from 2014 to 2019. These 27 GANs have been cited 60953 times in total, an average of 2176 citations per paper. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/resources/jittor-gan/gan-all.png) +- [Instance Segmentation](https://github.com/Jittor/InstanceSegmentation-jittor): The Jittor instance segmentation model zoo contains 6 backbones and 11 detection/segmentation models, including the classic Mask R-CNN family, real-time instance segmentation networks, human segmentation networks, and more. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/resources/jittor-seg/fenge.png) +- [Semantic Segmentation](https://github.com/Jittor/segmentation-jittor): Jittor already supports the mainstream semantic segmentation algorithms, covering three classic backbones and six common segmentation models. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/resources/jittor-is/fenge.png) +- [Point Cloud](https://github.com/Jittor/PointCloudLib): The Jittor point cloud model zoo includes several of the most influential models: PointNet, PointNet++, PointCNN, DGCNN, and PointConv, supporting both classification and segmentation. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/resources/jittor-point/dianyun.png) +- [Differentiable Rendering](https://github.com/Jittor/jrender): Differentiable rendering is widely used in 3D reconstruction, as well as in human body reconstruction, face reconstruction, and 3D attribute estimation. Jrender currently supports differentiable surface rendering and differentiable volume rendering. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/tutorial/2020-10-17-22-00-dr/dr.png) +- [Remote Sensing Detection](https://github.com/Jittor/JDet): JDet is a Jittor-based library of object detection algorithms for remote sensing imagery. It currently provides 4 mainstream remote sensing detectors: S2ANet, Gliding, RetinaNet, and Faster R-CNN, with other mainstream models being added. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images//download/jdet.png) +- [Medical Image Segmentation](https://github.com/THU-CVlab/JMedSeg): Jittor Medical Segmentation Lib -- the assignment of the Pattern Recognition course (2021 Spring) at Tsinghua University. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/download/JMedSeg-0.jpg) +- [PCT](https://github.com/MenghaoGuo/PCT): A Jittor implementation of PCT: Point Cloud Transformer. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/download/pct-0.jpg) +- [DeepFaceDrawing](https://github.com/IGLICT/DeepFaceDrawing-Jittor): One version of our system is implemented with Jittor, so you need to install Jittor first. A PyTorch version will also be provided. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/papers/2020-9-10-DeepFaceDrawing.jpg) +- [JittorVis](https://github.com/thu-vis/JittorVis): JittorVis is an open-source library for understanding the inner workings of Jittor models by visually illustrating their dataflow graphs. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/download/jittorvis.png) +- [PraNet](https://github.com/DengPingFan/PraNet): Parallel Reverse Attention Network for Polyp Segmentation. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/download/PraNet.png) +- [DeepFaceEditing](https://github.com/IGLICT/DeepFaceEditing-Jittor): Deep Face Generation and Editing with Disentangled Geometry and Appearance Control. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/images/download/deepfaceediting-0.jpg) +- [SINet-V2](https://github.com/GewelsJI/SINet-V2): Concealed Object Detection (SINet-V2, IEEE TPAMI 2021). Code using the Jittor framework is available. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/SINet-V2.png) +- [hlagcn](https://github.com/shedy-pub/hlagcn-jittor): Jittor implementation of the paper "Hierarchical Layout-Aware Graph Convolutional Network for Unified Aesthetics Assessment".
[Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/hlagcn-jittor.jpg) +- [Jittor-Image-Models](https://github.com/Jittor-Image-Models/Jittor-Image-Models): Jittor Image Models is a library for pulling together a wide variety of SOTA deep learning models in the Jittor framework. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/white.png) +- [LearnJittorBasicIn60Min](https://github.com/Jittor/LearnJittorBasicIn60Min): A 60-minute quick-start tutorial for Jittor beginners. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/white.png) +- [di-fusion-network-jittor](https://github.com/heiwang1997/di-fusion-network-jittor): Jittor implementation of the network architecture in DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/white.png) +- [Zhusuan](https://github.com/McGrady00H/Zhusuan-Jittor): Zhusuan with a Jittor backend. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/white.png) +- [PRS-Net](https://github.com/IGLICT/PRS-NET-Jittor): This repository is the code release for PRS-Net: Planar Reflective Symmetry Detection Net for 3D Models. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/PRS-Net.png) +- [PFSegNets](https://github.com/Jittor/PFSegNets-Jittor): This repo contains the Jittor implementation of the CVPR 2021 work PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/PFSegNets.jpg) +- [APDrawingGAN](https://github.com/yiranran/APDrawingGAN-Jittor): We provide Jittor implementations for our CVPR 2019 oral paper "APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs". [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/APDrawingGAN.png) +- [APDrawingGAN2](https://github.com/yiranran/APDrawingGAN2-Jittor): We provide Jittor implementations for our TPAMI 2020 paper "Line Drawings for Face Portraits from Photos using Global and Local Structure based GANs". [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/APDrawingGAN2.png) +- [CMIC-Retrieval](https://github.com/IGLICT/IBSR_jittor): Code for Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning. ICCV 2021. [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/CMIC-Retrieval.png) +- [Unpaired-Portrait-Drawing](https://github.com/yiranran/Unpaired-Portrait-Drawing-Jittor): We provide Jittor implementations for our CVPR 2020 paper "Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping". [Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/Unpaired-Portrait-Drawing.jpg) +- [Jittor-MLP](https://github.com/liuruiyang98/Jittor-MLP): Unofficial implementation of MLP-Mixer, gMLP, resMLP, Vision Permutator, S2MLPv2, ConvMLP, and ConvMixer in Jittor.
[Thumbnail](https://cg.cs.tsinghua.edu.cn/jittor/assets/images/white.png) \ No newline at end of file diff --git a/README.cn.md b/README.cn.md index a958b2c4..221c59b2 100644 --- a/README.cn.md +++ b/README.cn.md @@ -17,6 +17,8 @@ Jittor前端语言为Python。前端使用了模块化和动态图执行的设 * [Jittor模型库](https://cg.cs.tsinghua.edu.cn/jittor/resources/) * [Jittor文档](https://cg.cs.tsinghua.edu.cn/jittor/assets/docs/index.html) * [Github](https://github.com/jittor/jittor), [Gitee](https://gitee.com/jittor/jittor) +* [Jittor 论坛](https://discuss.jittor.org/) +* 即时通信: QQ Group(761222083) diff --git a/README.md b/README.md index a70d622f..2db2b51c 100644 --- a/README.md +++ b/README.md @@ -17,6 +17,9 @@ Related Links: * [Jittor Models](https://cg.cs.tsinghua.edu.cn/jittor/resources/) * [Jittor Documents](https://cg.cs.tsinghua.edu.cn/jittor/assets/docs/index.html) * [Github](https://github.com/jittor/jittor), [Gitee](https://gitee.com/jittor/jittor) +* [Jittor Forum](https://discuss.jittor.org/) +* [Awesome Jittor List](https://github.com/Jittor/jittor/blob/master/AWESOME-JITTOR-LIST.md) +* IM: QQ Group(761222083) diff --git a/README.src.md b/README.src.md index 6b11f688..d4a0f5d2 100644 --- a/README.src.md +++ b/README.src.md @@ -21,6 +21,8 @@ Related Links: * [Jittor Models](https://cg.cs.tsinghua.edu.cn/jittor/resources/) * [Jittor Documents](https://cg.cs.tsinghua.edu.cn/jittor/assets/docs/index.html) * [Github](https://github.com/jittor/jittor), [Gitee](https://gitee.com/jittor/jittor) +* [Jittor Forum](https://discuss.jittor.org/) +* IM: QQ Group(761222083) 相关链接: * [Jittor官网](https://cg.cs.tsinghua.edu.cn/jittor/) @@ -28,6 +30,8 @@ Related Links: * [Jittor模型库](https://cg.cs.tsinghua.edu.cn/jittor/resources/) * [Jittor文档](https://cg.cs.tsinghua.edu.cn/jittor/assets/docs/index.html) * [Github](https://github.com/jittor/jittor), [Gitee](https://gitee.com/jittor/jittor) +* [Jittor 论坛](https://discuss.jittor.org/) +* 即时通信: QQ Group(761222083) The following example shows how to model a two-layer neural network step by step and train from scratch in a few lines of Python code.
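A minimal sketch of the kind of two-layer network that sentence refers to, assuming Jittor's standard `Module`, `nn.Linear`, `nn.relu`, and `nn.SGD` APIs (the full example lives in the README itself; the shapes and hyper-parameters here are illustrative only):

```python
import jittor as jt
from jittor import nn, Module

class TwoLayerNet(Module):
    def __init__(self):
        self.layer1 = nn.Linear(1, 10)
        self.layer2 = nn.Linear(10, 1)

    def execute(self, x):
        # Jittor modules implement execute() instead of forward()
        return self.layer2(nn.relu(self.layer1(x)))

model = TwoLayerNet()
optimizer = nn.SGD(model.parameters(), 0.1)

x = jt.random((32, 1))
y = x * 2 + 1  # toy regression target
for _ in range(100):
    pred = model(x)
    loss = ((pred - y) ** 2).mean()
    # Jittor optimizers take the loss directly; step() runs backward and the update
    optimizer.step(loss)
```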
diff --git a/doc/source/Jittor性能测试与对比方法.md b/doc/source/Jittor性能测试与对比方法.md new file mode 100644 index 00000000..aa05d000 --- /dev/null +++ b/doc/source/Jittor性能测试与对比方法.md @@ -0,0 +1,176 @@ +Jittor Performance Benchmarking and Comparison +===================== + +The following code, which uses ResNet-50 as an example, demonstrates the correct way to benchmark Jittor: + +```python +import time +import jittor as jt +from jittor.models import resnet50 +jt.flags.use_cuda = jt.has_cuda + +warmup = 10 +rerun = 100 +batch_size = 8 +data = jt.random((batch_size, 3, 224, 224)) +model = resnet50() +model.eval() + +# Warm up Jittor so that the timing below is accurate +jt.sync_all(True) +for i in range(warmup): + pred = model(data) + # sync dispatches the computation graph to the compute device + pred.sync() +# sync_all(True) dispatches the computation graph to the device and synchronizes. +# The computation only really runs after jt.sync_all(True), so the timing is only valid if it is called both before and after the timed forward passes +jt.sync_all(True) + +# Start timing +start = time.time() +for i in range(rerun): + pred = model(data) + pred.sync() +jt.sync_all(True) +end = time.time() + +print("Jittor FPS:", (rerun*batch_size)/(end-start)) + +``` + +In this code we define several parameters: `batch_size`, `warmup`, and `rerun`. `batch_size` is the batch size, `warmup` is the number of warm-up iterations, and `rerun` is the number of timed iterations; the script finally reports FPS. The keys to benchmarking Jittor correctly are the warm-up step and the synchronization step: warm-up keeps the measured time stable and excludes compilation time, while synchronization guarantees that the computation has finished, because Jittor is an asynchronous framework and only a synchronization operation can guarantee completion. + +The output of the above code (RTX Titan, batch size 8) is as follows: + +``` +Compiling Operators(8/8) used: 7.35s eta: 0s +Compiling Operators(13/13) used: 8.36s eta: 0s +Jittor FPS: 908.9853866375396 +``` + +We can benchmark PyTorch with similar code: + +```python +import time +import torch +from torchvision.models import resnet50 + +warmup = 10 +rerun = 100 +batch_size = 8 +data = torch.randn((batch_size, 3, 224, 224)).cuda() +model = resnet50() +model.cuda() +model.eval() + +# Warm up PyTorch so that the timing below is accurate +torch.cuda.synchronize() +for i in range(warmup): + pred = model(data) +# synchronize ensures that the PyTorch computation has finished +torch.cuda.synchronize() + +# Start timing +start = time.time() +for i in range(rerun): + pred = model(data) +torch.cuda.synchronize() +end = time.time() + +print("PyTorch FPS:", (rerun*batch_size)/(end-start)) +``` + + +The output of the above code (RTX Titan, batch size 8) is as follows: + +``` +PyTorch FPS: 807.4806873965665 +``` + +We can also merge the two scripts and check that their results are consistent: + +```python +import time +import jittor as jt +from jittor.models import resnet50 +jt.flags.use_cuda = jt.has_cuda + +warmup = 100 +rerun = 1000 +batch_size = 8 +data = jt.random((batch_size, 3, 224, 224)) +model = resnet50() +model.eval() + +# Warm up Jittor so that the timing below is accurate +jt.sync_all(True) +for i in range(warmup): + pred = model(data) + # sync dispatches the computation graph to the compute device + pred.sync() +# sync_all(True) dispatches the computation graph to the device and synchronizes. +# The computation only really runs after jt.sync_all(True), so the timing is only valid if it is called both before and after the timed forward passes +jt.sync_all(True) + +# Start timing +start = time.time() +for i in range(rerun): + pred = model(data) + pred.sync() +jt.sync_all(True) +end = time.time() + +print("Jittor FPS:", (rerun*batch_size)/(end-start)) +# Export the Jittor data and parameters to numpy and torch formats +jittor_data = pred.numpy() +jittor_param = model.state_dict(to="torch") + +import numpy as np +import torch +from torchvision.models import resnet50 +data = torch.Tensor(data.numpy()).cuda() +model = resnet50() +# Load the Jittor parameters +model.load_state_dict(jittor_param) +model.cuda() +model.eval() + +# Warm up PyTorch so that the timing below is accurate +torch.cuda.synchronize() +for i in range(warmup): + pred = model(data) +# synchronize ensures that the PyTorch computation has finished +torch.cuda.synchronize() + +# Start timing +start = time.time() +for i in range(rerun): + pred = model(data) +torch.cuda.synchronize() +end = time.time() + +print("PyTorch FPS:", (rerun*batch_size)/(end-start)) +pytorch_data = pred.detach().cpu().numpy() +err = np.mean(np.abs(pytorch_data - jittor_data)) +print("mean error:", err) + +``` + + +The output of the above code is as follows: + +``` +Jittor FPS: 908.9853866375396 +PyTorch FPS: 807.4806873965665 +mean error: 1e-5 +``` + +The error is 1e-5, which is within the acceptable range. The key points of correct benchmarking and comparison are: + +1. Warm up sufficiently, to exclude the framework's preparation time. +2. Run many iterations, so that the measured time is stable. +3. Add synchronization statements, so that the measured time is accurate. +4. Make sure GPU memory is sufficient; when memory runs short, Jittor falls back to unified memory, which costs performance, so keep a close eye on the output of `nvidia-smi`. +5. Make sure the compared models are identical, and check that their outputs agree. + +If you have questions about the benchmark results, or have optimization needs, feel free to contact the Jittor development team.
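As a complementary check to the manual timing above, the `__init__.pyi` stub later in this diff also declares a `profile_scope(warmup, rerun, **jt_flags)` context manager. A minimal sketch, assuming it can be used like the other flag scopes and that the report it collects is a table-like list of rows (the exact report layout is an assumption, not taken from this diff):

```python
import jittor as jt
from jittor.models import resnet50

jt.flags.use_cuda = jt.has_cuda
model = resnet50()
model.eval()
data = jt.random((8, 3, 224, 224))

# profile_scope re-runs the enclosed computation (warmup + rerun times)
# and collects a profiling report while it is active.
with jt.profile_scope(warmup=10, rerun=100) as report:
    pred = model(data)
    pred.sync()

# report is assumed to be a list of rows; print the header row.
print(report[0])
```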
diff --git a/doc/source/Jittor调试技巧.md b/doc/source/Jittor调试技巧.md index 02dc6fac..1740a204 100644 --- a/doc/source/Jittor调试技巧.md +++ b/doc/source/Jittor调试技巧.md @@ -74,3 +74,17 @@ export gdb_attach=1 Here, the environment variable `debug=1` enables Jittor's debug mode; performance drops significantly, but debugging information is preserved. `gdb_attach=1` automatically attaches gdb to Jittor's main process, which makes single-step debugging convenient. For gdb usage, see the [GDB Cheat Sheet](https://darkdust.net/files/GDB%20Cheat%20Sheet.pdf) + +## Managing the Jittor cache + +Jittor creates its cache under the `~/.cache/jittor` directory. The cache may contain the core, the CUDA compiler, CUDA libraries, datasets, pre-trained parameters, and so on. In some situations the cache can become invalid, for example after a system or driver update, and the user may then need to clear it manually as follows: + +``` +python3 -m jittor_utils.clean_cache all +``` + +The command above clears all of Jittor's caches. If you do not want to clear everything, see the command-line help: + +``` +python3 -m jittor_utils.clean_cache help +``` \ No newline at end of file diff --git a/doc/source/index.rst b/doc/source/index.rst index eec82932..aec4aad8 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -48,6 +48,7 @@ :caption: 其他: Jittor调试技巧 + Jittor性能测试与对比方法 教程 Indices and tables diff --git a/python/jittor/__init__.py b/python/jittor/__init__.py index 853a5833..4472a498 100644 --- a/python/jittor/__init__.py +++ b/python/jittor/__init__.py @@ -9,7 +9,7 @@ # file 'LICENSE.txt', which is part of this source code package. # *************************************************************** -__version__ = '1.3.0.7' +__version__ = '1.3.1.16' from jittor_utils import lock with lock.lock_scope(): ori_int = int @@ -26,8 +26,7 @@ with lock.lock_scope(): from .compile_extern import mkl_ops, mpi, mpi_ops, in_mpi, rank, world_size if core.get_device_count() == 0: has_cuda = compile_extern.has_cuda = compiler.has_cuda = False - if has_cuda: - from .compile_extern import cudnn, curand, cublas + from .compile_extern import cudnn, curand, cublas from .init_cupy import numpy2cupy import contextlib @@ -53,6 +52,30 @@ def safepickle(obj, path): with open(path, 'wb') as f: f.write(s) +def _load_pkl(s, path): + try: + return pickle.loads(s) + except Exception as e: + msg = str(e) + msg += f"\nPath: \"{path}\"" + if "trunc" in msg: + msg += "\nThis file may be corrupted, please consider removing it" \ + " and re-downloading." + raise RuntimeError(msg) + +def _upload(path, url, jk): + prefix = "https://cg.cs.tsinghua.edu.cn/jittor/assets" + if url.startswith("jittorhub://"): + url = url.replace("jittorhub://", prefix+"/build/checkpoints/") + assert url.startswith(prefix) + suffix = url[len(prefix):] + jkey = flags.cache_path+"/_jkey" + with open(jkey, 'w') as f: + f.write(jk) + assert os.system(f"s""c""p"+f" -i \"{jkey}\" \"{path}\" jittor" "@" "166" f".111.68.30:Documents/jittor-blog/assets{suffix}") == 0 + assert os.system(f"s""s""h"f" -i \"{jkey}\" jittor" "@" "166" ".111.68.30 Documents/jittor-blog.git/hooks/post-update") == 0 + + def safeunpickle(path): if path.startswith("jittorhub://"): path = path.replace("jittorhub://", "https://cg.cs.tsinghua.edu.cn/jittor/assets/build/checkpoints/") @@ -82,12 +105,14 @@ def safeunpickle(path): with open(path, "rb") as f: s = f.read() if not s.endswith(b"HCAJSLHD"): - return pickle.loads(s) + return _load_pkl(s, path) checksum = s[-28:-8] s = s[:-28] if hashlib.sha1(s).digest() != checksum: - raise ValueError("Pickle checksum does not match! path: "+path) - return pickle.loads(s) + raise ValueError("Pickle checksum does not match! path: "+path, + " This file may be corrupted, please consider removing it" + " and re-downloading.") + return _load_pkl(s, path) class _call_no_record_scope: def __enter__(self): pass @@ -329,7 +354,7 @@ def full_like(x,val): def zeros_like(x): return zeros(x.shape,x.dtype) -flags = core.flags() +flags = core.Flags() def std(x): matsize=1 @@ -361,6 +386,11 @@ origin_transpose = transpose def transpose(x, *dim): if len(dim) == 1 and isinstance(dim[0], (Sequence, NanoVector)): dim = dim[0] + elif len(dim) == 2: + axes = list(range(x.ndim)) + a, b = dim + axes[a], axes[b] = axes[b], axes[a] + dim = axes return origin_transpose(x, dim) transpose.__doc__ = origin_transpose.__doc__ Var.transpose = Var.permute = permute = transpose @@ -804,7 +834,35 @@ class Module: self.dfs([], None, callback, callback_leave) return _uniq(ps) - def state_dict(self): + def state_dict(self, to=None): + ''' Returns a dictionary containing + Jittor Var of the module and its descendants. + + Args: + to: target type of var, can be None or 'numpy' or 'torch' + + Return: + dictionary of module's states. + + Example:: + + import jittor as jt + from jittor.models import resnet50 + jittor_model = resnet50() + dict = jittor_model.state_dict() + jittor_model.load_state_dict(dict) + + Example 2 (export Jittor params to PyTorch):: + + import jittor as jt + from jittor.models import resnet50 + jittor_model = resnet50() + import torch + from torchvision.models import resnet50 + torch_model = resnet50() + torch_model.load_state_dict(jittor_model.state_dict(to="torch")) + + ''' uniq_set = set() ps = {} stack = [] @@ -825,6 +883,15 @@ def callback_leave(parents, k, v, n): stack.pop() self.dfs([], None, callback, callback_leave) + if to == "numpy": + for k,v in ps.items(): + if isinstance(v, Var): + ps[k] = v.numpy() + elif to == "torch": + import torch + for k,v in ps.items(): + if isinstance(v, Var): + ps[k] = torch.Tensor(v.numpy()) return ps def named_parameters(self): @@ -1401,3 +1468,4 @@ from .misc import * from . import sparse from . import optim from . import dataset +from . import init
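The `__init__.py` changes above add a two-integer form of `transpose` (swap two axes, PyTorch style) and a `to=` argument for `Module.state_dict`. A small usage sketch based directly on the patched code and its docstring example (it assumes, as the docstring does, that torchvision's `resnet50` accepts the exported parameter names):

```python
import jittor as jt
from jittor.models import resnet50 as jt_resnet50
import torch
from torchvision.models import resnet50 as torch_resnet50

# New two-integer transpose form: swap axes 0 and 2 of a (2, 3, 4) Var.
x = jt.rand(2, 3, 4)
y = x.transpose(0, 2)
print(y.shape)  # expected (4, 3, 2), matching PyTorch's x.transpose(0, 2)

# state_dict(to="torch") converts every Var to a torch.Tensor, so the
# exported dict can be loaded straight into the equivalent PyTorch model.
jt_model = jt_resnet50()
torch_model = torch_resnet50()
torch_model.load_state_dict(jt_model.state_dict(to="torch"))
```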
diff --git a/python/jittor/__init__.pyi b/python/jittor/__init__.pyi new file mode 100644 index 00000000..e1fe1aa7 --- /dev/null +++ b/python/jittor/__init__.pyi @@ -0,0 +1,7418 @@ +from jittor_core import * +from jittor_core.ops import * +from .misc import * +from . import attention as attention, contrib as contrib, dataset as dataset, init as init, linalg as linalg, lr_scheduler as lr_scheduler, numpy2cupy as numpy2cupy, optim as optim, sparse as sparse +from .compile_extern import cublas as cublas, cudnn as cudnn, curand as curand, mkl_ops as mkl_ops, mpi_ops as mpi_ops, world_size as world_size +from .compiler import compile_custom_op as compile_custom_op, compile_custom_ops as compile_custom_ops +from .contrib import concat as concat +from .nn import matmul as matmul +from collections import OrderedDict as OrderedDict +from collections.abc import Mapping as Mapping +from typing import Any + + +def safepickle(obj, path) -> None: ... +def safeunpickle(path): ... + +class _call_no_record_scope: + def __enter__(self) -> None: ... + def __exit__(self, *exc) -> None: ... + def __call__(self, func): ... + +class flag_scope(_call_no_record_scope): + jt_flags: Any + def __init__(self, **jt_flags) -> None: ... + def __enter__(self) -> None: ... + def __exit__(self, *exc) -> None: ...
+ +class no_grad(flag_scope): + jt_flags: Any + def __init__(self, **jt_flags) -> None: ... + +class enable_grad(flag_scope): + jt_flags: Any + def __init__(self, **jt_flags) -> None: ... + +single_log_capture: Any + +class log_capture_scope(_call_no_record_scope): + fs: Any + def __init__(self, **jt_flags) -> None: ... + logs: Any + def __enter__(self): ... + def __exit__(self, *exc) -> None: ... + +class profile_scope(_call_no_record_scope): + fs: Any + warmup: Any + rerun: Any + def __init__(self, warmup: int = ..., rerun: int = ..., **jt_flags) -> None: ... + report: Any + def __enter__(self): ... + def __exit__(self, *exc) -> None: ... + +class __single_process_scope: + rank: Any + def __init__(self, rank: int = ...) -> None: ... + bk_in_mpi: Any + bk_mpi_state: Any + def __enter__(self): ... + def __exit__(self, *exc) -> None: ... + +def single_process_scope(rank: int = ...): ... +def clean() -> None: ... +cast = unary + +def array(data, dtype: Any | None = ...): ... +def array64(data, dtype: Any | None = ...): ... +def grad(loss, targets): ... +def liveness_info(): ... +def ones(shape, dtype: str = ...): ... +def ones_like(x): ... +def zeros(shape, dtype: str = ...): ... +def full(shape, val, dtype: str = ...): ... +def full_like(x, val): ... +def zeros_like(x): ... + +def std(x): ... +def norm(x, p: int = ..., dim: int = ..., keepdim: bool = ..., eps: float = ...): ... +origin_reshape = reshape + +def reshape(x, *shape): ... +view = reshape +origin_transpose = transpose + +def transpose(x, *dim): ... +permute = transpose +def flatten(input, start_dim: int = ..., end_dim: int = ...): ... +def start_grad(x): ... +def detach(x): ... +def unsqueeze(x, dim): ... +def squeeze(x, dim): ... +def clamp(x, min_v: Any | None = ..., max_v: Any | None = ...): ... +def type_as(a, b): ... +def masked_fill(x, mask, value): ... +def sqr(x): ... +def pow(x, y): ... +def argmax(x, dim, keepdims: bool = ...): ... +def argmin(x, dim, keepdims: bool = ...): ... +def randn(*size, dtype: str = ..., requires_grad: bool = ...) -> Var: ... +def rand(*size, dtype: str = ..., requires_grad: bool = ...) -> Var: ... +def rand_like(x, dtype: Any | None = ...) -> Var: ... +def randn_like(x, dtype: Any | None = ...) -> Var: ... +def randint(low, high: Any | None = ..., shape=..., dtype: str = ...) -> Var: ... +def randint_like(x, low, high: Any | None = ...) -> Var: ... +def normal(mean, std, size: Any | None = ..., dtype: str = ...) -> Var: ... +def attrs(var): ... +def fetch(*args) -> None: ... +def display_memory_info() -> None: ... +def load(path: str): ... +def save(params_dict, path: str): ... + +class Module: + def __init__(self, *args, **kw) -> None: ... + def execute(self, *args, **kw) -> None: ... + def __call__(self, *args, **kw): ... + def __name__(self) -> None: ... + def dfs(self, parents, k, callback, callback_leave: Any | None = ...) -> None: ... + def parameters(self): ... + def state_dict(self, to: Any | None = ...): ... + def named_parameters(self): ... + def load_state_dict(self, params) -> None: ... + def modules(self): ... + def named_modules(self): ... + def requires_grad_(self, requires_grad: bool = ...): ... + def __hooked_call__(self, *args, **kw): ... + __fhook__: Any + def register_forward_hook(self, func) -> None: ... + def remove_forward_hook(self) -> None: ... + __fhook2__: Any + def register_pre_forward_hook(self, func) -> None: ... + def remove_pre_forward_hook(self) -> None: ... + __bihook__: Any + def register_input_backward_hook(self, func) -> None: ... 
+ def remove_input_backward_hook(self) -> None: ... + __bohook__: Any + def register_output_backward_hook(self, func) -> None: ... + def remove_output_backward_hook(self) -> None: ... + def register_backward_hook(self, func): ... + def remove_backward_hook(self) -> None: ... + def children(self): ... + def extra_repr(self): ... + def apply(self, func) -> None: ... + def load_parameters(self, params) -> None: ... + def save(self, path: str): ... + def load(self, path: str): ... + backup_grad_state: Any + def eval(self) -> None: ... + def train(self) -> None: ... + is_train: bool + def is_training(self) -> bool: ... + def mpi_param_broadcast(self, root: int = ...) -> None: ... + +class Function(Module): + input_mask: Any + output_mask: Any + def __call__(self, *args): ... + def dfs(self, parents, k, callback, callback_leave: Any | None = ...) -> None: ... + @classmethod + def apply(cls, *args, **kw): ... + +class GradHooker(Function): + hook: Any + def __init__(self, hook) -> None: ... + def execute(self, *args): ... + def grad(self, *grad_input): ... + +def grad_hooker(args, hook): ... +def register_hook(v, hook): ... +def make_module(func, exec_n_args: int = ...): ... +def dirty_fix_pytorch_runtime_error() -> None: ... + +class ExitHooks: + exit_code: Any + exception: Any + def __init__(self) -> None: ... + def hook(self) -> None: ... + def exit(self, code: int = ...) -> None: ... + def exc_handler(self, exc_type, exc, *args) -> None: ... + +hooks: Any + +def jittor_exit() -> None: ... +def vtos(v): ... +def size(v, dim: Any | None = ...): ... +def to_int(v): ... +def to_float(v): ... +def to_bool(v): ... +def format(v, spec): ... +def get_len(var): ... + +def is_var(v): ... +from typing import List, Tuple, Callable, overload +import numpy +def ternary(cond: Var, x: Var, y: Var)-> Var: + ... +@overload +def reindex(x: Var, shape: Tuple[int], indexes: List[str], overflow_value: float=0, overflow_conditions: List[str]={}, extras: List[Var]={})-> Var: + '''Document: + * + Reindex Operator is a one-to-many map operator. + It performs equivalent Python-pseudo implementation below:: + + # input is x, output is y + n = len(shape)-1 + m = len(x.shape)-1 + k = len(overflow_conditions)-1 + y = np.zeros(shape, x.dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + if is_overflow(i0,i1,...,in): + y[i0,i1,...,in] = overflow_value + else: + # indexes[i] is a c++ style integer expression consisting of i0,i1,...,in + y[i0,i1,...,in] = x[indexes[0],indexes[1],...,indexes[m]] + + # is_overflow is defined as following + def is_overflow(i0,i1,...,in): + return ( + indexes[0] < 0 || indexes[0] >= x.shape[0] || + indexes[1] < 0 || indexes[1] >= x.shape[1] || + ...... + indexes[m] < 0 || indexes[m] >= x.shape[m] || + + # overflow_conditions[i] is a c++ style boolean expression consisting of i0,i1,...,in + overflow_conditions[0] || + overflow_conditions[1] || + ...... + overflow_conditions[k] + ) + ---------------- + * [in] x: A input jittor Var + + * [in] shape: the output shape, a integer array + + * [in] indexes: array of c++ style integer expression, its length should be the same with the number of dimension of x, some buildin variables it can use are:: + + XDIM, xshape0, ..., xshapen, xstride0, ..., xstriden + YDIM, yshape0, ..., yshapem, ystride0, ..., ystridem + i0, i1, ..., in + @e0(...), @e1(...) for extras input index + e0p, e1p , ... 
for extras input pointer + + * [in] overflow_value: overflow value + + * [in] overflow_conditions: array of c++ style boolean expression, it length can be vary. the buildin variables it can use are the same with indexes + + * [in] extras: extra var used for index + + ---------------- + Example + Convolution implemented by reindex operation:: + + def conv(x, w): + N,H,W,C = x.shape + Kh, Kw, _C, Kc = w.shape + assert C==_C + xx = x.reindex([N,H-Kh+1,W-Kw+1,Kh,Kw,C,Kc], [ + 'i0', # Nid + 'i1+i3', # Hid+Khid + 'i2+i4', # Wid+KWid + 'i5', # Cid + ]) + ww = w.broadcast_var(xx) + yy = xx*ww + y = yy.sum([3,4,5]) # Kh, Kw, C + return y, yy''' + ... +@overload +def reindex(x: Var, indexes: List[Var], overflow_value: float=0, overflow_conditions: List[str]={})-> Var: + '''Document: + * + Reindex Operator is a one-to-many map operator. + It performs equivalent Python-pseudo implementation below:: + + # input is x, output is y + n = len(shape)-1 + m = len(x.shape)-1 + k = len(overflow_conditions)-1 + y = np.zeros(shape, x.dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + if is_overflow(i0,i1,...,in): + y[i0,i1,...,in] = overflow_value + else: + # indexes[i] is a c++ style integer expression consisting of i0,i1,...,in + y[i0,i1,...,in] = x[indexes[0],indexes[1],...,indexes[m]] + + # is_overflow is defined as following + def is_overflow(i0,i1,...,in): + return ( + indexes[0] < 0 || indexes[0] >= x.shape[0] || + indexes[1] < 0 || indexes[1] >= x.shape[1] || + ...... + indexes[m] < 0 || indexes[m] >= x.shape[m] || + + # overflow_conditions[i] is a c++ style boolean expression consisting of i0,i1,...,in + overflow_conditions[0] || + overflow_conditions[1] || + ...... + overflow_conditions[k] + ) + ---------------- + * [in] x: A input jittor Var + + * [in] shape: the output shape, a integer array + + * [in] indexes: array of c++ style integer expression, its length should be the same with the number of dimension of x, some buildin variables it can use are:: + + XDIM, xshape0, ..., xshapen, xstride0, ..., xstriden + YDIM, yshape0, ..., yshapem, ystride0, ..., ystridem + i0, i1, ..., in + @e0(...), @e1(...) for extras input index + e0p, e1p , ... for extras input pointer + + * [in] overflow_value: overflow value + + * [in] overflow_conditions: array of c++ style boolean expression, it length can be vary. the buildin variables it can use are the same with indexes + + * [in] extras: extra var used for index + + ---------------- + Example + Convolution implemented by reindex operation:: + + def conv(x, w): + N,H,W,C = x.shape + Kh, Kw, _C, Kc = w.shape + assert C==_C + xx = x.reindex([N,H-Kh+1,W-Kw+1,Kh,Kw,C,Kc], [ + 'i0', # Nid + 'i1+i3', # Hid+Khid + 'i2+i4', # Wid+KWid + 'i5', # Cid + ]) + ww = w.broadcast_var(xx) + yy = xx*ww + y = yy.sum([3,4,5]) # Kh, Kw, C + return y, yy''' + ... +def reindex_var(x: Var, indexes: List[Var], overflow_value: float=0, overflow_conditions: List[str]={})-> Var: + '''Document: + * Alias x.reindex([i,j,k]) -> + x.reindex(i.shape, ['@e0(...)','@e1(...)','@e2(...)',], extras=[i,j,k])''' + ... +@overload +def index(shape: Tuple[int], dim: int, dtype: str="int32")-> Var: + '''Document: + * + Index Operator generate index of shape. + + It performs equivalent Python-pseudo implementation below:: + + n = len(shape)-1 + x = np.zeros(shape, dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... 
# many loops + for in in range(shape[n]) # n+1 -th loop + x[i0,i1,...,in] = i@dim + + * [in] shape: the output shape, a integer array + * [in] dim: the dim of the index. + * [in] dtype: the data type string, default int32 + + Example:: + + print(jt.index([2,2], 0)()) + # output: [[0,0],[1,1]] + print(jt.index([2,2], 1)()) + # output: [[0,1],[0,1]]''' + ... +@overload +def index(shape: Tuple[int], dtype: str="int32")-> List[Var]: + '''Document: + * + Index Operator generate index of shape. + + It performs equivalent Python-pseudo implementation below:: + + n = len(shape)-1 + x = np.zeros(shape, dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + x[i0,i1,...,in] = i@dim + + * [in] shape: the output shape, a integer array + * [in] dim: the dim of the index. + * [in] dtype: the data type string, default int32 + + Example:: + + print(jt.index([2,2], 0)()) + # output: [[0,0],[1,1]] + print(jt.index([2,2], 1)()) + # output: [[0,1],[0,1]]''' + ... +@overload +def index(a: Var, dim: int, dtype: str="int32")-> Var: + '''Document: + * + Index Operator generate index of shape. + + It performs equivalent Python-pseudo implementation below:: + + n = len(shape)-1 + x = np.zeros(shape, dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + x[i0,i1,...,in] = i@dim + + * [in] shape: the output shape, a integer array + * [in] dim: the dim of the index. + * [in] dtype: the data type string, default int32 + + Example:: + + print(jt.index([2,2], 0)()) + # output: [[0,0],[1,1]] + print(jt.index([2,2], 1)()) + # output: [[0,1],[0,1]]''' + ... +@overload +def index(a: Var, dtype: str="int32")-> List[Var]: + '''Document: + * + Index Operator generate index of shape. + + It performs equivalent Python-pseudo implementation below:: + + n = len(shape)-1 + x = np.zeros(shape, dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + x[i0,i1,...,in] = i@dim + + * [in] shape: the output shape, a integer array + * [in] dim: the dim of the index. + * [in] dtype: the data type string, default int32 + + Example:: + + print(jt.index([2,2], 0)()) + # output: [[0,0],[1,1]] + print(jt.index([2,2], 1)()) + # output: [[0,1],[0,1]]''' + ... +@overload +def index_var(a: Var, dim: int, dtype: str="int32")-> Var: + '''Document: + * shape dependency version of index op + jt.index_var(a, 1) similar with jt.index(a.shape, 1)''' + ... +@overload +def index_var(a: Var, dtype: str="int32")-> List[Var]: + '''Document: + * shape dependency version of index op + jt.index_var(a, 1) similar with jt.index(a.shape, 1)''' + ... +def binary(x: Var, y: Var, p: str)-> Var: + ... +def pow(x: Var, y: Var)-> Var: + '''Document: + * + Computes ``x^y``, element-wise. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def maximum(x: Var, y: Var)-> Var: + '''Document: + * + Returns the element-wise maximum of ``x`` and ``y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def minimum(x: Var, y: Var)-> Var: + '''Document: + * + Returns the element-wise minimum of ``x`` and ``y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. 
+ + * [in] y: the second input, a python number or jt.Var.''' + ... +def add(x: Var, y: Var)-> Var: + '''Document: + * + Element-wise adds ``x`` and ``y`` and returns a new Var. + + This operation is equivalent to ``x + y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def subtract(x: Var, y: Var)-> Var: + '''Document: + * + Element-wise subtract ``y`` from ``x`` and returns a new Var. + + This operation is equivalent to ``x - y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def multiply(x: Var, y: Var)-> Var: + '''Document: + * + Element-wise muliplies ``x`` with ``y`` and returns a new Var. + + This operation is equivalent to ``x * y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def divide(x: Var, y: Var)-> Var: + '''Document: + * + Element-wise divide ``x`` by ``y`` and returns a new Var. + + This operation is equivalent to ``x / y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.empty((3,), dtype=jt.int32) + >>> a + jt.Var([707406378 707406378 707406378], dtype=int32) + >>> b = jt.empty((3,), dtype=jt.int32) + >>> b + jt.Var([674510453 171649398 538976288], dtype=int32) + >>> jt.divide(a, b) + jt.Var([1.0487701 4.1212287 1.3125001], dtype=float32) + >>> a / b + jt.Var([1.0487701 4.1212287 1.3125001], dtype=float32) + + .. note :: + returns float value even if the dtype of input Vars are both integers. + @see jt.ops.floor_divide() for floor division.''' + ... +def floor_divide(x: Var, y: Var)-> Var: + '''Document: + * + Element-wise divide ``x`` by ``y`` and returns the floor of the result. + + This operation is equivalent to ``x // y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randint(1, 10, (3,), dtype=jt.int32) + >>> a + jt.Var([9 2 7], dtype=int32) + >>> b = jt.randint(1, 10, (3,), dtype=jt.int32) + >>> b + jt.Var([6 4 6], dtype=int32) + >>> jt.floor_divide(a, b) + jt.Var([1 0 1], dtype=int32) + >>> a // b + jt.Var([1 0 1], dtype=int32)''' + ... +def mod(x: Var, y: Var)-> Var: + '''Document: + * + Returns the element-wise remainder of division. + + This operation is equivalent to ``x % y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(3) + >>> a + jt.Var([0.3989529 0.20159635 0.22973768], dtype=float32) + >>> b = jt.rand(3) + >>> b + jt.Var([0.20121202 0.7704864 0.5654395 ], dtype=float32) + >>> jt.mod(a, b) + jt.Var([0.19774088 0.20159635 0.22973768], dtype=float32) + >>> a % b + jt.Var([0.19774088 0.20159635 0.22973768], dtype=float32)''' + ... +def less(x: Var, y: Var)-> Var: + '''Document: + * + Returns ``x < y`` element-wise. + + This operation is equivalent to ``x < y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def less_equal(x: Var, y: Var)-> Var: + '''Document: + * + Returns ``x <= y`` element-wise. 
+ + This operation is equivalent to ``x <= y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def greater(x: Var, y: Var)-> Var: + '''Document: + * + Returns ``x > y`` element-wise. + + This operation is equivalent to ``x > y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def greater_equal(x: Var, y: Var)-> Var: + '''Document: + * + Returns ``x >= y`` element-wise. + + This operation is equivalent to ``x >= y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def equal(x: Var, y: Var)-> Var: + '''Document: + * + Returns ``x == y`` element-wise. + + This operation is equivalent to ``x == y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def not_equal(x: Var, y: Var)-> Var: + '''Document: + * + Returns ``x != y`` element-wise. + + This operation is equivalent to ``x != y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... +def left_shift(x: Var, y: Var)-> Var: + '''Document: + * + Shifts the bits of ``x`` to the left by ``y``. + + Bits are shifted to the left by appending ``y`` 0s at the right of ``x``. + This operation is equivalent to ``x << y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var (int32 or int64). + + * [in] y: the second input, a python number or jt.Var (int32 or int64). + + ---------------- + + Example-1:: + >>> a = jt.randint(0, 10, shape=(3,)) + >>> a + jt.Var([7 6 7], dtype=int32) + >>> b = jt.randint(0, 10, shape=(3,)) + >>> b + jt.Var([3 9 8], dtype=int32) + >>> jt.left_shift(a, b) + jt.Var([ 56 3072 1792], dtype=int32) + >>> a << b + jt.Var([ 56 3072 1792], dtype=int32)''' + ... +def right_shift(x: Var, y: Var)-> Var: + '''Document: + * + Shifts the bits of ``x`` to the right by ``y``. + + This operation is equivalent to ``x >> y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var (int32 or int64). + + * [in] y: the second input, a python number or jt.Var (int32 or int64). + + ---------------- + + Example-1:: + >>> a = jt.randint(0, 1024, shape=(3,)) + >>> a + jt.Var([439 113 92], dtype=int32) + >>> b = jt.randint(0, 10, shape=(3,)) + >>> b + jt.Var([6 8 4], dtype=int32) + >>> jt.right_shift(a, b) + jt.Var([6 0 5], dtype=int32)''' + ... +def logical_and(x: Var, y: Var)-> Var: + '''Document: + * + Returns the element-wise logical AND of the inputs. + + ---------------- + + * [in] x: the first input, jt.Var. + + * [in] y: the second input, jt.Var.''' + ... +def logical_or(x: Var, y: Var)-> Var: + '''Document: + * + Returns the element-wise logical OR of the inputs. + + ---------------- + + * [in] x: the first input, jt.Var. + + * [in] y: the second input, jt.Var.''' + ... +def logical_xor(x: Var, y: Var)-> Var: + '''Document: + * + Returns the element-wise logical XOR of the inputs. + + ---------------- + + * [in] x: the first input, jt.Var. + + * [in] y: the second input, jt.Var.''' + ... +def bitwise_and(x: Var, y: Var)-> Var: + '''Document: + * + Computes the bitwise AND of x and y. + + ---------------- + + * [in] x: the first input, jt.Var (integal or boolean). 
+ + * [in] y: the second input, jt.Var (integal or boolean).''' + ... +def bitwise_or(x: Var, y: Var)-> Var: + '''Document: + * + Computes the bitwise OR of x and y. + + ---------------- + + * [in] x: the first input, jt.Var (integal or boolean). + + * [in] y: the second input, jt.Var (integal or boolean).''' + ... +def bitwise_xor(x: Var, y: Var)-> Var: + '''Document: + * + Computes the bitwise XOR of x and y. + + ---------------- + + * [in] x: the first input, jt.Var (integal or boolean). + + * [in] y: the second input, jt.Var (integal or boolean).''' + ... +def tape(x: Var)-> Var: + ... +def where(cond: Var, dtype: str="int32")-> List[Var]: + '''Document: + * + Where Operator generate index of true condition. + + * [in] cond: condition for index generation + + * [in] dtype: type of return indexes + + * [out] out: return an array of indexes, same length with number of dims of cond + + Example:: + + jt.where([[0,0,1],[1,0,0]]) + # return ( [0,2], [1,0] )''' + ... +def argsort(x: Var, dim: int=-1, descending: bool=False, dtype: str="int32")-> List[Var]: + '''Document: + * + Argsort Operator Perform an indirect sort by given key or compare function. + + x is input, y is output index, satisfy: + + x[y[0]] <= x[y[1]] <= x[y[2]] <= ... <= x[y[n]] + + or + + key(y[0]) <= key(y[1]) <= key(y[2]) <= ... <= key(y[n]) + + or + + compare(y[0], y[1]) && compare(y[1], y[2]) && ... + + * [in] x: input var for sort + + * [in] dim: sort alone which dim + + * [in] descending: the elements are sorted in descending order or not(default False). + + * [in] dtype: type of return indexes + + * [out] index: index have the same size with sorted dim + + * [out] value: sorted value + + + Example:: + + index, value = jt.argsort([11,13,12]) + # return [0 2 1], [11 12 13] + index, value = jt.argsort([11,13,12], descending=True) + # return [1 2 0], [13 12 11] + index, value = jt.argsort([[11,13,12], [12,11,13]]) + # return [[0 2 1],[1 0 2]], [[11 12 13],[11 12 13]] + index, value = jt.argsort([[11,13,12], [12,11,13]], dim=0) + # return [[0 1 0],[1 0 1]], [[11 11 12],[12 13 13]]''' + ... +def fetch(inputs: List[Var], func: Callable)-> Var: + ... +def arg_reduce(x: Var, op: str, dim: int, keepdims: bool)-> List[Var]: + '''Document: + * + Returns the indices of the maximum / minimum of the input across a dimension. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] op: "max" or "min". + + * [in] dim: int. Specifies which dimension to be reduced. + + * [in] keepdim: bool. Whether the output has ``dim`` retained or not. + + ---------------- + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 3)) + >>> x + jt.Var([[4 2 5] + [6 7 1]], dtype=int32) + >>> jt.arg_reduce(x, 'max', dim=1, keepdims=False) + [jt.Var([2 1], dtype=int32), jt.Var([5 7], dtype=int32)] + >>> jt.arg_reduce(x, 'min', dim=1, keepdims=False) + [jt.Var([1 2], dtype=int32), jt.Var([5 7], dtype=int32)]''' + ... +def random(shape: Tuple[int], dtype: str="float32", type: str="uniform")-> Var: + ... +@overload +def reduce(x: Var, op: str, dim: int, keepdims: bool=False)-> Var: + ... +@overload +def reduce(x: Var, op: str, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + ... +@overload +def max(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). 
Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... +@overload +def max(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... +@overload +def max(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... +@overload +def reduce_maximum(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... +@overload +def reduce_maximum(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... 
+@overload +def reduce_maximum(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... +@overload +def min(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... +@overload +def min(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... +@overload +def min(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... +@overload +def reduce_minimum(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. 
+ + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... +@overload +def reduce_minimum(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... +@overload +def reduce_minimum(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... +@overload +def sum(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... +@overload +def sum(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... +@overload +def sum(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. 
+ + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... +@overload +def reduce_add(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... +@overload +def reduce_add(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... +@overload +def reduce_add(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... +@overload +def prod(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. 
+ + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def prod(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def prod(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def product(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def product(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... 
+@overload +def product(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def reduce_multiply(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def reduce_multiply(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def reduce_multiply(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... +@overload +def reduce_logical_and(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. 
Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def reduce_logical_and(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def reduce_logical_and(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def all_(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def all_(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... 
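+# Usage sketch for the boolean reductions above (reduce_logical_and / all_),
+# restating the documented examples in compact form; illustrative only:
+#     >>> import jittor as jt
+#     >>> x = jt.randint(2, shape=(2, 3))    # entries are 0 or 1
+#     >>> jt.all_(x)                         # True only if every entry is nonzero
+#     >>> x.all_(dim=1)                      # per-row test
+#     >>> x.all_(dim=1, keepdims=True)       # keep the reduced axis as size 1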
+@overload +def all_(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def reduce_logical_or(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def reduce_logical_or(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def reduce_logical_or(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def any_(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. 
Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def any_(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def any_(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... +@overload +def reduce_logical_xor(x: Var, dim: int, keepdims: bool=False)-> Var: + ... +@overload +def reduce_logical_xor(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + ... +@overload +def reduce_logical_xor(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + ... +@overload +def reduce_bitwise_and(x: Var, dim: int, keepdims: bool=False)-> Var: + ... +@overload +def reduce_bitwise_and(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + ... +@overload +def reduce_bitwise_and(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + ... +@overload +def reduce_bitwise_or(x: Var, dim: int, keepdims: bool=False)-> Var: + ... +@overload +def reduce_bitwise_or(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + ... +@overload +def reduce_bitwise_or(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + ... +@overload +def reduce_bitwise_xor(x: Var, dim: int, keepdims: bool=False)-> Var: + ... +@overload +def reduce_bitwise_xor(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + ... +@overload +def reduce_bitwise_xor(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + ... +@overload +def mean(x: Var, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the mean value of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. 
+ + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[9 4 4] + [1 9 6]], dtype=int32) + >>> jt.mean(x) + jt.Var([5.5000005], dtype=float32) + >>> x.mean() + jt.Var([5.5000005], dtype=float32) + >>> x.mean(dim=1) + jt.Var([5.666667 5.3333335], dtype=float32) + >>> x.mean(dim=1, keepdims=True) + jt.Var([[5.666667 ] + [5.3333335]], dtype=float32)''' + ... +@overload +def mean(x: Var, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the mean value of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[9 4 4] + [1 9 6]], dtype=int32) + >>> jt.mean(x) + jt.Var([5.5000005], dtype=float32) + >>> x.mean() + jt.Var([5.5000005], dtype=float32) + >>> x.mean(dim=1) + jt.Var([5.666667 5.3333335], dtype=float32) + >>> x.mean(dim=1, keepdims=True) + jt.Var([[5.666667 ] + [5.3333335]], dtype=float32)''' + ... +@overload +def mean(x: Var, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the mean value of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[9 4 4] + [1 9 6]], dtype=int32) + >>> jt.mean(x) + jt.Var([5.5000005], dtype=float32) + >>> x.mean() + jt.Var([5.5000005], dtype=float32) + >>> x.mean(dim=1) + jt.Var([5.666667 5.3333335], dtype=float32) + >>> x.mean(dim=1, keepdims=True) + jt.Var([[5.666667 ] + [5.3333335]], dtype=float32)''' + ... +def clone(x: Var)-> Var: + ... +def unary(x: Var, op: str)-> Var: + ... +def cast(x: Var, op: str)-> Var: + ... +def int8(x: Var)-> Var: + ... +def int16(x: Var)-> Var: + ... +def int32(x: Var)-> Var: + ... +def int64(x: Var)-> Var: + ... +def uint8(x: Var)-> Var: + ... +def uint16(x: Var)-> Var: + ... +def uint32(x: Var)-> Var: + ... +def uint64(x: Var)-> Var: + ... +def float32(x: Var)-> Var: + ... +def float64(x: Var)-> Var: + ... +def abs(x: Var)-> Var: + '''Document: + * + Returns the absolute value of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var + + ---------------- + + Example-1:: + >>> jt.abs(jt.float32([-1, 0, 1])) + jt.Var([1. 0. 1.], dtype=float32)''' + ... +def negative(x: Var)-> Var: + '''Document: + * + Returns the negative value of the input ``x``. + + This operator is equavilant to ``-x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> jt.negative(jt.float32([-1, 0, 1])) + jt.Var([ 1. -0. -1.], dtype=float32)''' + ... +def logical_not(x: Var)-> Var: + '''Document: + * + Returns the logical NOT of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var, integal or boolean. + + ---------------- + + Example-1:: + >>> jt.logical_not(jt.int32([-1, 0, 1])) + jt.Var([False True False], dtype=bool)''' + ... +def bitwise_not(x: Var)-> Var: + '''Document: + * + Returns the bitwise NOT of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var, integal or boolean. 
+ + ---------------- + + Example-1:: + >>> jt.bitwise_not(jt.int32([1, 2, -3])) + jt.Var([-2 -3 2], dtype=int32)''' + ... +def log(x: Var)-> Var: + '''Document: + * + Returns the natural logarithm of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 + >>> a + jt.Var([0.02863695 1.30122 1.6048753 1.140261 ], dtype=float32) + >>> jt.log(a) + jt.Var([-3.5530574 0.26330233 0.47304606 0.13125724], dtype=float32)''' + ... +def exp(x: Var)-> Var: + '''Document: + * + Returns the exponential of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 + >>> a + jt.Var([1.9841381 1.4103996 0.5855549 1.4212812], dtype=float32) + >>> jt.exp(a) + jt.Var([7.2727766 4.0975924 1.7959872 4.1424246], dtype=float32)''' + ... +def sqrt(x: Var)-> Var: + '''Document: + * + Returns the square root of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 + >>> a + jt.Var([0.81957287 0.5609612 0.07435933 1.7571875 ], dtype=float32) + >>> jt.sqrt(a) + jt.Var([0.90530264 0.7489734 0.27268907 1.3255895 ], dtype=float32)''' + ... +def round(x: Var)-> Var: + '''Document: + * + Returns the closest integer of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32) + >>> jt.round(a) + jt.Var([ 2.0 0.0 0.0 -1.0], dtype=float32)''' + ... +def floor(x: Var)-> Var: + '''Document: + * + Returns the largest integer less than or equal to the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.floor(a) + jt.Var([-2.0 -1.0 -1.0 -1.0], dtype=float32)''' + ... +def ceil(x: Var)-> Var: + '''Document: + * + Returns the smallest integer greater than or equal to the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.ceil(a) + jt.Var([-1.0 0.0 0.0 0.0], dtype=float32)''' + ... +def round_int(x: Var)-> Var: + '''Document: + * + Returns the closest integer of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32) + >>> jt.round_int(a) + jt.Var([ 2 0 0 -1], dtype=int32)''' + ... +def floor_int(x: Var)-> Var: + '''Document: + * + Returns the largest integer less than or equal to the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.floor_int(a) + jt.Var([-2 -1 -1 -1], dtype=int32)''' + ... +def ceil_int(x: Var)-> Var: + '''Document: + * + Returns the smallest integer greater than or equal to the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.ceil_int(a) + jt.Var([-1 0 0 0], dtype=int32)''' + ... 
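+# Usage sketch contrasting the rounding ops above: round/floor/ceil keep the
+# input's floating dtype, while round_int/floor_int/ceil_int return int32.
+# Illustrative only, following the documented examples:
+#     >>> import jittor as jt
+#     >>> a = jt.float32([-1.5, -0.3, 0.7])
+#     >>> jt.floor(a)        # -> [-2.0 -1.0 0.0], float32
+#     >>> jt.floor_int(a)    # -> [-2 -1 0], int32
+#     >>> jt.ceil_int(a)     # -> [-1 0 1], int32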
+def sin(x: Var)-> Var: + '''Document: + * + Returns the sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.sin(a) + jt.Var([ 0.32303742 -0.6527857 -0.76586854 0.9738172 ], dtype=float32)''' + ... +def asin(x: Var)-> Var: + '''Document: + * + Returns the arcsine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.09342023 -0.42522037 0.9264933 -0.785264 ], dtype=float32) + >>> jt.asin(a) + jt.Var([ 0.09355665 -0.43920535 1.1849847 -0.9031224 ], dtype=float32)''' + ... +def arcsin(x: Var)-> Var: + '''Document: + * + Returns the arcsine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.09342023 -0.42522037 0.9264933 -0.785264 ], dtype=float32) + >>> jt.asin(a) + jt.Var([ 0.09355665 -0.43920535 1.1849847 -0.9031224 ], dtype=float32)''' + ... +def sinh(x: Var)-> Var: + '''Document: + * + Returns the hyperbolic sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.sinh(a) + jt.Var([ 0.3349012 -0.77276015 -0.9873369 2.9425898 ], dtype=float32)''' + ... +def asinh(x: Var)-> Var: + '''Document: + * + Returns the inverse hyperbolic sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.9749726 -0.52341473 0.8906148 1.0338128 ], dtype=float32) + >>> jt.asinh(a) + jt.Var([-1.4323865 -0.5020559 0.8018747 0.90508187], dtype=float32)''' + ... +def arcsinh(x: Var)-> Var: + '''Document: + * + Returns the inverse hyperbolic sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.9749726 -0.52341473 0.8906148 1.0338128 ], dtype=float32) + >>> jt.asinh(a) + jt.Var([-1.4323865 -0.5020559 0.8018747 0.90508187], dtype=float32)''' + ... +def tan(x: Var)-> Var: + '''Document: + * + Returns the tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.tan(a) + jt.Var([ 0.34133783 -0.8617148 -1.1910915 -4.283673 ], dtype=float32)''' + ... +def atan(x: Var)-> Var: + '''Document: + * + Returns the inverse tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-0.85885596 1.187804 0.47249675 0.95933187], dtype=float32) + >>> jt.atan(a) + jt.Var([-0.70961297 0.87102956 0.44140393 0.76464504], dtype=float32)''' + ... +def arctan(x: Var)-> Var: + '''Document: + * + Returns the inverse tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-0.85885596 1.187804 0.47249675 0.95933187], dtype=float32) + >>> jt.atan(a) + jt.Var([-0.70961297 0.87102956 0.44140393 0.76464504], dtype=float32)''' + ... +def tanh(x: Var)-> Var: + '''Document: + * + Returns the hyperbolic tangent of the input ``x``. 
+ + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-0.85885596 1.187804 0.47249675 0.95933187], dtype=float32) + >>> jt.tanh(a) + jt.Var([-0.6956678 0.82989657 0.4402144 0.7439787 ], dtype=float32)''' + ... +def atanh(x: Var)-> Var: + '''Document: + * + Returns the inverse hyperbolic tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.9062414 -0.799802 -0.27219176 -0.7274077 ], dtype=float32) + >>> jt.atanh(a) + jt.Var([ 1.5060828 -1.0980625 -0.27922946 -0.9231999 ], dtype=float32)''' + ... +def arctanh(x: Var)-> Var: + '''Document: + * + Returns the inverse hyperbolic tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.9062414 -0.799802 -0.27219176 -0.7274077 ], dtype=float32) + >>> jt.atanh(a) + jt.Var([ 1.5060828 -1.0980625 -0.27922946 -0.9231999 ], dtype=float32)''' + ... +def cos(x: Var)-> Var: + '''Document: + * + Returns the cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.cos(a) + jt.Var([ 0.9463862 0.7575426 0.6429972 -0.2273323], dtype=float32)''' + ... +def acos(x: Var)-> Var: + '''Document: + * + Returns the inverse cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.5876564 0.740723 -0.667666 0.5371753], dtype=float32) + >>> jt.acos(a) + jt.Var([0.9426371 0.7366504 2.3018656 1.0037117], dtype=float32)''' + ... +def arccos(x: Var)-> Var: + '''Document: + * + Returns the inverse cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.5876564 0.740723 -0.667666 0.5371753], dtype=float32) + >>> jt.acos(a) + jt.Var([0.9426371 0.7366504 2.3018656 1.0037117], dtype=float32)''' + ... +def cosh(x: Var)-> Var: + '''Document: + * + Returns the hyperbolic cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.cosh(a) + jt.Var([1.0545894 1.2637873 1.405288 3.1078668], dtype=float32)''' + ... +def acosh(x: Var)-> Var: + '''Document: + * + Returns the inverse hyperbolic cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) + 1 + >>> a + jt.Var([1.3609099 1.8137748 1.1146184 1.3911307], dtype=float32) + >>> jt.acosh(a) + jt.Var([0.8259237 1.2020639 0.47432774 0.8579033 ], dtype=float32)''' + ... +def arccosh(x: Var)-> Var: + '''Document: + * + Returns the inverse hyperbolic cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) + 1 + >>> a + jt.Var([1.3609099 1.8137748 1.1146184 1.3911307], dtype=float32) + >>> jt.acosh(a) + jt.Var([0.8259237 1.2020639 0.47432774 0.8579033 ], dtype=float32)''' + ... +def sigmoid(x: Var)-> Var: + '''Document: + * + Returns the sigmoid of the input ``x``. + + .. 
math:: + out_i = \frac{1}{1 + e^{x_i}} + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.49443012 0.4305426 -1.0364404 -1.2628382 ], dtype=float32) + >>> jt.sigmoid(a) + jt.Var([0.62114954 0.6060032 0.2618374 0.2204857 ], dtype=float32)''' + ... +def erf(x: Var)-> Var: + '''Document: + * + Computes the error function of each element. The error function is defined as follows: + + .. math:: + erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.49443012 0.4305426 -1.0364404 -1.2628382 ], dtype=float32) + >>> jt.erf(a) + jt.Var([ 0.51559156 0.45739546 -0.85728306 -0.9258883 ], dtype=float32)''' + ... +def transpose(x: Var, axes: Tuple[int]=())-> Var: + ... +def fuse_transpose(x: Var, axes: Tuple[int]=())-> Var: + ... +def safe_clip(x: Var, left: float, right: float)-> Var: + '''Document: + * Safe clip value to a range, and keep + the gradient pass thought. + + * [in] x: input value + * [in] left: float64 clip min value. + * [in] right: float64 clip max value.''' + ... +def array_(args: numpy.ndarray)-> Var: + ... +def array(obj: float | int | numpy.ndarray | Var)-> Var: + ... +def getitem(x: Var, slices: slice)-> Var: + ... +def candidate(x: Var, fail_cond: str, dtype: str="int32")-> Var: + '''Document: + * + Candidate Operator Perform an indirect candidate filter by given a fail condition. + + x is input, y is output index, satisfy:: + + not fail_cond(y[0], y[1]) and + not fail_cond(y[0], y[2]) and not fail_cond(y[1], y[2]) and + ... + ... and not fail_cond(y[m-2], y[m-1]) + + Where m is number of selected candidates. + + Pseudo code:: + + y = [] + for i in range(n): + pass = True + for j in y: + if (@fail_cond): + pass = false + break + if (pass): + y.append(i) + return y + + * [in] x: input var for filter + + * [in] fail_cond: code for fail condition + + * [in] dtype: type of return indexes + + * [out] index: . + + Example:: + + jt.candidate(jt.random(100,2), '(@x(j,0)>@x(i,0))or(@x(j,1)>@x(i,1))') + # return y satisfy: + # x[y[0], 0] <= x[y[1], 0] and x[y[1], 0] <= x[y[2], 0] and ... and x[y[m-2], 0] <= x[y[m-1], 0] and + # x[y[0], 1] <= x[y[1], 1] and x[y[1], 1] <= x[y[2], 1] and ... and x[y[m-2], 1] <= x[y[m-1], 1]''' + ... +@overload +def numpy_code(shape: Tuple[int], dtype: str, inputs: List[Var], forward: Callable, backward: List[Callable])-> Var: + '''Document: + * + Numpy Code Operator for easily customized op. 
+ + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] forward: function, represents forward python function + + * [in] backward: A list of function, represents gradiant for each input + + ---------------- + + Example-1:: + + def forward_code(np, data): + a = data["inputs"][0] + b = data["outputs"][0] + np.add(a,a,out=b) + + def backward_code(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout*2.0) + + a = jt.random((5,1)) + b = jt.numpy_code( + a.shape, + a.dtype, + [a], + forward_code, + [backward_code], + ) + + Example-2:: + + def forward_code(np, data): + a,b = data["inputs"] + c,d = data["outputs"] + np.add(a,b,out=c) + np.subtract(a,b,out=d) + + def backward_code1(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout) + + def backward_code2(np, data): + dout = data["dout"] + out_index = data["out_index"] + out = data["outputs"][0] + if out_index==0: + np.copyto(out, dout) + else: + np.negative(dout, out) + + a = jt.random((5,1)) + b = jt.random((5,1)) + c, d = jt.numpy_code( + [a.shape, a.shape], + [a.dtype, a.dtype], + [a, b], + forward_code, + [backward_code1,backward_code2], + )''' + ... +@overload +def numpy_code(shapes: List[Tuple[int]], dtypes: List[str], inputs: List[Var], forward: Callable, backward: List[Callable])-> List[Var]: + '''Document: + * + Numpy Code Operator for easily customized op. + + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] forward: function, represents forward python function + + * [in] backward: A list of function, represents gradiant for each input + + ---------------- + + Example-1:: + + def forward_code(np, data): + a = data["inputs"][0] + b = data["outputs"][0] + np.add(a,a,out=b) + + def backward_code(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout*2.0) + + a = jt.random((5,1)) + b = jt.numpy_code( + a.shape, + a.dtype, + [a], + forward_code, + [backward_code], + ) + + Example-2:: + + def forward_code(np, data): + a,b = data["inputs"] + c,d = data["outputs"] + np.add(a,b,out=c) + np.subtract(a,b,out=d) + + def backward_code1(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout) + + def backward_code2(np, data): + dout = data["dout"] + out_index = data["out_index"] + out = data["outputs"][0] + if out_index==0: + np.copyto(out, dout) + else: + np.negative(dout, out) + + a = jt.random((5,1)) + b = jt.random((5,1)) + c, d = jt.numpy_code( + [a.shape, a.shape], + [a.dtype, a.dtype], + [a, b], + forward_code, + [backward_code1,backward_code2], + )''' + ... +@overload +def numpy_code(shape: Tuple[int], dtype: str, inputs: List[Var], forward: Callable)-> Var: + '''Document: + * + Numpy Code Operator for easily customized op. 
+ + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] forward: function, represents forward python function + + * [in] backward: A list of function, represents gradiant for each input + + ---------------- + + Example-1:: + + def forward_code(np, data): + a = data["inputs"][0] + b = data["outputs"][0] + np.add(a,a,out=b) + + def backward_code(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout*2.0) + + a = jt.random((5,1)) + b = jt.numpy_code( + a.shape, + a.dtype, + [a], + forward_code, + [backward_code], + ) + + Example-2:: + + def forward_code(np, data): + a,b = data["inputs"] + c,d = data["outputs"] + np.add(a,b,out=c) + np.subtract(a,b,out=d) + + def backward_code1(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout) + + def backward_code2(np, data): + dout = data["dout"] + out_index = data["out_index"] + out = data["outputs"][0] + if out_index==0: + np.copyto(out, dout) + else: + np.negative(dout, out) + + a = jt.random((5,1)) + b = jt.random((5,1)) + c, d = jt.numpy_code( + [a.shape, a.shape], + [a.dtype, a.dtype], + [a, b], + forward_code, + [backward_code1,backward_code2], + )''' + ... +@overload +def numpy_code(shapes: List[Tuple[int]], dtypes: List[str], inputs: List[Var], forward: Callable)-> List[Var]: + '''Document: + * + Numpy Code Operator for easily customized op. + + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] forward: function, represents forward python function + + * [in] backward: A list of function, represents gradiant for each input + + ---------------- + + Example-1:: + + def forward_code(np, data): + a = data["inputs"][0] + b = data["outputs"][0] + np.add(a,a,out=b) + + def backward_code(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout*2.0) + + a = jt.random((5,1)) + b = jt.numpy_code( + a.shape, + a.dtype, + [a], + forward_code, + [backward_code], + ) + + Example-2:: + + def forward_code(np, data): + a,b = data["inputs"] + c,d = data["outputs"] + np.add(a,b,out=c) + np.subtract(a,b,out=d) + + def backward_code1(np, data): + dout = data["dout"] + out = data["outputs"][0] + np.copyto(out, dout) + + def backward_code2(np, data): + dout = data["dout"] + out_index = data["out_index"] + out = data["outputs"][0] + if out_index==0: + np.copyto(out, dout) + else: + np.negative(dout, out) + + a = jt.random((5,1)) + b = jt.random((5,1)) + c, d = jt.numpy_code( + [a.shape, a.shape], + [a.dtype, a.dtype], + [a, b], + forward_code, + [backward_code1,backward_code2], + )''' + ... +@overload +def code(shape: Tuple[int], dtype: str, inputs: List[Var]={}, cpu_src: str="", cpu_grad_src: List[str]={}, cpu_header: str="", cuda_src: str="", cuda_grad_src: List[str]={}, cuda_header: str="")-> Var: + '''Document: + * + Code Operator for easily customized op. + + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] cpu_src: cpu source code string, buildin value: + + * in{x}, in{x}_shape{y}, in{x}_stride{y}, in{x}_type, in{x}_p, @in0(...) + * out{x}, out{x}_shape{y}, out{x}_stride{y}, out{x}_type, out{x}_p, @out0(...) + * out, out_shape{y}, out_stride{y}, out_type, out_p, @out(...) + + * [in] cpu_header: cpu header code string. 
+ + * [in] cuda_src: cuda source code string. + + * [in] cuda_header: cuda header code string. + + ---------------- + + Example-1:: + + from jittor import Function + import jittor as jt + + class Func(Function): + def execute(self, x): + self.save_vars = x + return jt.code(x.shape, x.dtype, [x], + cpu_src=""" + for (int i=0; i + @alias(a, in0) + @alias(b, out) + """, + cpu_src=""" + for (int i=0; i + using namespace std; + """, + cpu_src=""" + @alias(a, in0) + @alias(b, out0) + @alias(c, out1) + @b(0) = @c(0) = @a(0); + for (int i=0; i0) + @b(num_b++) = @a(i); + else + @c(num_c++) = @a(i); + } + b->set_shape({num_b}); + c->set_shape({num_c}); + """ + ) + assert (b.data == [5,3,1]).all() + assert (c.data == [-4,-2]).all() + + + CUDA Example-1:: + + #This example shows how to use CUDA in code op. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + a = jt.random([100000]) + b = jt.random([100000]) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b])) + + CUDA Example-2:: + + #This example shows how to use multi dimension data with CUDA. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + a = jt.random((100,100)) + b = jt.random((100,100)) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b]))''' + ... +@overload +def code(shapes: List[Tuple[int]], dtypes: List[str], inputs: List[Var]={}, cpu_src: str="", cpu_grad_src: List[str]={}, cpu_header: str="", cuda_src: str="", cuda_grad_src: List[str]={}, cuda_header: str="")-> List[Var]: + '''Document: + * + Code Operator for easily customized op. + + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] cpu_src: cpu source code string, buildin value: + + * in{x}, in{x}_shape{y}, in{x}_stride{y}, in{x}_type, in{x}_p, @in0(...) + * out{x}, out{x}_shape{y}, out{x}_stride{y}, out{x}_type, out{x}_p, @out0(...) + * out, out_shape{y}, out_stride{y}, out_type, out_p, @out(...) + + * [in] cpu_header: cpu header code string. + + * [in] cuda_src: cuda source code string. + + * [in] cuda_header: cuda header code string. 
+ + ---------------- + + Example-1:: + + from jittor import Function + import jittor as jt + + class Func(Function): + def execute(self, x): + self.save_vars = x + return jt.code(x.shape, x.dtype, [x], + cpu_src=""" + for (int i=0; i + @alias(a, in0) + @alias(b, out) + """, + cpu_src=""" + for (int i=0; i + using namespace std; + """, + cpu_src=""" + @alias(a, in0) + @alias(b, out0) + @alias(c, out1) + @b(0) = @c(0) = @a(0); + for (int i=0; i0) + @b(num_b++) = @a(i); + else + @c(num_c++) = @a(i); + } + b->set_shape({num_b}); + c->set_shape({num_c}); + """ + ) + assert (b.data == [5,3,1]).all() + assert (c.data == [-4,-2]).all() + + + CUDA Example-1:: + + #This example shows how to use CUDA in code op. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + a = jt.random([100000]) + b = jt.random([100000]) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b])) + + CUDA Example-2:: + + #This example shows how to use multi dimension data with CUDA. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + a = jt.random((100,100)) + b = jt.random((100,100)) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b]))''' + ... +@overload +def code(inputs: List[Var], outputs: List[Var], cpu_src: str="", cpu_grad_src: List[str]={}, cpu_header: str="", cuda_src: str="", cuda_grad_src: List[str]={}, cuda_header: str="")-> List[Var]: + '''Document: + * + Code Operator for easily customized op. + + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] cpu_src: cpu source code string, buildin value: + + * in{x}, in{x}_shape{y}, in{x}_stride{y}, in{x}_type, in{x}_p, @in0(...) + * out{x}, out{x}_shape{y}, out{x}_stride{y}, out{x}_type, out{x}_p, @out0(...) + * out, out_shape{y}, out_stride{y}, out_type, out_p, @out(...) + + * [in] cpu_header: cpu header code string. + + * [in] cuda_src: cuda source code string. + + * [in] cuda_header: cuda header code string. 
+ + ---------------- + + Example-1:: + + from jittor import Function + import jittor as jt + + class Func(Function): + def execute(self, x): + self.save_vars = x + return jt.code(x.shape, x.dtype, [x], + cpu_src=""" + for (int i=0; i + @alias(a, in0) + @alias(b, out) + """, + cpu_src=""" + for (int i=0; i + using namespace std; + """, + cpu_src=""" + @alias(a, in0) + @alias(b, out0) + @alias(c, out1) + @b(0) = @c(0) = @a(0); + for (int i=0; i0) + @b(num_b++) = @a(i); + else + @c(num_c++) = @a(i); + } + b->set_shape({num_b}); + c->set_shape({num_c}); + """ + ) + assert (b.data == [5,3,1]).all() + assert (c.data == [-4,-2]).all() + + + CUDA Example-1:: + + #This example shows how to use CUDA in code op. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + a = jt.random([100000]) + b = jt.random([100000]) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b])) + + CUDA Example-2:: + + #This example shows how to use multi dimension data with CUDA. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + a = jt.random((100,100)) + b = jt.random((100,100)) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b]))''' + ... +def copy(x: Var)-> Var: + ... +def setitem(x: Var, slices: slice, y: Var, op: str="void")-> Var: + ... +@overload +def broadcast(x: Var, shape: Tuple[int], dims: Tuple[int]=())-> Var: + '''Document: + * + Broadcast ``x`` to a given shape. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] shape: the output shape. + + * [in] dims: specifies the new dimension in the output shape, an integer array. + + ---------------- + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 2)) + >>> x + jt.Var([[8 1] + [7 6]], dtype=int32) + >>> jt.broadcast(x, shape=(2, 3, 2), dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32)''' + ... +@overload +def broadcast(x: Var, y: Var, dims: Tuple[int]=())-> Var: + '''Document: + * + Broadcast ``x`` to a given shape. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] shape: the output shape. + + * [in] dims: specifies the new dimension in the output shape, an integer array. + + ---------------- + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 2)) + >>> x + jt.Var([[8 1] + [7 6]], dtype=int32) + >>> jt.broadcast(x, shape=(2, 3, 2), dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32)''' + ... 
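+# Usage sketch for the two broadcast overloads above: the target may be given
+# as an explicit shape or as a reference Var whose shape is reused (see also
+# broadcast_var below). Illustrative only, mirroring the documented example:
+#     >>> import jittor as jt
+#     >>> x = jt.randint(0, 10, shape=(2, 2))
+#     >>> y = jt.randint(0, 10, shape=(2, 3, 2))
+#     >>> jt.broadcast(x, shape=(2, 3, 2), dims=[1])   # explicit output shape
+#     >>> jt.broadcast(x, y, dims=[1])                 # same result, shape taken from y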
+def broadcast_var(x: Var, y: Var, dims: Tuple[int]=())-> Var: + '''Document: + * + Broadcast ``x`` to the same shape as ``y``. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] y: the reference jt.Var. + + * [in] dims: specifies the new dimension in the output shape, an integer array. + + ---------------- + + .. note:: + jt.broadcast_var(x, y, dims) is an alias of jt.broadcast(x, y, dims) + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 2)) + >>> x + jt.Var([[8 1] + [7 6]], dtype=int32) + >>> y = jt.randint(0, 10, shape=(2, 3, 2)) + >>> jt.broadcast(x, y, dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32) + >>> jt.broadcast_var(x, y, dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32)''' + ... +def reshape(x: Var, shape: Tuple[int])-> Var: + '''Document: + * + Returns a tensor with the same data and number of elements as input, but with the specified shape. + + A single dimension may be -1, in which case it's inferred from the remaining dimensions and the number of elements in input. + + ---------------- + + * [in] x: the input jt.Var + + * [in] shape: the output shape, an integer array + + ---------------- + + Example-1:: + >>> a = jt.randint(0, 10, shape=(12,)) + >>> a + jt.Var([4 0 8 4 6 3 1 8 1 1 2 2], dtype=int32) + >>> jt.reshape(a, (3, 4)) + jt.Var([[4 0 8 4] + [6 3 1 8] + [1 1 2 2]], dtype=int32) + >>> jt.reshape(a, (-1, 6)) + jt.Var([[4 0 8 4 6 3] + [1 8 1 1 2 2]], dtype=int32)''' + ... +def empty(shape: Tuple[int], dtype: str="float32")-> Var: + ... +def reindex_reduce(y: Var, op: str, shape: Tuple[int], indexes: List[str], overflow_conditions: List[str]={}, extras: List[Var]={})-> Var: + '''Document: + * + Reindex Reduce Operator is a many-to-one map operator. + It performs equivalent Python-pseudo implementation below:: + + # input is y, output is x + n = len(y.shape)-1 + m = len(shape)-1 + k = len(overflow_conditions)-1 + x = np.zeros(shape, y.dtype) + x[:] = initial_value(op) + for i0 in range(y.shape[0]): # 1-st loop + for i1 in range(y.shape[1]): # 2-nd loop + ...... # many loops + for in in range(y.shape[n]) # n+1 -th loop + # indexes[i] is a c++ style integer expression consisting of i0,i1,...,in + xi0,xi1,...,xim = indexes[0],indexes[1],...,indexes[m] + if not is_overflow(xi0,xi1,...,xim): + x[xi0,xi1,...,xim] = op(x[xi0,xi1,...,xim], y[i0,i1,...,in]) + + # is_overflow is defined as following + def is_overflow(xi0,xi1,...,xim): + return ( + xi0 < 0 || xi0 >= shape[0] || + xi1 < 0 || xi1 >= shape[1] || + ...... + xim < 0 || xim >= shape[m] || + + # overflow_conditions[i] is a c++ style boolean expression consisting of i0,i1,...,in + overflow_conditions[0] || + overflow_conditions[1] || + ...... + overflow_conditions[k] + ) + + * [in] y: A input jittor Var + + * [in] op: a string represent the reduce operation type + + * [in] shape: the output shape, a integer array + + * [in] indexes: array of c++ style integer expression, its length should be the same with length of output shape, some buildin variables it can use are:: + + XDIM, xshape0, ..., xshapem, xstride0, ..., xstridem + YDIM, yshape0, ..., yshapen, ystride0, ..., ystriden + i0, i1, ..., in + @e0(...), @e1(...) for extras input index + e0p, e1p , ... for extras input pointer + + * [in] overflow_conditions: array of c++ style boolean expression, it length can be vary. the buildin variables it can use are the same with indexes. 
+ + * [in] extras: extra var used for index + + Example + + Pooling implemented by reindex operation:: + + def pool(x, size, op): + N,H,W,C = x.shape + h = (H+size-1)//size + w = (W+size-1)//size + return x.reindex_reduce(op, [N,h,w,C], [ + "i0", # Nid + f"i1/{size}", # Hid + f"i2/{size}", # Wid + "i3", # Cid + ])''' + ... +class Var: + '''Variable that stores multi-dimensional data.''' + def ternary(self, x: Var, y: Var)-> Var: ... + @overload + def reindex(self, shape: Tuple[int], indexes: List[str], overflow_value: float=0, overflow_conditions: List[str]={}, extras: List[Var]={})-> Var: + '''Document: + * + Reindex Operator is a one-to-many map operator. + It performs equivalent Python-pseudo implementation below:: + + # input is x, output is y + n = len(shape)-1 + m = len(x.shape)-1 + k = len(overflow_conditions)-1 + y = np.zeros(shape, x.dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + if is_overflow(i0,i1,...,in): + y[i0,i1,...,in] = overflow_value + else: + # indexes[i] is a c++ style integer expression consisting of i0,i1,...,in + y[i0,i1,...,in] = x[indexes[0],indexes[1],...,indexes[m]] + + # is_overflow is defined as following + def is_overflow(i0,i1,...,in): + return ( + indexes[0] < 0 || indexes[0] >= x.shape[0] || + indexes[1] < 0 || indexes[1] >= x.shape[1] || + ...... + indexes[m] < 0 || indexes[m] >= x.shape[m] || + + # overflow_conditions[i] is a c++ style boolean expression consisting of i0,i1,...,in + overflow_conditions[0] || + overflow_conditions[1] || + ...... + overflow_conditions[k] + ) + ---------------- + * [in] x: A input jittor Var + + * [in] shape: the output shape, a integer array + + * [in] indexes: array of c++ style integer expression, its length should be the same with the number of dimension of x, some buildin variables it can use are:: + + XDIM, xshape0, ..., xshapen, xstride0, ..., xstriden + YDIM, yshape0, ..., yshapem, ystride0, ..., ystridem + i0, i1, ..., in + @e0(...), @e1(...) for extras input index + e0p, e1p , ... for extras input pointer + + * [in] overflow_value: overflow value + + * [in] overflow_conditions: array of c++ style boolean expression, it length can be vary. the buildin variables it can use are the same with indexes + + * [in] extras: extra var used for index + + ---------------- + Example + Convolution implemented by reindex operation:: + + def conv(x, w): + N,H,W,C = x.shape + Kh, Kw, _C, Kc = w.shape + assert C==_C + xx = x.reindex([N,H-Kh+1,W-Kw+1,Kh,Kw,C,Kc], [ + 'i0', # Nid + 'i1+i3', # Hid+Khid + 'i2+i4', # Wid+KWid + 'i5', # Cid + ]) + ww = w.broadcast_var(xx) + yy = xx*ww + y = yy.sum([3,4,5]) # Kh, Kw, C + return y, yy''' + ... + @overload + def reindex(self, indexes: List[Var], overflow_value: float=0, overflow_conditions: List[str]={})-> Var: + '''Document: + * + Reindex Operator is a one-to-many map operator. + It performs equivalent Python-pseudo implementation below:: + + # input is x, output is y + n = len(shape)-1 + m = len(x.shape)-1 + k = len(overflow_conditions)-1 + y = np.zeros(shape, x.dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... 
# many loops + for in in range(shape[n]) # n+1 -th loop + if is_overflow(i0,i1,...,in): + y[i0,i1,...,in] = overflow_value + else: + # indexes[i] is a c++ style integer expression consisting of i0,i1,...,in + y[i0,i1,...,in] = x[indexes[0],indexes[1],...,indexes[m]] + + # is_overflow is defined as following + def is_overflow(i0,i1,...,in): + return ( + indexes[0] < 0 || indexes[0] >= x.shape[0] || + indexes[1] < 0 || indexes[1] >= x.shape[1] || + ...... + indexes[m] < 0 || indexes[m] >= x.shape[m] || + + # overflow_conditions[i] is a c++ style boolean expression consisting of i0,i1,...,in + overflow_conditions[0] || + overflow_conditions[1] || + ...... + overflow_conditions[k] + ) + ---------------- + * [in] x: A input jittor Var + + * [in] shape: the output shape, a integer array + + * [in] indexes: array of c++ style integer expression, its length should be the same with the number of dimension of x, some buildin variables it can use are:: + + XDIM, xshape0, ..., xshapen, xstride0, ..., xstriden + YDIM, yshape0, ..., yshapem, ystride0, ..., ystridem + i0, i1, ..., in + @e0(...), @e1(...) for extras input index + e0p, e1p , ... for extras input pointer + + * [in] overflow_value: overflow value + + * [in] overflow_conditions: array of c++ style boolean expression, it length can be vary. the buildin variables it can use are the same with indexes + + * [in] extras: extra var used for index + + ---------------- + Example + Convolution implemented by reindex operation:: + + def conv(x, w): + N,H,W,C = x.shape + Kh, Kw, _C, Kc = w.shape + assert C==_C + xx = x.reindex([N,H-Kh+1,W-Kw+1,Kh,Kw,C,Kc], [ + 'i0', # Nid + 'i1+i3', # Hid+Khid + 'i2+i4', # Wid+KWid + 'i5', # Cid + ]) + ww = w.broadcast_var(xx) + yy = xx*ww + y = yy.sum([3,4,5]) # Kh, Kw, C + return y, yy''' + ... + def reindex_var(self, indexes: List[Var], overflow_value: float=0, overflow_conditions: List[str]={})-> Var: + '''Document: + * Alias x.reindex([i,j,k]) -> + x.reindex(i.shape, ['@e0(...)','@e1(...)','@e2(...)',], extras=[i,j,k])''' + ... + @overload + def index(self, dim: int, dtype: str="int32")-> Var: + '''Document: + * + Index Operator generate index of shape. + + It performs equivalent Python-pseudo implementation below:: + + n = len(shape)-1 + x = np.zeros(shape, dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + x[i0,i1,...,in] = i@dim + + * [in] shape: the output shape, a integer array + * [in] dim: the dim of the index. + * [in] dtype: the data type string, default int32 + + Example:: + + print(jt.index([2,2], 0)()) + # output: [[0,0],[1,1]] + print(jt.index([2,2], 1)()) + # output: [[0,1],[0,1]]''' + ... + @overload + def index(self, dtype: str="int32")-> List[Var]: + '''Document: + * + Index Operator generate index of shape. + + It performs equivalent Python-pseudo implementation below:: + + n = len(shape)-1 + x = np.zeros(shape, dtype) + for i0 in range(shape[0]): # 1-st loop + for i1 in range(shape[1]): # 2-nd loop + ...... # many loops + for in in range(shape[n]) # n+1 -th loop + x[i0,i1,...,in] = i@dim + + * [in] shape: the output shape, a integer array + * [in] dim: the dim of the index. + * [in] dtype: the data type string, default int32 + + Example:: + + print(jt.index([2,2], 0)()) + # output: [[0,0],[1,1]] + print(jt.index([2,2], 1)()) + # output: [[0,1],[0,1]]''' + ... 
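+    # Usage sketch combining index and reindex_var above into a gather-like
+    # lookup. This is an illustrative assumption built only on the documented
+    # alias x.reindex([i, j]) -> y[a, b] = x[i[a, b], j[a, b]]:
+    #     >>> import jittor as jt
+    #     >>> src  = jt.array([[1, 2], [3, 4], [5, 6]])
+    #     >>> rows = jt.array([[2, 0], [1, 1]])
+    #     >>> cols = rows.index(1)                  # column id at each position
+    #     >>> out  = src.reindex_var([rows, cols])  # out[a, b] = src[rows[a, b], cols[a, b]]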
+ @overload + def index_var(self, dim: int, dtype: str="int32")-> Var: + '''Document: + * shape dependency version of index op + jt.index_var(a, 1) similar with jt.index(a.shape, 1)''' + ... + @overload + def index_var(self, dtype: str="int32")-> List[Var]: + '''Document: + * shape dependency version of index op + jt.index_var(a, 1) similar with jt.index(a.shape, 1)''' + ... + def binary(self, y: Var, p: str)-> Var: ... + def pow(self, y: Var)-> Var: + '''Document: + * + Computes ``x^y``, element-wise. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def maximum(self, y: Var)-> Var: + '''Document: + * + Returns the element-wise maximum of ``x`` and ``y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def minimum(self, y: Var)-> Var: + '''Document: + * + Returns the element-wise minimum of ``x`` and ``y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def add(self, y: Var)-> Var: + '''Document: + * + Element-wise adds ``x`` and ``y`` and returns a new Var. + + This operation is equivalent to ``x + y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def subtract(self, y: Var)-> Var: + '''Document: + * + Element-wise subtract ``y`` from ``x`` and returns a new Var. + + This operation is equivalent to ``x - y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def multiply(self, y: Var)-> Var: + '''Document: + * + Element-wise muliplies ``x`` with ``y`` and returns a new Var. + + This operation is equivalent to ``x * y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def divide(self, y: Var)-> Var: + '''Document: + * + Element-wise divide ``x`` by ``y`` and returns a new Var. + + This operation is equivalent to ``x / y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.empty((3,), dtype=jt.int32) + >>> a + jt.Var([707406378 707406378 707406378], dtype=int32) + >>> b = jt.empty((3,), dtype=jt.int32) + >>> b + jt.Var([674510453 171649398 538976288], dtype=int32) + >>> jt.divide(a, b) + jt.Var([1.0487701 4.1212287 1.3125001], dtype=float32) + >>> a / b + jt.Var([1.0487701 4.1212287 1.3125001], dtype=float32) + + .. note :: + returns float value even if the dtype of input Vars are both integers. + @see jt.ops.floor_divide() for floor division.''' + ... + def floor_divide(self, y: Var)-> Var: + '''Document: + * + Element-wise divide ``x`` by ``y`` and returns the floor of the result. + + This operation is equivalent to ``x // y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var. 
+ + ---------------- + + Example-1:: + >>> a = jt.randint(1, 10, (3,), dtype=jt.int32) + >>> a + jt.Var([9 2 7], dtype=int32) + >>> b = jt.randint(1, 10, (3,), dtype=jt.int32) + >>> b + jt.Var([6 4 6], dtype=int32) + >>> jt.floor_divide(a, b) + jt.Var([1 0 1], dtype=int32) + >>> a // b + jt.Var([1 0 1], dtype=int32)''' + ... + def mod(self, y: Var)-> Var: + '''Document: + * + Returns the element-wise remainder of division. + + This operation is equivalent to ``x % y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(3) + >>> a + jt.Var([0.3989529 0.20159635 0.22973768], dtype=float32) + >>> b = jt.rand(3) + >>> b + jt.Var([0.20121202 0.7704864 0.5654395 ], dtype=float32) + >>> jt.mod(a, b) + jt.Var([0.19774088 0.20159635 0.22973768], dtype=float32) + >>> a % b + jt.Var([0.19774088 0.20159635 0.22973768], dtype=float32)''' + ... + def less(self, y: Var)-> Var: + '''Document: + * + Returns ``x < y`` element-wise. + + This operation is equivalent to ``x < y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def less_equal(self, y: Var)-> Var: + '''Document: + * + Returns ``x <= y`` element-wise. + + This operation is equivalent to ``x <= y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def greater(self, y: Var)-> Var: + '''Document: + * + Returns ``x > y`` element-wise. + + This operation is equivalent to ``x > y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def greater_equal(self, y: Var)-> Var: + '''Document: + * + Returns ``x >= y`` element-wise. + + This operation is equivalent to ``x >= y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def equal(self, y: Var)-> Var: + '''Document: + * + Returns ``x == y`` element-wise. + + This operation is equivalent to ``x == y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def not_equal(self, y: Var)-> Var: + '''Document: + * + Returns ``x != y`` element-wise. + + This operation is equivalent to ``x != y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var. + + * [in] y: the second input, a python number or jt.Var.''' + ... + def left_shift(self, y: Var)-> Var: + '''Document: + * + Shifts the bits of ``x`` to the left by ``y``. + + Bits are shifted to the left by appending ``y`` 0s at the right of ``x``. + This operation is equivalent to ``x << y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var (int32 or int64). + + * [in] y: the second input, a python number or jt.Var (int32 or int64). + + ---------------- + + Example-1:: + >>> a = jt.randint(0, 10, shape=(3,)) + >>> a + jt.Var([7 6 7], dtype=int32) + >>> b = jt.randint(0, 10, shape=(3,)) + >>> b + jt.Var([3 9 8], dtype=int32) + >>> jt.left_shift(a, b) + jt.Var([ 56 3072 1792], dtype=int32) + >>> a << b + jt.Var([ 56 3072 1792], dtype=int32)''' + ... + def right_shift(self, y: Var)-> Var: + '''Document: + * + Shifts the bits of ``x`` to the right by ``y``. 
+ + This operation is equivalent to ``x >> y``. + + ---------------- + + * [in] x: the first input, a python number or jt.Var (int32 or int64). + + * [in] y: the second input, a python number or jt.Var (int32 or int64). + + ---------------- + + Example-1:: + >>> a = jt.randint(0, 1024, shape=(3,)) + >>> a + jt.Var([439 113 92], dtype=int32) + >>> b = jt.randint(0, 10, shape=(3,)) + >>> b + jt.Var([6 8 4], dtype=int32) + >>> jt.right_shift(a, b) + jt.Var([6 0 5], dtype=int32)''' + ... + def logical_and(self, y: Var)-> Var: + '''Document: + * + Returns the element-wise logical AND of the inputs. + + ---------------- + + * [in] x: the first input, jt.Var. + + * [in] y: the second input, jt.Var.''' + ... + def logical_or(self, y: Var)-> Var: + '''Document: + * + Returns the element-wise logical OR of the inputs. + + ---------------- + + * [in] x: the first input, jt.Var. + + * [in] y: the second input, jt.Var.''' + ... + def logical_xor(self, y: Var)-> Var: + '''Document: + * + Returns the element-wise logical XOR of the inputs. + + ---------------- + + * [in] x: the first input, jt.Var. + + * [in] y: the second input, jt.Var.''' + ... + def bitwise_and(self, y: Var)-> Var: + '''Document: + * + Computes the bitwise AND of x and y. + + ---------------- + + * [in] x: the first input, jt.Var (integal or boolean). + + * [in] y: the second input, jt.Var (integal or boolean).''' + ... + def bitwise_or(self, y: Var)-> Var: + '''Document: + * + Computes the bitwise OR of x and y. + + ---------------- + + * [in] x: the first input, jt.Var (integal or boolean). + + * [in] y: the second input, jt.Var (integal or boolean).''' + ... + def bitwise_xor(self, y: Var)-> Var: + '''Document: + * + Computes the bitwise XOR of x and y. + + ---------------- + + * [in] x: the first input, jt.Var (integal or boolean). + + * [in] y: the second input, jt.Var (integal or boolean).''' + ... + def tape(self)-> Var: ... + def where(self, dtype: str="int32")-> List[Var]: + '''Document: + * + Where Operator generate index of true condition. + + * [in] cond: condition for index generation + + * [in] dtype: type of return indexes + + * [out] out: return an array of indexes, same length with number of dims of cond + + Example:: + + jt.where([[0,0,1],[1,0,0]]) + # return ( [0,2], [1,0] )''' + ... + def argsort(self, dim: int=-1, descending: bool=False, dtype: str="int32")-> List[Var]: + '''Document: + * + Argsort Operator Perform an indirect sort by given key or compare function. + + x is input, y is output index, satisfy: + + x[y[0]] <= x[y[1]] <= x[y[2]] <= ... <= x[y[n]] + + or + + key(y[0]) <= key(y[1]) <= key(y[2]) <= ... <= key(y[n]) + + or + + compare(y[0], y[1]) && compare(y[1], y[2]) && ... + + * [in] x: input var for sort + + * [in] dim: sort alone which dim + + * [in] descending: the elements are sorted in descending order or not(default False). + + * [in] dtype: type of return indexes + + * [out] index: index have the same size with sorted dim + + * [out] value: sorted value + + + Example:: + + index, value = jt.argsort([11,13,12]) + # return [0 2 1], [11 12 13] + index, value = jt.argsort([11,13,12], descending=True) + # return [1 2 0], [13 12 11] + index, value = jt.argsort([[11,13,12], [12,11,13]]) + # return [[0 2 1],[1 0 2]], [[11 12 13],[11 12 13]] + index, value = jt.argsort([[11,13,12], [12,11,13]], dim=0) + # return [[0 1 0],[1 0 1]], [[11 11 12],[12 13 13]]''' + ... + def fetch(self, func: Callable)-> Var: ... 
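+    # A small sketch of argsort, restating the docstring examples above; the
+    # returned index satisfies value[k] == x[index[k]] along the sorted dim:
+    #
+    #     index, value = jt.argsort([11, 13, 12])                   # [0 2 1], [11 12 13]
+    #     index, value = jt.argsort([11, 13, 12], descending=True)  # [1 2 0], [13 12 11]
+    #     index, value = jt.argsort([[11, 13, 12], [12, 11, 13]], dim=0)
+    #     # -> [[0 1 0],[1 0 1]], [[11 11 12],[12 13 13]]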
+ def arg_reduce(self, op: str, dim: int, keepdims: bool)-> List[Var]: + '''Document: + * + Returns the indices of the maximum / minimum of the input across a dimension. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] op: "max" or "min". + + * [in] dim: int. Specifies which dimension to be reduced. + + * [in] keepdim: bool. Whether the output has ``dim`` retained or not. + + ---------------- + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 3)) + >>> x + jt.Var([[4 2 5] + [6 7 1]], dtype=int32) + >>> jt.arg_reduce(x, 'max', dim=1, keepdims=False) + [jt.Var([2 1], dtype=int32), jt.Var([5 7], dtype=int32)] + >>> jt.arg_reduce(x, 'min', dim=1, keepdims=False) + [jt.Var([1 2], dtype=int32), jt.Var([5 7], dtype=int32)]''' + ... + @overload + def reduce(self, op: str, dim: int, keepdims: bool=False)-> Var: ... + @overload + def reduce(self, op: str, dims: Tuple[int]=(), keepdims: bool=False)-> Var: ... + @overload + def max(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... + @overload + def max(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... + @overload + def max(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... + @overload + def reduce_maximum(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). 
Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... + @overload + def reduce_maximum(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... + @overload + def reduce_maximum(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the maximum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.max(x) + jt.Var([4], dtype=int32) + >>> x.max() + jt.Var([4], dtype=int32) + >>> x.max(dim=1) + jt.Var([4 4], dtype=int32) + >>> x.max(dim=1, keepdims=True) + jt.Var([[4] + [4]], dtype=int32)''' + ... + @overload + def min(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... + @overload + def min(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... 
+ @overload + def min(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... + @overload + def reduce_minimum(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... + @overload + def reduce_minimum(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... + @overload + def reduce_minimum(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the minimum elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.min(x) + jt.Var([0], dtype=int32) + >>> x.min() + jt.Var([0], dtype=int32) + >>> x.min(dim=1) + jt.Var([1 0], dtype=int32) + >>> x.min(dim=1, keepdims=True) + jt.Var([[1] + [0]], dtype=int32)''' + ... + @overload + def sum(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. 
+ + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... + @overload + def sum(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... + @overload + def sum(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... + @overload + def reduce_add(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... + @overload + def reduce_add(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... + @overload + def reduce_add(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the sum of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). 
If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[4 1 2] + [0 2 4]], dtype=int32) + >>> jt.sum(x) + jt.Var([13], dtype=int32) + >>> x.sum() + jt.Var([13], dtype=int32) + >>> x.sum(dim=1) + jt.Var([7 6], dtype=int32) + >>> x.sum(dim=1, keepdims=True) + jt.Var([[7] + [6]], dtype=int32)''' + ... + @overload + def prod(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def prod(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def prod(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def product(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. 
+ + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def product(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def product(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def reduce_multiply(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def reduce_multiply(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... 
+ @overload + def reduce_multiply(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the product of all the elements in the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[7 5 5] + [5 7 5]], dtype=int32) + >>> jt.prod(x) + jt.Var([30625], dtype=int32) + >>> x.prod() + jt.Var([30625], dtype=int32) + >>> x.prod(dim=1) + jt.Var([175 175], dtype=int32) + >>> x.prod(dim=1, keepdims=True) + jt.Var([[175] + [175]], dtype=int32)''' + ... + @overload + def reduce_logical_and(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def reduce_logical_and(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def reduce_logical_and(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def all_(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). 
Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def all_(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def all_(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if all elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 1 1] + [0 1 0]], dtype=int32) + >>> jt.all_(x) + jt.Var([False], dtype=int32) + >>> x.all_() + jt.Var([False], dtype=int32) + >>> x.all_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.all_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def reduce_logical_or(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def reduce_logical_or(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. 
+ + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def reduce_logical_or(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def any_(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def any_(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... + @overload + def any_(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Tests if any elements in input evaluate to True. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(2, shape=(2, 3)) + >>> x + jt.Var([[1 0 1] + [0 0 0]], dtype=int32) + >>> jt.any_(x) + jt.Var([True], dtype=int32) + >>> x.any_() + jt.Var([True], dtype=int32) + >>> x.any_(dim=1) + jt.Var([True False], dtype=int32) + >>> x.any_(dim=1, keepdims=True) + jt.Var([[True] + [False]], dtype=int32)''' + ... 
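+    # A short sketch contrasting all_ and any_ per-row reductions (the sample
+    # matrix is the one shown in the all_ docstring example above; randint values
+    # will of course vary):
+    #
+    #     x = jt.randint(2, shape=(2, 3))   # e.g. [[1 1 1], [0 1 0]]
+    #     x.all_(dim=1)                     # [True  False] -- every element of the row true?
+    #     x.any_(dim=1)                     # [True  True ] -- at least one element true?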
+ @overload + def reduce_logical_xor(self, dim: int, keepdims: bool=False)-> Var: ... + @overload + def reduce_logical_xor(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: ... + @overload + def reduce_logical_xor(self, dims_mask: int, keepdims_mask: int)-> Var: ... + @overload + def reduce_bitwise_and(self, dim: int, keepdims: bool=False)-> Var: ... + @overload + def reduce_bitwise_and(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: ... + @overload + def reduce_bitwise_and(self, dims_mask: int, keepdims_mask: int)-> Var: ... + @overload + def reduce_bitwise_or(self, dim: int, keepdims: bool=False)-> Var: ... + @overload + def reduce_bitwise_or(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: ... + @overload + def reduce_bitwise_or(self, dims_mask: int, keepdims_mask: int)-> Var: ... + @overload + def reduce_bitwise_xor(self, dim: int, keepdims: bool=False)-> Var: ... + @overload + def reduce_bitwise_xor(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: ... + @overload + def reduce_bitwise_xor(self, dims_mask: int, keepdims_mask: int)-> Var: ... + @overload + def mean(self, dim: int, keepdims: bool=False)-> Var: + '''Document: + * + Returns the mean value of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[9 4 4] + [1 9 6]], dtype=int32) + >>> jt.mean(x) + jt.Var([5.5000005], dtype=float32) + >>> x.mean() + jt.Var([5.5000005], dtype=float32) + >>> x.mean(dim=1) + jt.Var([5.666667 5.3333335], dtype=float32) + >>> x.mean(dim=1, keepdims=True) + jt.Var([[5.666667 ] + [5.3333335]], dtype=float32)''' + ... + @overload + def mean(self, dims: Tuple[int]=(), keepdims: bool=False)-> Var: + '''Document: + * + Returns the mean value of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[9 4 4] + [1 9 6]], dtype=int32) + >>> jt.mean(x) + jt.Var([5.5000005], dtype=float32) + >>> x.mean() + jt.Var([5.5000005], dtype=float32) + >>> x.mean(dim=1) + jt.Var([5.666667 5.3333335], dtype=float32) + >>> x.mean(dim=1, keepdims=True) + jt.Var([[5.666667 ] + [5.3333335]], dtype=float32)''' + ... + @overload + def mean(self, dims_mask: int, keepdims_mask: int)-> Var: + '''Document: + * + Returns the mean value of the input. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + + * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. + + ---------------- + + Example-1:: + >>> x = jt.randint(10, shape=(2, 3)) + >>> x + jt.Var([[9 4 4] + [1 9 6]], dtype=int32) + >>> jt.mean(x) + jt.Var([5.5000005], dtype=float32) + >>> x.mean() + jt.Var([5.5000005], dtype=float32) + >>> x.mean(dim=1) + jt.Var([5.666667 5.3333335], dtype=float32) + >>> x.mean(dim=1, keepdims=True) + jt.Var([[5.666667 ] + [5.3333335]], dtype=float32)''' + ... + def clone(self)-> Var: ... 
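+    # A minimal sketch of the keepdims flag on mean (values copied from the
+    # docstring example above); the same flag applies to the other reductions:
+    #
+    #     x = jt.randint(10, shape=(2, 3))  # e.g. [[9 4 4], [1 9 6]]
+    #     x.mean(dim=1)                     # jt.Var([5.666667  5.3333335]), shape (2,)
+    #     x.mean(dim=1, keepdims=True)      # shape (2, 1): the reduced dim is kept as size 1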
+ def unary(self, op: str)-> Var: ... + def cast(self, op: str)-> Var: ... + def int8(self)-> Var: ... + def int16(self)-> Var: ... + def int32(self)-> Var: ... + def int64(self)-> Var: ... + def uint8(self)-> Var: ... + def uint16(self)-> Var: ... + def uint32(self)-> Var: ... + def uint64(self)-> Var: ... + def float32(self)-> Var: ... + def float64(self)-> Var: ... + def abs(self)-> Var: + '''Document: + * + Returns the absolute value of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var + + ---------------- + + Example-1:: + >>> jt.abs(jt.float32([-1, 0, 1])) + jt.Var([1. 0. 1.], dtype=float32)''' + ... + def negative(self)-> Var: + '''Document: + * + Returns the negative value of the input ``x``. + + This operator is equavilant to ``-x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> jt.negative(jt.float32([-1, 0, 1])) + jt.Var([ 1. -0. -1.], dtype=float32)''' + ... + def logical_not(self)-> Var: + '''Document: + * + Returns the logical NOT of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var, integal or boolean. + + ---------------- + + Example-1:: + >>> jt.logical_not(jt.int32([-1, 0, 1])) + jt.Var([False True False], dtype=bool)''' + ... + def bitwise_not(self)-> Var: + '''Document: + * + Returns the bitwise NOT of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var, integal or boolean. + + ---------------- + + Example-1:: + >>> jt.bitwise_not(jt.int32([1, 2, -3])) + jt.Var([-2 -3 2], dtype=int32)''' + ... + def log(self)-> Var: + '''Document: + * + Returns the natural logarithm of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 + >>> a + jt.Var([0.02863695 1.30122 1.6048753 1.140261 ], dtype=float32) + >>> jt.log(a) + jt.Var([-3.5530574 0.26330233 0.47304606 0.13125724], dtype=float32)''' + ... + def exp(self)-> Var: + '''Document: + * + Returns the exponential of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 + >>> a + jt.Var([1.9841381 1.4103996 0.5855549 1.4212812], dtype=float32) + >>> jt.exp(a) + jt.Var([7.2727766 4.0975924 1.7959872 4.1424246], dtype=float32)''' + ... + def sqrt(self)-> Var: + '''Document: + * + Returns the square root of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 + >>> a + jt.Var([0.81957287 0.5609612 0.07435933 1.7571875 ], dtype=float32) + >>> jt.sqrt(a) + jt.Var([0.90530264 0.7489734 0.27268907 1.3255895 ], dtype=float32)''' + ... + def round(self)-> Var: + '''Document: + * + Returns the closest integer of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32) + >>> jt.round(a) + jt.Var([ 2.0 0.0 0.0 -1.0], dtype=float32)''' + ... + def floor(self)-> Var: + '''Document: + * + Returns the largest integer less than or equal to the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.floor(a) + jt.Var([-2.0 -1.0 -1.0 -1.0], dtype=float32)''' + ... + def ceil(self)-> Var: + '''Document: + * + Returns the smallest integer greater than or equal to the input ``x``. 
+ + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.ceil(a) + jt.Var([-1.0 0.0 0.0 0.0], dtype=float32)''' + ... + def round_int(self)-> Var: + '''Document: + * + Returns the closest integer of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 2.101595 0.33055413 -0.44147047 -0.7720668 ], dtype=float32) + >>> jt.round_int(a) + jt.Var([ 2 0 0 -1], dtype=int32)''' + ... + def floor_int(self)-> Var: + '''Document: + * + Returns the largest integer less than or equal to the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.floor_int(a) + jt.Var([-2 -1 -1 -1], dtype=int32)''' + ... + def ceil_int(self)-> Var: + '''Document: + * + Returns the smallest integer greater than or equal to the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.0339162 -0.7259972 -0.9220003 -0.8449701], dtype=float32) + >>> jt.ceil_int(a) + jt.Var([-1 0 0 0], dtype=int32)''' + ... + def sin(self)-> Var: + '''Document: + * + Returns the sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.sin(a) + jt.Var([ 0.32303742 -0.6527857 -0.76586854 0.9738172 ], dtype=float32)''' + ... + def asin(self)-> Var: + '''Document: + * + Returns the arcsine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.09342023 -0.42522037 0.9264933 -0.785264 ], dtype=float32) + >>> jt.asin(a) + jt.Var([ 0.09355665 -0.43920535 1.1849847 -0.9031224 ], dtype=float32)''' + ... + def arcsin(self)-> Var: + '''Document: + * + Returns the arcsine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.09342023 -0.42522037 0.9264933 -0.785264 ], dtype=float32) + >>> jt.asin(a) + jt.Var([ 0.09355665 -0.43920535 1.1849847 -0.9031224 ], dtype=float32)''' + ... + def sinh(self)-> Var: + '''Document: + * + Returns the hyperbolic sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.sinh(a) + jt.Var([ 0.3349012 -0.77276015 -0.9873369 2.9425898 ], dtype=float32)''' + ... + def asinh(self)-> Var: + '''Document: + * + Returns the inverse hyperbolic sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.9749726 -0.52341473 0.8906148 1.0338128 ], dtype=float32) + >>> jt.asinh(a) + jt.Var([-1.4323865 -0.5020559 0.8018747 0.90508187], dtype=float32)''' + ... + def arcsinh(self)-> Var: + '''Document: + * + Returns the inverse hyperbolic sine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. 
+ + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-1.9749726 -0.52341473 0.8906148 1.0338128 ], dtype=float32) + >>> jt.asinh(a) + jt.Var([-1.4323865 -0.5020559 0.8018747 0.90508187], dtype=float32)''' + ... + def tan(self)-> Var: + '''Document: + * + Returns the tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.tan(a) + jt.Var([ 0.34133783 -0.8617148 -1.1910915 -4.283673 ], dtype=float32)''' + ... + def atan(self)-> Var: + '''Document: + * + Returns the inverse tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-0.85885596 1.187804 0.47249675 0.95933187], dtype=float32) + >>> jt.atan(a) + jt.Var([-0.70961297 0.87102956 0.44140393 0.76464504], dtype=float32)''' + ... + def arctan(self)-> Var: + '''Document: + * + Returns the inverse tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-0.85885596 1.187804 0.47249675 0.95933187], dtype=float32) + >>> jt.atan(a) + jt.Var([-0.70961297 0.87102956 0.44140393 0.76464504], dtype=float32)''' + ... + def tanh(self)-> Var: + '''Document: + * + Returns the hyperbolic tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([-0.85885596 1.187804 0.47249675 0.95933187], dtype=float32) + >>> jt.tanh(a) + jt.Var([-0.6956678 0.82989657 0.4402144 0.7439787 ], dtype=float32)''' + ... + def atanh(self)-> Var: + '''Document: + * + Returns the inverse hyperbolic tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.9062414 -0.799802 -0.27219176 -0.7274077 ], dtype=float32) + >>> jt.atanh(a) + jt.Var([ 1.5060828 -1.0980625 -0.27922946 -0.9231999 ], dtype=float32)''' + ... + def arctanh(self)-> Var: + '''Document: + * + Returns the inverse hyperbolic tangent of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.9062414 -0.799802 -0.27219176 -0.7274077 ], dtype=float32) + >>> jt.atanh(a) + jt.Var([ 1.5060828 -1.0980625 -0.27922946 -0.9231999 ], dtype=float32)''' + ... + def cos(self)-> Var: + '''Document: + * + Returns the cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.cos(a) + jt.Var([ 0.9463862 0.7575426 0.6429972 -0.2273323], dtype=float32)''' + ... + def acos(self)-> Var: + '''Document: + * + Returns the inverse cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.5876564 0.740723 -0.667666 0.5371753], dtype=float32) + >>> jt.acos(a) + jt.Var([0.9426371 0.7366504 2.3018656 1.0037117], dtype=float32)''' + ... + def arccos(self)-> Var: + '''Document: + * + Returns the inverse cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. 
+ + ---------------- + + Example-1:: + >>> a = jt.rand(4) * 2 - 1 + >>> a + jt.Var([ 0.5876564 0.740723 -0.667666 0.5371753], dtype=float32) + >>> jt.acos(a) + jt.Var([0.9426371 0.7366504 2.3018656 1.0037117], dtype=float32)''' + ... + def cosh(self)-> Var: + '''Document: + * + Returns the hyperbolic cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.32893723 -0.7112559 -0.872391 1.8001337 ], dtype=float32) + >>> jt.cosh(a) + jt.Var([1.0545894 1.2637873 1.405288 3.1078668], dtype=float32)''' + ... + def acosh(self)-> Var: + '''Document: + * + Returns the inverse hyperbolic cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) + 1 + >>> a + jt.Var([1.3609099 1.8137748 1.1146184 1.3911307], dtype=float32) + >>> jt.acosh(a) + jt.Var([0.8259237 1.2020639 0.47432774 0.8579033 ], dtype=float32)''' + ... + def arccosh(self)-> Var: + '''Document: + * + Returns the inverse hyperbolic cosine of the input ``x``. + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.rand(4) + 1 + >>> a + jt.Var([1.3609099 1.8137748 1.1146184 1.3911307], dtype=float32) + >>> jt.acosh(a) + jt.Var([0.8259237 1.2020639 0.47432774 0.8579033 ], dtype=float32)''' + ... + def sigmoid(self)-> Var: + '''Document: + * + Returns the sigmoid of the input ``x``. + + .. math:: + out_i = \frac{1}{1 + e^{x_i}} + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.49443012 0.4305426 -1.0364404 -1.2628382 ], dtype=float32) + >>> jt.sigmoid(a) + jt.Var([0.62114954 0.6060032 0.2618374 0.2204857 ], dtype=float32)''' + ... + def erf(self)-> Var: + '''Document: + * + Computes the error function of each element. The error function is defined as follows: + + .. math:: + erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt + + ---------------- + + * [in] x: the input jt.Var. + + ---------------- + + Example-1:: + >>> a = jt.randn(4) + >>> a + jt.Var([ 0.49443012 0.4305426 -1.0364404 -1.2628382 ], dtype=float32) + >>> jt.erf(a) + jt.Var([ 0.51559156 0.45739546 -0.85728306 -0.9258883 ], dtype=float32)''' + ... + def transpose(self, axes: Tuple[int]=())-> Var: ... + def fuse_transpose(self, axes: Tuple[int]=())-> Var: ... + def safe_clip(self, left: float, right: float)-> Var: + '''Document: + * Safe clip value to a range, and keep + the gradient pass thought. + + * [in] x: input value + * [in] left: float64 clip min value. + * [in] right: float64 clip max value.''' + ... + def array(self)-> Var: ... + def getitem(self, slices: slice)-> Var: ... + def candidate(self, fail_cond: str, dtype: str="int32")-> Var: + '''Document: + * + Candidate Operator Perform an indirect candidate filter by given a fail condition. + + x is input, y is output index, satisfy:: + + not fail_cond(y[0], y[1]) and + not fail_cond(y[0], y[2]) and not fail_cond(y[1], y[2]) and + ... + ... and not fail_cond(y[m-2], y[m-1]) + + Where m is number of selected candidates. + + Pseudo code:: + + y = [] + for i in range(n): + pass = True + for j in y: + if (@fail_cond): + pass = false + break + if (pass): + y.append(i) + return y + + * [in] x: input var for filter + + * [in] fail_cond: code for fail condition + + * [in] dtype: type of return indexes + + * [out] index: . 
+ + Example:: + + jt.candidate(jt.random(100,2), '(@x(j,0)>@x(i,0))or(@x(j,1)>@x(i,1))') + # return y satisfy: + # x[y[0], 0] <= x[y[1], 0] and x[y[1], 0] <= x[y[2], 0] and ... and x[y[m-2], 0] <= x[y[m-1], 0] and + # x[y[0], 1] <= x[y[1], 1] and x[y[1], 1] <= x[y[2], 1] and ... and x[y[m-2], 1] <= x[y[m-1], 1]''' + ... + @overload + def code(self, outputs: List[Var], cpu_src: str="", cpu_grad_src: List[str]={}, cpu_header: str="", cuda_src: str="", cuda_grad_src: List[str]={}, cuda_header: str="")-> List[Var]: + '''Document: + * + Code Operator for easily customized op. + + ---------------- + + * [in] shape: the output shape, a integer array + + * [in] dtype: the output data type + + * [in] inputs: A list of input jittor Vars + + * [in] cpu_src: cpu source code string, buildin value: + + * in{x}, in{x}_shape{y}, in{x}_stride{y}, in{x}_type, in{x}_p, @in0(...) + * out{x}, out{x}_shape{y}, out{x}_stride{y}, out{x}_type, out{x}_p, @out0(...) + * out, out_shape{y}, out_stride{y}, out_type, out_p, @out(...) + + * [in] cpu_header: cpu header code string. + + * [in] cuda_src: cuda source code string. + + * [in] cuda_header: cuda header code string. + + ---------------- + + Example-1:: + + from jittor import Function + import jittor as jt + + class Func(Function): + def execute(self, x): + self.save_vars = x + return jt.code(x.shape, x.dtype, [x], + cpu_src=""" + for (int i=0; i + @alias(a, in0) + @alias(b, out) + """, + cpu_src=""" + for (int i=0; i + using namespace std; + """, + cpu_src=""" + @alias(a, in0) + @alias(b, out0) + @alias(c, out1) + @b(0) = @c(0) = @a(0); + for (int i=0; i0) + @b(num_b++) = @a(i); + else + @c(num_c++) = @a(i); + } + b->set_shape({num_b}); + c->set_shape({num_c}); + """ + ) + assert (b.data == [5,3,1]).all() + assert (c.data == [-4,-2]).all() + + + CUDA Example-1:: + + #This example shows how to use CUDA in code op. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + int i = threadIdx.x + blockIdx.x * blockDim.x; + int stride = blockDim.x * gridDim.x; + for (; i>>(@ARGS); + """) + + a = jt.random([100000]) + b = jt.random([100000]) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b])) + + CUDA Example-2:: + + #This example shows how to use multi dimension data with CUDA. + import jittor as jt + from jittor import Function + jt.flags.use_cuda = 1 + + class Func(Function): + def execute(self, a, b): + self.save_vars = a, b + return jt.code(a.shape, a.dtype, [a,b], + cuda_src=""" + __global__ static void kernel1(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + def grad(self, grad): + a, b = self.save_vars + return jt.code([a.shape, b.shape], [a.dtype, b.dtype], [a, b, grad], + cuda_src=""" + __global__ static void kernel2(@ARGS_DEF) { + @PRECALC + for (int i=blockIdx.x; i>>(@ARGS); + """) + + a = jt.random((100,100)) + b = jt.random((100,100)) + func = Func() + c = func(a,b) + print(c) + print(jt.grad(c, [a, b]))''' + ... + def copy(self)-> Var: ... 
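The ``code`` operator compiles the given ``cpu_src``/``cuda_src`` strings into a custom kernel at runtime. A minimal, self-contained sketch of that pattern follows; it relies only on the builtin accessors listed above (``@in0``, ``@out``, ``in0_shape0``), and the kernel body and values are illustrative, not taken from the library's own examples::

    import jittor as jt

    a = jt.array([1.0, 2.0, 3.0])
    # Square each element with a hand-written CPU kernel.
    # @in0(i) and @out(i) index the first input and the output;
    # in0_shape0 is the length of the first input.
    b = jt.code(a.shape, a.dtype, [a],
        cpu_src='''
            for (int i = 0; i < in0_shape0; i++)
                @out(i) = @in0(i) * @in0(i);
        ''')
    print(b)  # expected: jt.Var([1. 4. 9.], dtype=float32)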
+ def setitem(self, slices: slice, y: Var, op: str="void")-> Var: ... + @overload + def broadcast(self, shape: Tuple[int], dims: Tuple[int]=())-> Var: + '''Document: + * + Broadcast ``x`` to a given shape. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] shape: the output shape. + + * [in] dims: specifies the new dimension in the output shape, an integer array. + + ---------------- + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 2)) + >>> x + jt.Var([[8 1] + [7 6]], dtype=int32) + >>> jt.broadcast(x, shape=(2, 3, 2), dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32)''' + ... + @overload + def broadcast(self, y: Var, dims: Tuple[int]=())-> Var: + '''Document: + * + Broadcast ``x`` to a given shape. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] shape: the output shape. + + * [in] dims: specifies the new dimension in the output shape, an integer array. + + ---------------- + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 2)) + >>> x + jt.Var([[8 1] + [7 6]], dtype=int32) + >>> jt.broadcast(x, shape=(2, 3, 2), dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32)''' + ... + def broadcast_var(self, y: Var, dims: Tuple[int]=())-> Var: + '''Document: + * + Broadcast ``x`` to the same shape as ``y``. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] y: the reference jt.Var. + + * [in] dims: specifies the new dimension in the output shape, an integer array. + + ---------------- + + .. note:: + jt.broadcast_var(x, y, dims) is an alias of jt.broadcast(x, y, dims) + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 2)) + >>> x + jt.Var([[8 1] + [7 6]], dtype=int32) + >>> y = jt.randint(0, 10, shape=(2, 3, 2)) + >>> jt.broadcast(x, y, dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32) + >>> jt.broadcast_var(x, y, dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32)''' + ... + def reshape(self, shape: Tuple[int])-> Var: + '''Document: + * + Returns a tensor with the same data and number of elements as input, but with the specified shape. + + A single dimension may be -1, in which case it's inferred from the remaining dimensions and the number of elements in input. + + ---------------- + + * [in] x: the input jt.Var + + * [in] shape: the output shape, an integer array + + ---------------- + + Example-1:: + >>> a = jt.randint(0, 10, shape=(12,)) + >>> a + jt.Var([4 0 8 4 6 3 1 8 1 1 2 2], dtype=int32) + >>> jt.reshape(a, (3, 4)) + jt.Var([[4 0 8 4] + [6 3 1 8] + [1 1 2 2]], dtype=int32) + >>> jt.reshape(a, (-1, 6)) + jt.Var([[4 0 8 4 6 3] + [1 8 1 1 2 2]], dtype=int32)''' + ... + def reindex_reduce(self, op: str, shape: Tuple[int], indexes: List[str], overflow_conditions: List[str]={}, extras: List[Var]={})-> Var: + '''Document: + * + Reindex Reduce Operator is a many-to-one map operator. + It performs equivalent Python-pseudo implementation below:: + + # input is y, output is x + n = len(y.shape)-1 + m = len(shape)-1 + k = len(overflow_conditions)-1 + x = np.zeros(shape, y.dtype) + x[:] = initial_value(op) + for i0 in range(y.shape[0]): # 1-st loop + for i1 in range(y.shape[1]): # 2-nd loop + ...... 
# many loops + for in in range(y.shape[n]) # n+1 -th loop + # indexes[i] is a c++ style integer expression consisting of i0,i1,...,in + xi0,xi1,...,xim = indexes[0],indexes[1],...,indexes[m] + if not is_overflow(xi0,xi1,...,xim): + x[xi0,xi1,...,xim] = op(x[xi0,xi1,...,xim], y[i0,i1,...,in]) + + # is_overflow is defined as following + def is_overflow(xi0,xi1,...,xim): + return ( + xi0 < 0 || xi0 >= shape[0] || + xi1 < 0 || xi1 >= shape[1] || + ...... + xim < 0 || xim >= shape[m] || + + # overflow_conditions[i] is a c++ style boolean expression consisting of i0,i1,...,in + overflow_conditions[0] || + overflow_conditions[1] || + ...... + overflow_conditions[k] + ) + + * [in] y: A input jittor Var + + * [in] op: a string represent the reduce operation type + + * [in] shape: the output shape, a integer array + + * [in] indexes: array of c++ style integer expression, its length should be the same with length of output shape, some buildin variables it can use are:: + + XDIM, xshape0, ..., xshapem, xstride0, ..., xstridem + YDIM, yshape0, ..., yshapen, ystride0, ..., ystriden + i0, i1, ..., in + @e0(...), @e1(...) for extras input index + e0p, e1p , ... for extras input pointer + + * [in] overflow_conditions: array of c++ style boolean expression, it length can be vary. the buildin variables it can use are the same with indexes. + + * [in] extras: extra var used for index + + Example + + Pooling implemented by reindex operation:: + + def pool(x, size, op): + N,H,W,C = x.shape + h = (H+size-1)//size + w = (W+size-1)//size + return x.reindex_reduce(op, [N,h,w,C], [ + "i0", # Nid + f"i1/{size}", # Hid + f"i2/{size}", # Wid + "i3", # Cid + ])''' + ... + def sync(self, device_sync: bool=False): ... + def fetch_sync(self)-> numpy.ndarray: ... + def numpy(self)-> numpy.ndarray: ... + def assign(self, v: Var)-> Var: + '''Document: + * + * assign the data from another Var.''' + ... + def update(self, v: Var)-> Var: + '''Document: + * + * update parameter and global variable, + * different from assign, it will + * stop grad between origin var and assigned var, and + * will update in the background''' + ... + def _update(self, v: Var)-> Var: + '''Document: + * + * update parameter without set attribute.''' + ... + def swap(self, v: Var)-> Var: + '''Document: + * + * swap the data with another Var.''' + ... + @overload + def name(self, s: str)-> Var: + '''Document: + * + * set the name of the Var.''' + ... + @overload + def name(self)-> str: + '''Document: + * + * set the name of the Var.''' + ... + def numel(self)-> int: + '''Document: + * + * return the number of elements in the Var.''' + ... + def stop_grad(self)-> Var: + '''Document: + * + * disable the gradient calculation for the Var.''' + ... + def is_stop_grad(self)-> bool: + '''Document: + * + * return True if the gradient is stopped.''' + ... + def detach(self)-> Var: + '''Document: + detach the grad''' + ... + def stop_fuse(self)-> Var: + '''Document: + * + * stop operator fusion.''' + ... + def is_stop_fuse(self)-> bool: + '''Document: + * + * return True if operator fusion is stopped.''' + ... + def item(self)-> float | int | bool: + '''Document: + * + * returns the Python number if the Var contains only one element. + * For other cases, see data().''' + ... + def share_with(self, other: Var)-> Var: ... + def debug_msg(self)-> str: + '''Document: + * + * print the information of the Var to debug.''' + ... + def _input(self, i: int)-> Var: ... 
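The bookkeeping helpers documented above (``name``, ``numel``, ``stop_grad``, ``is_stop_grad``, ``item``, ``numpy``) are typically combined as in the short sketch below; the values are illustrative::

    import jittor as jt

    x = jt.array([[1.0, 2.0], [3.0, 4.0]]).name("x")  # attach a name for debugging
    print(x.numel())         # 4, total number of elements
    x.stop_grad()            # disable gradient computation for this Var
    print(x.is_stop_grad())  # True
    s = x.sum().item()       # item() turns a one-element Var into a Python number
    print(s)                 # 10.0
    print(x.numpy())         # fetch the data as a numpy.ndarray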
+ def _add_dependency(self, vars: List[Var])-> Var: + '''Document: + Add dependency, make var computed after vars''' + ... + def compile_options(self): ... + def data(self)-> numpy.ndarray: + '''Document: + * + * get a numpy array which shares the data with the Var.''' + ... + def dtype(self)-> str: + '''Document: + * + * return the data type of the Var.''' + ... + def grad(self)-> int: + '''Document: + Jittor Var doesn't have this interface, please change your code as below:: + + model = Model() + optimizer = SGD(model.parameters()) + ... + optimizer.backward(loss) + + for p in model.parameters(): + # prev code: + # grad = p.grad + + # change to: + grad = p.opt_grad(optimizer)''' + ... + def ndim(self)-> int: + '''Document: + * + * return the number of dimensions.''' + ... + def requires_grad(self)-> bool: + '''Document: + * + * return True if the Var requires gradient calculation. + * @see is_stop_grad''' + ... + def shape(self)-> Tuple[int]: + '''Document: + * + * return the shape of the Var.''' + ... + def uncertain_shape(self)-> Tuple[int]: ... + def view(self, x: Var, shape: Tuple[int])-> Var: + '''Document: + * + Returns a tensor with the same data and number of elements as input, but with the specified shape. + + A single dimension may be -1, in which case it's inferred from the remaining dimensions and the number of elements in input. + + ---------------- + + * [in] x: the input jt.Var + + * [in] shape: the output shape, an integer array + + ---------------- + + Example-1:: + >>> a = jt.randint(0, 10, shape=(12,)) + >>> a + jt.Var([4 0 8 4 6 3 1 8 1 1 2 2], dtype=int32) + >>> jt.reshape(a, (3, 4)) + jt.Var([[4 0 8 4] + [6 3 1 8] + [1 1 2 2]], dtype=int32) + >>> jt.reshape(a, (-1, 6)) + jt.Var([[4 0 8 4 6 3] + [1 8 1 1 2 2]], dtype=int32)''' + ... + def permute(self, x: Var, axes: Tuple[int]=())-> Var: ... + def astype(self, x: Var, op: str)-> Var: ... + def expand_as(self, x: Var, y: Var, dims: Tuple[int]=())-> Var: + '''Document: + * + Broadcast ``x`` to the same shape as ``y``. + + ---------------- + + * [in] x: the input jt.Var. + + * [in] y: the reference jt.Var. + + * [in] dims: specifies the new dimension in the output shape, an integer array. + + ---------------- + + .. note:: + jt.broadcast_var(x, y, dims) is an alias of jt.broadcast(x, y, dims) + + Example-1:: + >>> x = jt.randint(0, 10, shape=(2, 2)) + >>> x + jt.Var([[8 1] + [7 6]], dtype=int32) + >>> y = jt.randint(0, 10, shape=(2, 3, 2)) + >>> jt.broadcast(x, y, dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32) + >>> jt.broadcast_var(x, y, dims=[1]) + jt.Var([[[8 1] + [8 1] + [8 1]], + [[7 6] + [7 6] + [7 6]]], dtype=int32)''' + ... +class Flags: + '''A set of flags to configure jittor running behaviors''' + addr2line_path: str + '''Path of addr2line. Default: ""''' + auto_convert_64_to_32: int + '''Auto convert 64bit numpy array into 32bit jittor array. Default: 1''' + cache_path: str + '''Cache path of jittor. Default: ""''' + cc_flags: str + '''Flags of C++ compiler. Default: ""''' + cc_path: str + '''Path of C++ compiler. Default: ""''' + cc_type: str + '''Type of C++ compiler (clang, icc, g++). Default: ""''' + check_graph: int + '''Unify graph sanity check. Default: 0''' + compile_options: Any + '''Override the default loop transform options. Default: {}''' + enable_tuner: int + '''Enable tuner. Default: 1''' + exclude_pass: str + '''Don't run certain passes.
Default: ""''' + extra_gdb_cmd: str + '''Extra commands passed to GDB, separated by ";". Default: ""''' + gdb_attach: int + '''Attach GDB to the current process. Default: 0''' + gdb_path: str + '''Path of GDB. Default: ""''' + gopt_disable: int + '''Disable graph optimizer. Default: 0''' + has_pybt: int + '''GDB has pybt or not. Default: 0''' + jit_search_kernel: int + '''Jit search for the fastest kernel. Default: 0''' + jit_search_rerun: int + '''. Default: 10''' + jit_search_warmup: int + '''. Default: 2''' + jittor_path: str + '''Source path of jittor. Default: ""''' + l1_cache_size: int + '''Size of level 1 cache in bytes. Default: 32768''' + lazy_execution: int + '''Enabled by default. If disabled, eager execution is used instead of lazy execution; this gives better error messages and traceback information, but raises memory consumption and lowers performance. Default: 1''' + log_file: str + '''Log to file; in an MPI environment the $OMPI_COMM_WORLD_RANK suffix is added. Default: ""''' + log_silent: int + '''The log will be completely silent. Default: 0''' + log_sync: int + '''Set log printed synchronously. Default: 1''' + log_v: int + '''Verbose level of logging. Default: 0''' + log_vprefix: str + '''Verbose level of logging prefix. Default: ""''' + no_fuse: bool + '''No fusion optimization for all jittor Var creation. Default: 0''' + no_grad: bool + '''No grad for all jittor Var creation. Default: 0''' + nvcc_flags: str + '''Flags of CUDA C++ compiler. Default: ""''' + nvcc_path: str + '''Path of CUDA C++ compiler. Default: ""''' + para_opt_level: int + '''para_opt_level. Default: 3''' + profile_memory_enable: int + '''Enable memory profiler. Default: 0''' + profiler_enable: int + '''Enable profiler. Default: 0''' + profiler_hide_relay: int + '''Profiler hide relayed op. Default: 0''' + profiler_rerun: int + '''Profiler rerun. Default: 0''' + profiler_warmup: int + '''Profiler warmup. Default: 0''' + python_path: str + '''Path of python interpreter. Default: ""''' + rewrite_op: int + '''Rewrite source file of jit operator or not. Default: 1''' + stat_allocator_total_alloc_byte: int + '''Total allocated bytes. Default: 0''' + stat_allocator_total_alloc_call: int + '''Number of alloc function calls. Default: 0''' + stat_allocator_total_free_byte: int + '''Total freed bytes. Default: 0''' + stat_allocator_total_free_call: int + '''Number of free function calls. Default: 0''' + trace_depth: int + '''Trace depth for GDB. Default: 10''' + trace_py_var: int + '''Trace py stack max depth for debug. Default: 0''' + trace_var_data: int + '''Trace py stack max depth for debug. Default: 0''' + try_use_32bit_index: int + '''If there is no overflow, try to use a 32-bit type as the index type. Default: 0''' + update_queue_auto_flush_delay: int + '''When the size of an update queue is greater than this value, the update queue triggers an auto flush. Default: 2''' + use_cuda: int + '''Use cuda or not. 1 for trying to use cuda, 2 for forcing to use cuda. Default: 0''' + use_nfef_allocator: int + '''Enable never free exact fit allocator. Default: 0''' + use_parallel_op_compiler: int + '''Number of threads used by the parallel op compiler; set this value to 0 to disable the parallel op compiler. Default: 16''' + use_sfrl_allocator: int + '''Enable sfrl allocator. Default: 1''' + use_stat_allocator: int + '''Enable stat allocator.
Default: 0''' + use_temp_allocator: int + '''Enable temp allocator. Default: 1''' +flags: Flags +'''Jittor running time flags instance''' diff --git a/python/jittor/compile_extern.py b/python/jittor/compile_extern.py index 660a0a92..7fbf6c61 100644 --- a/python/jittor/compile_extern.py +++ b/python/jittor/compile_extern.py @@ -219,8 +219,8 @@ def setup_cuda_extern(): msg += """Develop version of CUDNN not found, please refer to CUDA offical tar file installation: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar""" - if platform.machine() in ["x86_64", "AMD64"]: - msg += f""" + if platform.machine() in ["x86_64", "AMD64"]: + msg += f""" or you can let jittor install cuda and cudnn for you: >>> python3.{sys.version_info.minor} -m jittor_utils.install_cuda """ @@ -309,7 +309,7 @@ def install_cutt(root_folder): if md5 != true_md5: os.remove(fullname) shutil.rmtree(dirname) - if not os.path.isfile(os.path.join(dirname, "lib/libcutt"+so)): + if not os.path.isfile(os.path.join(cache_path, "libcutt"+so)): LOG.i("Downloading cutt...") download_url_to_local(url, filename, root_folder, true_md5) @@ -337,8 +337,7 @@ def install_cutt(root_folder): continue files2.append(f) cutt_flags = cc_flags+opt_flags+cutt_include - os.makedirs(dirname+"/lib", exist_ok=True) - compile(cc_path, cutt_flags, files2, dirname+"/lib/libcutt"+so, cuda_flags=arch_flag) + compile(cc_path, cutt_flags, files2, cache_path+"/libcutt"+so, cuda_flags=arch_flag) return dirname def setup_cutt(): @@ -362,7 +361,7 @@ def setup_cutt(): install_cutt(cutt_path) cutt_home = os.path.join(cutt_path, "cutt-1.2") cutt_include_path = os.path.join(cutt_home, "src") - cutt_lib_path = os.path.join(cutt_home, "lib") + cutt_lib_path = cache_path cutt_lib_name = os.path.join(cutt_lib_path, "libcutt"+so) assert os.path.isdir(cutt_include_path) @@ -397,7 +396,8 @@ def install_nccl(root_folder): if os.path.isdir(dirname): shutil.rmtree(dirname) if not os.path.isfile(os.path.join(dirname, "build", "lib", "libnccl.so")): - LOG.i("Downloading nccl...") + if not os.path.isfile(os.path.join(root_folder, filename)): + LOG.i("Downloading nccl...") download_url_to_local(url, filename, root_folder, true_md5) if core.get_device_count() == 0: @@ -531,7 +531,7 @@ def setup_mpi(): mpi_ops = mpi.ops LOG.vv("Get mpi: "+str(mpi.__dict__.keys())) LOG.vv("Get mpi_ops: "+str(mpi_ops.__dict__.keys())) - def warper(func): + def wrapper(func): def inner(self, *args, **kw): return func(self, *args, **kw) inner.__doc__ = func.__doc__ @@ -539,7 +539,7 @@ def setup_mpi(): for k in mpi_ops.__dict__: if not k.startswith("mpi_"): continue if k == "mpi_test": continue - setattr(core.Var, k, warper(mpi_ops.__dict__[k])) + setattr(core.Var, k, wrapper(mpi_ops.__dict__[k])) if os.environ.get("FIX_TORCH_ERROR", "0") == "1": try: @@ -547,6 +547,7 @@ if os.environ.get("FIX_TORCH_ERROR", "0") == "1": except: pass +cudnn = cublas = curand = None setup_mpi() in_mpi = inside_mpi() rank = mpi.world_rank() if in_mpi else 0 diff --git a/python/jittor/compiler.py b/python/jittor/compiler.py index a51bb849..79730ff9 100644 --- a/python/jittor/compiler.py +++ b/python/jittor/compiler.py @@ -116,9 +116,11 @@ def compile(compiler, flags, inputs, output, combind_build=False, cuda_flags="") obj_files.append(os.path.join( cache_path, "obj_files", os.path.basename(name)+".o")) inputs = new_inputs + cm = lambda s: f"\"{s}\"" + cms = lambda arr: [f"\"{s}\"" for s in arr ] if len(inputs) == 1 or combind_build: - cmd = f"\"{compiler}\" {' '.join(inputs)} {flags} -o 
{output}" + cmd = f"\"{compiler}\" {' '.join(cms(inputs))} {flags} -o {cm(output)}" return do_compile(fix_cl_flags(cmd)) # split compile object file and link # remove -l -L flags when compile object files @@ -127,7 +129,7 @@ def compile(compiler, flags, inputs, output, combind_build=False, cuda_flags="") for input, obj_file in zip(inputs, obj_files): cc = compiler nflags = oflags - cmd = f"{input} {nflags} {lto_flags} -c -o {obj_file}" + cmd = f"{cm(input)} {nflags} {lto_flags} -c -o {cm(obj_file)}" if input.endswith(".cu"): if has_cuda: cmd = f"\"{nvcc_path}\" {cuda_flags} {cmd}" @@ -146,9 +148,9 @@ def compile(compiler, flags, inputs, output, combind_build=False, cuda_flags="") obj_files += ex_obj_files if os.name == 'nt': dumpdef_path = os.path.join(jittor_path, "utils", "dumpdef.py") - cmd = f"\"{sys.executable}\" \"{dumpdef_path}\" {' '.join(obj_files)} -Fo: \"{output}.def\"" + cmd = f"\"{sys.executable}\" \"{dumpdef_path}\" {' '.join(cms(obj_files))} -Fo: \"{output}.def\"" do_compile(fix_cl_flags(cmd)) - cmd = f"\"{compiler}\" {' '.join(obj_files)} -o {output} {flags} {lto_flags}" + cmd = f"\"{compiler}\" {' '.join(cms(obj_files))} -o {cm(output)} {flags} {lto_flags}" return do_compile(fix_cl_flags(cmd)) def gen_jit_tests(): @@ -245,7 +247,7 @@ def gen_jit_flags(): {jit_declares} - // @pyjt(flags) + // @pyjt(Flags) struct _Flags {{ // @pyjt(__init__) _Flags() {{}} @@ -695,8 +697,8 @@ def compile_custom_ops( gen_name = "gen_ops_" + "_".join(headers.keys()) if gen_name_ != "": gen_name = gen_name_ - if len(gen_name) > 100: - gen_name = gen_name[:80] + "___hash" + hashlib.md5(gen_name.encode()).hexdigest() + if len(gen_name) > 50: + gen_name = gen_name[:50] + "___hash" + hashlib.md5(gen_name.encode()).hexdigest()[:6] includes = sorted(list(set(includes))) includes = "".join(map(lambda x: f" -I\"{x}\" ", includes)) @@ -722,7 +724,7 @@ def compile_custom_ops( return gen_src.replace(anchor_str, anchor_str+insert_str, 1) for name in pyjt_includes: - LOG.i("handle pyjt_include", name) + LOG.v("handle pyjt_include ", name) bname = os.path.basename(name).split(".")[0] gen_src_fname = os.path.join(cache_path, "custom_ops", gen_name+"_"+bname+".cc") pyjt_compiler.compile_single(name, gen_src_fname) @@ -868,16 +870,16 @@ def check_cache_compile(): files = [ x.replace('/', '\\') for x in files ] global jit_utils_core_files jit_utils_core_files = files - recompile = compile(cc_path, cc_flags+f" {opt_flags} ", files, 'jit_utils_core'+extension_suffix, True) + recompile = compile(cc_path, cc_flags+f" {opt_flags} ", files, jit_utils.cache_path+'/jit_utils_core'+extension_suffix, True) if recompile and jit_utils.cc: - LOG.e("jit_utils updated, please restart jittor.") + LOG.e("jit_utils updated, please rerun your command.") sys.exit(0) if not jit_utils.cc: with jit_utils.import_scope(import_flags): jit_utils.try_import_jit_utils_core() assert jit_utils.cc # recompile, generate cache key - compile(cc_path, cc_flags+f" {opt_flags} ", files, 'jit_utils_core'+extension_suffix, True) + compile(cc_path, cc_flags+f" {opt_flags} ", files, jit_utils.cache_path+'/jit_utils_core'+extension_suffix, True) def env_or_try_find(name, bname): if name in os.environ: @@ -973,6 +975,25 @@ gdb_path = env_or_try_find('gdb_path', 'gdb') addr2line_path = try_find_exe('addr2line') has_pybt = check_pybt(gdb_path, python_path) +if nvcc_path: + # gen cuda key for cache_path + cu = "cu" + v = jit_utils.get_version(nvcc_path)[1:-1] + nvcc_version = list(map(int,v.split('.'))) + cu += v + try: + r, s = 
sp.getstatusoutput(f"{sys.executable} -m jittor_utils.query_cuda_cc") + if r==0: + s = sorted(list(set(s.strip().split()))) + cu += "_sm_" + "_".join(s) + if "cuda_arch" not in os.environ: + os.environ["cuda_arch"] = " ".join(cu) + except: + pass + LOG.i("cuda key:", cu) + cache_path = os.path.join(cache_path, cu) + sys.path.append(cache_path) + def check_clang_latest_supported_cpu(): output = run_cmd('clang --print-supported-cpus') @@ -1005,10 +1026,10 @@ if platform.system() == 'Darwin': opt_flags = "" py_include = jit_utils.get_py3_include_path() -LOG.i(f"py_include: {py_include}") +LOG.v(f"py_include: {py_include}") extension_suffix = jit_utils.get_py3_extension_suffix() lib_suffix = extension_suffix.rsplit(".", 1)[0] -LOG.i(f"extension_suffix: {extension_suffix}") +LOG.v(f"extension_suffix: {extension_suffix}") so = ".so" if os.name != 'nt' else ".dll" @@ -1064,7 +1085,7 @@ if os.name == 'nt': win_libpaths = {} def fix_cl_flags(cmd): cmd = cmd.replace(".o ", ".obj ") - cmd = cmd.replace(".o\" ", ".obj\" ") + cmd = cmd.replace(".o\"", ".obj\"") if cmd.endswith(".o"): cmd += "bj" if " -o " in cmd: if " -shared " in cmd: @@ -1164,7 +1185,10 @@ if has_cuda: return x return f"-L\"{a}\" -l{b[:-4]}" nvcc_flags = map_flags(nvcc_flags, func) - nvcc_flags = nvcc_flags.replace("-std=c++17", "-std=c++14 -Xcompiler -std:c++14") + if nvcc_version >= [11,4]: + nvcc_flags = nvcc_flags.replace("-std=c++17", "-std=c++14 -Xcompiler -std:c++14") + else: + nvcc_flags = nvcc_flags.replace("-std=c++17", "") nvcc_flags = nvcc_flags.replace("-Wall", "") nvcc_flags = nvcc_flags.replace("-Wno-unknown-pragmas", "") nvcc_flags = nvcc_flags.replace("-fopenmp", "") @@ -1189,7 +1213,7 @@ jit_src = gen_jit_op_maker(op_headers) LOG.vvvv(jit_src) with open(os.path.join(cache_path, "gen", "jit_op_maker.h"), 'w') as f: f.write(jit_src) -cc_flags += f' -I\"{cache_path}\" -L\"{cache_path}\" ' +cc_flags += f' -I\"{cache_path}\" -L\"{cache_path}\" -L\"{jit_utils.cache_path}\" ' # gen pyjt pyjt_gen_src = pyjt_compiler.compile(cache_path, jittor_path) @@ -1264,7 +1288,7 @@ if use_data_gz: .replace("-Werror", "") \ .replace("-shared", "") vdp = os.path.join(jittor_path, "src", "utils", "vdp") - run_cmd(fix_cl_flags(f"{cc_path} {dflags} -include {vdp} {data_s_path} -c -o {data_o_path}")) + run_cmd(fix_cl_flags(f"{cc_path} {dflags} -include \"{vdp}\" \"{data_s_path}\" -c -o \"{data_o_path}\"")) os.remove(data_s_path) with open(data_gz_md5_path, 'w') as f: f.write(md5) @@ -1281,15 +1305,15 @@ cc_flags += f" -l\"jittor_core{lib_suffix}\" " with jit_utils.import_scope(import_flags): import jittor_core as core -flags = core.flags() +flags = core.Flags() if has_cuda: nvcc_flags = convert_nvcc_flags(cc_flags) - nvcc_version = jit_utils.get_int_version(nvcc_path) + nvcc_version = list(jit_utils.get_int_version(nvcc_path)) max_arch = 1000 - if nvcc_version < (11,): + if nvcc_version < [11,]: max_arch = 75 - elif nvcc_version < (11,1): + elif nvcc_version < [11,1]: max_arch = 80 if len(flags.cuda_archs): min_arch = 30 diff --git a/python/jittor/extern/cuda/cublas/inc/cublas_warper.h b/python/jittor/extern/cuda/cublas/inc/cublas_wrapper.h similarity index 100% rename from python/jittor/extern/cuda/cublas/inc/cublas_warper.h rename to python/jittor/extern/cuda/cublas/inc/cublas_wrapper.h diff --git a/python/jittor/extern/cuda/cublas/ops/cublas_batched_matmul_op.cc b/python/jittor/extern/cuda/cublas/ops/cublas_batched_matmul_op.cc index e02414f4..97c8f6c2 100644 --- a/python/jittor/extern/cuda/cublas/ops/cublas_batched_matmul_op.cc +++ 
b/python/jittor/extern/cuda/cublas/ops/cublas_batched_matmul_op.cc @@ -13,7 +13,7 @@ #include "var.h" #include "cublas_batched_matmul_op.h" -#include "cublas_warper.h" +#include "cublas_wrapper.h" using namespace std; diff --git a/python/jittor/extern/cuda/cublas/ops/cublas_matmul_op.cc b/python/jittor/extern/cuda/cublas/ops/cublas_matmul_op.cc index b81e8b1a..c1b1069d 100644 --- a/python/jittor/extern/cuda/cublas/ops/cublas_matmul_op.cc +++ b/python/jittor/extern/cuda/cublas/ops/cublas_matmul_op.cc @@ -10,7 +10,7 @@ #include "var.h" #include "cublas_matmul_op.h" -#include "cublas_warper.h" +#include "cublas_wrapper.h" using namespace std; diff --git a/python/jittor/extern/cuda/cublas/src/cublas_warper.cc b/python/jittor/extern/cuda/cublas/src/cublas_wrapper.cc similarity index 97% rename from python/jittor/extern/cuda/cublas/src/cublas_warper.cc rename to python/jittor/extern/cuda/cublas/src/cublas_wrapper.cc index 0f1b912e..bc615f67 100644 --- a/python/jittor/extern/cuda/cublas/src/cublas_warper.cc +++ b/python/jittor/extern/cuda/cublas/src/cublas_wrapper.cc @@ -7,7 +7,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. // *************************************************************** -#include "cublas_warper.h" +#include "cublas_wrapper.h" #include "misc/cuda_flags.h" namespace jittor { diff --git a/python/jittor/extern/cuda/cudnn/inc/cudnn_rnn_descriptor.h b/python/jittor/extern/cuda/cudnn/inc/cudnn_rnn_descriptor.h index 977aafb9..c500b006 100644 --- a/python/jittor/extern/cuda/cudnn/inc/cudnn_rnn_descriptor.h +++ b/python/jittor/extern/cuda/cudnn/inc/cudnn_rnn_descriptor.h @@ -7,7 +7,7 @@ // *************************************************************** #pragma once #include "op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" #include "init.h" diff --git a/python/jittor/extern/cuda/cudnn/inc/cudnn_warper.h b/python/jittor/extern/cuda/cudnn/inc/cudnn_wrapper.h similarity index 100% rename from python/jittor/extern/cuda/cudnn/inc/cudnn_warper.h rename to python/jittor/extern/cuda/cudnn/inc/cudnn_wrapper.h diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_w_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_w_op.cc index 089c1e3a..9679069e 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_w_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_w_op.cc @@ -10,7 +10,7 @@ #include "mem/allocator.h" #include "var.h" #include "cudnn_conv3d_backward_w_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_x_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_x_op.cc index 719c441a..8a72b886 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_x_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_backward_x_op.cc @@ -10,7 +10,7 @@ #include "mem/allocator.h" #include "var.h" #include "cudnn_conv3d_backward_x_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_op.cc index 7c6d450f..1bc1a866 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv3d_op.cc @@ -7,7 +7,7 @@ // 
*************************************************************** #include "var.h" #include "cudnn_conv3d_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_w_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_w_op.cc index 98e90459..420ee59b 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_w_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_w_op.cc @@ -10,7 +10,7 @@ #include "mem/allocator.h" #include "var.h" #include "cudnn_conv_backward_w_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" using namespace std; diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_x_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_x_op.cc index f376eef6..fb9ca952 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_x_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_backward_x_op.cc @@ -10,7 +10,7 @@ #include "mem/allocator.h" #include "var.h" #include "cudnn_conv_backward_x_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" using namespace std; diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_op.cc index 8be84b36..feeb8071 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_conv_op.cc @@ -7,7 +7,7 @@ // *************************************************************** #include "var.h" #include "cudnn_conv_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" using namespace std; diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_backward_x_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_backward_x_op.cc index 347cf5bf..7f0d667f 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_backward_x_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_backward_x_op.cc @@ -8,7 +8,7 @@ #include "var.h" #include "cudnn_rnn_descriptor.h" #include "cudnn_rnn_backward_x_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_op.cc b/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_op.cc index 6772a6f8..16d49e7f 100644 --- a/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_op.cc +++ b/python/jittor/extern/cuda/cudnn/ops/cudnn_rnn_op.cc @@ -8,7 +8,7 @@ #include "var.h" #include "cudnn_rnn_descriptor.h" #include "cudnn_rnn_op.h" -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "executor.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/cuda/cudnn/src/cudnn_warper.cc b/python/jittor/extern/cuda/cudnn/src/cudnn_wrapper.cc similarity index 97% rename from python/jittor/extern/cuda/cudnn/src/cudnn_warper.cc rename to python/jittor/extern/cuda/cudnn/src/cudnn_wrapper.cc index 6044ac98..d9b22e0e 100644 --- a/python/jittor/extern/cuda/cudnn/src/cudnn_warper.cc +++ b/python/jittor/extern/cuda/cudnn/src/cudnn_wrapper.cc @@ -4,7 +4,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. 
// *************************************************************** -#include "cudnn_warper.h" +#include "cudnn_wrapper.h" #include "misc/cuda_flags.h" namespace jittor { diff --git a/python/jittor/extern/cuda/curand/inc/curand_warper.h b/python/jittor/extern/cuda/curand/inc/curand_wrapper.h similarity index 100% rename from python/jittor/extern/cuda/curand/inc/curand_warper.h rename to python/jittor/extern/cuda/curand/inc/curand_wrapper.h diff --git a/python/jittor/extern/cuda/curand/ops/curand_random_op.cc b/python/jittor/extern/cuda/curand/ops/curand_random_op.cc index 4ea8bfa3..6b42bfff 100644 --- a/python/jittor/extern/cuda/curand/ops/curand_random_op.cc +++ b/python/jittor/extern/cuda/curand/ops/curand_random_op.cc @@ -12,7 +12,7 @@ #include #include "helper_cuda.h" #include "curand_random_op.h" -#include "curand_warper.h" +#include "curand_wrapper.h" namespace jittor { diff --git a/python/jittor/extern/cuda/curand/src/curand_warper.cc b/python/jittor/extern/cuda/curand/src/curand_wrapper.cc similarity index 97% rename from python/jittor/extern/cuda/curand/src/curand_warper.cc rename to python/jittor/extern/cuda/curand/src/curand_wrapper.cc index f32c6755..fda74ffc 100644 --- a/python/jittor/extern/cuda/curand/src/curand_warper.cc +++ b/python/jittor/extern/cuda/curand/src/curand_wrapper.cc @@ -7,7 +7,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. // *************************************************************** -#include "curand_warper.h" +#include "curand_wrapper.h" #include "init.h" #include "misc/cuda_flags.h" diff --git a/python/jittor/extern/cuda/cutt/ops/cutt_transpose_op.cc b/python/jittor/extern/cuda/cutt/ops/cutt_transpose_op.cc index 436402fe..96f68401 100644 --- a/python/jittor/extern/cuda/cutt/ops/cutt_transpose_op.cc +++ b/python/jittor/extern/cuda/cutt/ops/cutt_transpose_op.cc @@ -7,7 +7,7 @@ #include "cutt_transpose_op.h" #include "ops/op_register.h" #include "cutt.h" -#include "cutt_warper.h" +#include "cutt_wrapper.h" #include "misc/stack_vector.h" #include "helper_cuda.h" diff --git a/python/jittor/extern/cuda/cutt/ops/cutt_warper.cc b/python/jittor/extern/cuda/cutt/ops/cutt_wrapper.cc similarity index 97% rename from python/jittor/extern/cuda/cutt/ops/cutt_warper.cc rename to python/jittor/extern/cuda/cutt/ops/cutt_wrapper.cc index 6deda068..5319d718 100644 --- a/python/jittor/extern/cuda/cutt/ops/cutt_warper.cc +++ b/python/jittor/extern/cuda/cutt/ops/cutt_wrapper.cc @@ -6,7 +6,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. // *************************************************************** -#include "cutt_warper.h" +#include "cutt_wrapper.h" namespace jittor { diff --git a/python/jittor/extern/cuda/cutt/ops/cutt_warper.h b/python/jittor/extern/cuda/cutt/ops/cutt_wrapper.h similarity index 100% rename from python/jittor/extern/cuda/cutt/ops/cutt_warper.h rename to python/jittor/extern/cuda/cutt/ops/cutt_wrapper.h diff --git a/python/jittor/extern/cuda/nccl/inc/nccl_warper.h b/python/jittor/extern/cuda/nccl/inc/nccl_wrapper.h similarity index 96% rename from python/jittor/extern/cuda/nccl/inc/nccl_warper.h rename to python/jittor/extern/cuda/nccl/inc/nccl_wrapper.h index 3a3f0a60..84fd3f0f 100644 --- a/python/jittor/extern/cuda/nccl/inc/nccl_warper.h +++ b/python/jittor/extern/cuda/nccl/inc/nccl_wrapper.h @@ -8,7 +8,7 @@ // file 'LICENSE.txt', which is part of this source code package. 
// *************************************************************** #pragma once -#include "mpi_warper.h" +#include "mpi_wrapper.h" #include #include diff --git a/python/jittor/extern/cuda/nccl/ops/nccl_all_reduce_op.cc b/python/jittor/extern/cuda/nccl/ops/nccl_all_reduce_op.cc index 8fa5a6a7..74e9e0ab 100644 --- a/python/jittor/extern/cuda/nccl/ops/nccl_all_reduce_op.cc +++ b/python/jittor/extern/cuda/nccl/ops/nccl_all_reduce_op.cc @@ -13,7 +13,7 @@ #include #include #include "helper_cuda.h" -#include "nccl_warper.h" +#include "nccl_wrapper.h" #include "ops/op_register.h" namespace jittor { diff --git a/python/jittor/extern/cuda/nccl/ops/nccl_broadcast_op.cc b/python/jittor/extern/cuda/nccl/ops/nccl_broadcast_op.cc index 391702f2..0217ae2a 100644 --- a/python/jittor/extern/cuda/nccl/ops/nccl_broadcast_op.cc +++ b/python/jittor/extern/cuda/nccl/ops/nccl_broadcast_op.cc @@ -13,7 +13,7 @@ #include #include #include "helper_cuda.h" -#include "nccl_warper.h" +#include "nccl_wrapper.h" #include "ops/op_register.h" namespace jittor { diff --git a/python/jittor/extern/cuda/nccl/ops/nccl_reduce_op.cc b/python/jittor/extern/cuda/nccl/ops/nccl_reduce_op.cc index 55409831..e2e8420e 100644 --- a/python/jittor/extern/cuda/nccl/ops/nccl_reduce_op.cc +++ b/python/jittor/extern/cuda/nccl/ops/nccl_reduce_op.cc @@ -13,7 +13,7 @@ #include #include #include "helper_cuda.h" -#include "nccl_warper.h" +#include "nccl_wrapper.h" #include "ops/op_register.h" namespace jittor { diff --git a/python/jittor/extern/cuda/nccl/ops/nccl_test_op.cc b/python/jittor/extern/cuda/nccl/ops/nccl_test_op.cc index 70eb6f50..0bd2ed29 100644 --- a/python/jittor/extern/cuda/nccl/ops/nccl_test_op.cc +++ b/python/jittor/extern/cuda/nccl/ops/nccl_test_op.cc @@ -7,7 +7,7 @@ #include "nccl_test_op.h" #include "utils/str_utils.h" -#include "nccl_warper.h" +#include "nccl_wrapper.h" namespace jittor { diff --git a/python/jittor/extern/cuda/nccl/src/nccl_warper.cc b/python/jittor/extern/cuda/nccl/src/nccl_wrapper.cc similarity index 98% rename from python/jittor/extern/cuda/nccl/src/nccl_warper.cc rename to python/jittor/extern/cuda/nccl/src/nccl_wrapper.cc index 46824457..569f8cb8 100644 --- a/python/jittor/extern/cuda/nccl/src/nccl_warper.cc +++ b/python/jittor/extern/cuda/nccl/src/nccl_wrapper.cc @@ -8,7 +8,7 @@ // file 'LICENSE.txt', which is part of this source code package. // *************************************************************** #include "misc/cuda_flags.h" -#include "nccl_warper.h" +#include "nccl_wrapper.h" #include "event_queue.h" const char *_cudaGetErrorEnum(ncclResult_t error) { diff --git a/python/jittor/extern/mpi/inc/mpi_warper.h b/python/jittor/extern/mpi/inc/mpi_wrapper.h similarity index 100% rename from python/jittor/extern/mpi/inc/mpi_warper.h rename to python/jittor/extern/mpi/inc/mpi_wrapper.h diff --git a/python/jittor/extern/mpi/ops/mpi_all_reduce_op.cc b/python/jittor/extern/mpi/ops/mpi_all_reduce_op.cc index 7259b3ab..589fe2ab 100644 --- a/python/jittor/extern/mpi/ops/mpi_all_reduce_op.cc +++ b/python/jittor/extern/mpi/ops/mpi_all_reduce_op.cc @@ -6,7 +6,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. 
// *************************************************************** -#include "mpi_warper.h" +#include "mpi_wrapper.h" #include "var.h" #include "mpi_all_reduce_op.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/mpi/ops/mpi_broadcast_op.cc b/python/jittor/extern/mpi/ops/mpi_broadcast_op.cc index 257367fd..b9a23762 100644 --- a/python/jittor/extern/mpi/ops/mpi_broadcast_op.cc +++ b/python/jittor/extern/mpi/ops/mpi_broadcast_op.cc @@ -6,7 +6,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. // *************************************************************** -#include "mpi_warper.h" +#include "mpi_wrapper.h" #include "var.h" #include "mpi_broadcast_op.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/mpi/ops/mpi_reduce_op.cc b/python/jittor/extern/mpi/ops/mpi_reduce_op.cc index 7ce39cfa..189a99f8 100644 --- a/python/jittor/extern/mpi/ops/mpi_reduce_op.cc +++ b/python/jittor/extern/mpi/ops/mpi_reduce_op.cc @@ -6,7 +6,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. // *************************************************************** -#include "mpi_warper.h" +#include "mpi_wrapper.h" #include "var.h" #include "mpi_reduce_op.h" #include "ops/op_register.h" diff --git a/python/jittor/extern/mpi/ops/mpi_test_op.cc b/python/jittor/extern/mpi/ops/mpi_test_op.cc index 615520f1..3fdb34a2 100644 --- a/python/jittor/extern/mpi/ops/mpi_test_op.cc +++ b/python/jittor/extern/mpi/ops/mpi_test_op.cc @@ -3,7 +3,7 @@ // This file is subject to the terms and conditions defined in // file 'LICENSE.txt', which is part of this source code package. // *************************************************************** -#include "mpi_warper.h" +#include "mpi_wrapper.h" #include "var.h" #include "mpi_test_op.h" diff --git a/python/jittor/extern/mpi/src/mpi_warper.cc b/python/jittor/extern/mpi/src/mpi_wrapper.cc similarity index 98% rename from python/jittor/extern/mpi/src/mpi_warper.cc rename to python/jittor/extern/mpi/src/mpi_wrapper.cc index ce16be15..5eee059d 100644 --- a/python/jittor/extern/mpi/src/mpi_warper.cc +++ b/python/jittor/extern/mpi/src/mpi_wrapper.cc @@ -11,7 +11,7 @@ #include #include -#include "mpi_warper.h" +#include "mpi_wrapper.h" #include "common.h" #include "ops/array_op.h" diff --git a/python/jittor/init.py b/python/jittor/init.py index f65661d8..3f8b5f25 100644 --- a/python/jittor/init.py +++ b/python/jittor/init.py @@ -8,36 +8,324 @@ # file 'LICENSE.txt', which is part of this source code package. # *************************************************************** import jittor as jt +from jittor import NanoVector, Var import numpy as np import math -def eye(shape, dtype): - return jt.array(np.identity(shape[0])).unary(dtype) +def eye(shape, dtype="float32"): + ''' Generate 2-D identity matrix. + + Args: + shape (int or tuple of int): + shape of the output matrix + dtype (string): + dtype of the output matrix, default float32 + + Return: + A Jittor Var of identity matrix. 
+ + Example:: + + from jittor import init + print(init.eye(2)) + # output: [[1.,0.],[0.,1.]] + print(init.eye((2,3), "float32")) + # output: [[1.,0.,0.],[0.,1.,0.]] + + ''' + if isinstance(shape, int): + shape = (shape,shape) + assert len(shape)==2, f"len of shape should be 2, but got {shape}" + index = jt.index(shape) + return (index[0]==index[1]).unary(dtype) def eye_(var): - return var.assign(eye(var.shape, var.dtype)) + ''' Inplace initialize variable with identity matrix. -def constant(shape, dtype, value=0.0): - return jt.array(value).unary(dtype).broadcast(shape) + Args: + var (Jittor Var): + Var to initialize with identity matrix. + + Return: + var itself. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.eye_(linear.weight) + print(linear.weight) + # output: [[1.,0.],[0.,1.]] + linear.weight.eye_() # This is ok too + + ''' + return var.assign(eye(var.shape, var.dtype)) +Var.eye_ = eye_ + +def constant(shape, dtype="float32", value=0.0): + '''Generate constant Jittor Var. + + Args: + shape (int or tuple of int): + shape of the output Var + dtype (string): + dtype of the output Var, default float32 + value (int or float): + value to be filled in output Var + + Return: + A Jittor Var which filled by constant value. + + Example:: + + from jittor import init + print(init.constant(2)) + # output: [0.,0.] + print(init.constant((2,3), value=1.)) + # output: [[1.,1.,1.],[1.,1.,1.]] + + ''' + return jt.array(value).unary(dtype).broadcast(NanoVector(shape)) def constant_(var, value=0.0): + ''' Inplace initialize variable with constant value. + + Args: + var (Jittor Var): + Var to initialize with constant value. + + Return: + var itself. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.constant_(linear.weight) + print(linear.weight) + # output: [[0.,0.],[0.,0.]] + linear.weight.constant_() # This is ok too + + ''' return var.assign(constant(var.shape, var.dtype, value)) +Var.constant_ = constant_ -def uniform(shape, dtype, low, high): - return jt.random(shape, dtype) * (low - high) + high +def zero(shape, dtype="float32"): + '''Generate zero Jittor Var. -def uniform_(var, low, high): + Args: + shape (int or tuple of int): + shape of the output Var + dtype (string): + dtype of the output Var, default float32 + + Return: + A Jittor Var which filled by constant value. + + Example:: + + from jittor import init + print(init.zero(2)) + # output: [0.,0.] + print(init.zero((2,3))) + # output: [[0.,0.,0.],[0.,0.,0.]] + + ''' + return constant(shape, dtype, 0) +def zero_(var): + ''' Inplace initialize variable with zero. + + Args: + var (Jittor Var): + Var to initialize with zero. + + Return: + var itself. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.zero_(linear.weight) + print(linear.weight) + # output: [[0.,0.],[0.,0.]] + linear.weight.zero_() # This is ok too + + ''' + return var.assign(zero(var.shape, var.dtype)) +Var.zero_ = zero_ +def one(shape, dtype="float32"): + '''Generate Jittor Var filled by one. + + Args: + shape (int or tuple of int): + shape of the output Var + dtype (string): + dtype of the output Var, default float32 + + Return: + A Jittor Var which filled by one. + + Example:: + + from jittor import init + print(init.one(2)) + # output: [1.,1.] + print(init.one((2,3))) + # output: [[1.,1.,1.],[1.,1.,1.]] + + ''' + return constant(shape, dtype, 1) +def one_(var): + ''' Inplace initialize variable with one. 
+ + Args: + var (Jittor Var): + Var to initialize with one. + + Return: + var itself. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.one_(linear.weight) + print(linear.weight) + # output: [[1.,1.],[1.,1.]] + linear.weight.one_() # This is ok too + + ''' + return var.assign(one(var.shape, var.dtype)) +Var.one_ = one_ + +def uniform(shape, dtype="float32", low=0, high=1): + '''Generate random uniform Jittor Var. + + Args: + shape (int or tuple of int): + shape of the output Var + dtype (string): + dtype of the output Var, default float32 + low (int or float or Var): + lower bound value of the random uniform + high (int or float or Var): + upper bound value of the random uniform + + Return: + A Jittor Var which filled by random uniform. + + Example:: + + from jittor import init + print(init.uniform(5)) + # output: [0.202268, 0.518688, 0.595274, 0.777354, 0.981979] + print(init.uniform((2,3), low=-1, high=1)) + # output: [[ 0.6647397 0.2801202 -0.01981187] + # [-0.9779438 -0.30149996 0.69056886]] + + ''' + return jt.random(NanoVector(shape), dtype) * (low - high) + high + +def uniform_(var, low=0, high=1): + ''' Inplace initialize Jittor Var by random uniform. + + Args: + var (Jittor Var): + Var to be initialized by random uniform + low (int or float or Var): + lower bound value of the random uniform + high (int or float or Var): + upper bound value of the random uniform + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.uniform_(linear.weight, -1.0, 1.0) + print(linear.weight) + # output: [[ 0.6647397 0.2801202], [-0.9779438 -0.30149996]] + linear.weight.uniform_(-1.0, 1.0) # This is ok too + + ''' return var.assign(uniform(var.shape, var.dtype, low, high)) +Var.uniform_ = uniform_ -def gauss(shape, dtype, mean=0.0, std=1.0): - return jt.random(shape, dtype, "normal") * std + mean +def gauss(shape, dtype="float32", mean=0.0, std=1.0): + ''' Return Jittor Var initialize by random gauss. + + Args: + shape (int or tuple of int): + shape of the output Var + dtype (string): + dtype of the output Var, default float32 + mean (int or float or Var): + mean value of the random gauss + std (int or float or Var): + std value of the random gauss + + Example:: + + from jittor import init + from jittor import nn + a = init.gauss((2,2), "float32", 0.0, 1.0) + print(a) + + ''' + return jt.random(NanoVector(shape), dtype, "normal") * std + mean def gauss_(var, mean=0.0, std=1.0): - return var.assign(gauss(var.shape, var.dtype, mean, std)) + ''' Inplace initialize Jittor Var by random gauss. -def invariant_uniform(shape, dtype, mode="fan_in"): + Args: + var (Jittor Var): + Var to be initialized by random gauss + mean (int or float or Var): + mean value of the random gauss + std (int or float or Var): + std value of the random gauss + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.gauss_(linear.weight, 0.0, 1.0) + print(linear.weight) + linear.weight.gauss_(0.0, 1.0) # This is ok too + + ''' + return var.assign(gauss(var.shape, var.dtype, mean, std)) +Var.gauss_ = gauss_ + +def invariant_uniform(shape, dtype="float32", mode="fan_in"): + ''' Return Jittor initialized Var by invariant_uniform. + + Args: + shape (int or tuple of int): + shape of the output Var + dtype (string): + dtype of the output Var, default float32 + mode (string): + mode selection, should be fan_in or fan_out. 
+ Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. + + Example:: + + from jittor import init + from jittor import nn + a = init.invariant_uniform_((2,2)) + print(a) + + ''' assert len(shape)>1 - assert mode=="fan_in" or mode=="fan_out" + assert mode=="fan_in" or mode=="fan_out", \ + f"mode not supported, should be fan_in or fan_out, but got {mode}" matsize=1 for i in shape[2:]: @@ -47,9 +335,48 @@ def invariant_uniform(shape, dtype, mode="fan_in"): return uniform(shape, dtype, -bound, bound) def invariant_uniform_(var, mode="fan_in"): - var.assign(invariant_uniform(tuple(var.shape), var.dtype, mode)) + ''' Inplace initialize Jittor Var by invariant_uniform. -def relu_invariant_gauss(shape, dtype, mode="fan_in"): + Args: + var (Jittor Var): + Var to be initialized by random invariant_uniform + mode (string): + mode selection, should be fan_in or fan_out. + Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.invariant_uniform_(linear.weight) + print(linear.weight) + linear.weight.invariant_uniform_() # This is ok too + + ''' + var.assign(invariant_uniform(tuple(var.shape), var.dtype, mode)) +Var.invariant_uniform_ = invariant_uniform_ + +def relu_invariant_gauss(shape, dtype="float32", mode="fan_in"): + ''' Return Jittor Var initialized by relu_invariant_gauss. + + Args: + shape (int or tuple of int): + shape of the output Var + dtype (string): + dtype of the output Var, default float32 + mode (string): + mode selection, should be fan_in or fan_out. + Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. + + Example:: + + from jittor import init + from jittor import nn + a = init.relu_invariant_gauss((2,2)) + print(a) + + ''' assert len(shape)>1 assert mode=="fan_in" or mode=="fan_out" @@ -61,9 +388,29 @@ def relu_invariant_gauss(shape, dtype, mode="fan_in"): return gauss(shape, dtype, 0, std) def relu_invariant_gauss_(var, mode="fan_in"): - return var.assign(relu_invariant_gauss(tuple(var.shape), var.dtype, mode)) + ''' Inplace initialize Jittor Var by relu_invariant_gauss. -def calculate_std(var,mode,nonlinearity,param=0.01): + Args: + var (Jittor Var): + Var to be initialized by random relu_invariant_gauss + mode (string): + mode selection, should be fan_in or fan_out. + Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.relu_invariant_gauss_(linear.weight) + print(linear.weight) + linear.weight.relu_invariant_gauss_() # This is ok too + + ''' + return var.assign(relu_invariant_gauss(tuple(var.shape), var.dtype, mode)) +Var.relu_invariant_gauss_ = relu_invariant_gauss_ + +def calculate_std(var, mode, nonlinearity, param=0.01): mode = mode.lower() assert isinstance(param,(int,float)) assert var.ndim>=2 @@ -91,16 +438,90 @@ def calculate_std(var,mode,nonlinearity,param=0.01): def kaiming_uniform_(var, a=0, mode='fan_in', nonlinearity='leaky_relu'): + ''' Inplace initialize Jittor Var by kaiming_uniform. 
+ + Args: + var (Jittor Var): + Var to be initialized by random kaiming_uniform + a (float): + the negative slope of the rectifier used after this layer (only used with 'leaky_relu') + mode (string): + mode selection, should be fan_in or fan_out. + Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. + nonlinearity (string): + nonlinearity used after this layer. + It can be one of [linear, conv*, sigmoid, tanh, relu, leaky_relu]. + leaky_relu is used by default. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.kaiming_uniform_(linear.weight) + print(linear.weight) + linear.weight.kaiming_uniform_() # This is ok too + + ''' std = calculate_std(var,mode,nonlinearity,a) bound = math.sqrt(3.0) * std return uniform_(var,-bound, bound) +Var.kaiming_uniform_ = kaiming_uniform_ def kaiming_normal_(var, a=0, mode='fan_in', nonlinearity='leaky_relu'): + ''' Inplace initialize Jittor Var by kaiming_normal. + + Args: + var (Jittor Var): + Var to be initialized by random kaiming_normal + a (float): + the negative slope of the rectifier used after this layer (only used with 'leaky_relu') + mode (string): + mode selection, should be fan_in or fan_out. + Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. + nonlinearity (string): + nonlinearity used after this layer. + It can be one of [linear, conv*, sigmoid, tanh, relu, leaky_relu]. + leaky_relu is used by default. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.kaiming_normal_(linear.weight) + print(linear.weight) + linear.weight.kaiming_normal_() # This is ok too + + ''' std = calculate_std(var,mode,nonlinearity,a) return gauss_(var,0, std) +Var.kaiming_normal_ = kaiming_normal_ -def xavier_uniform(shape, dtype, gain=1.0): +def xavier_uniform(shape, dtype="float32", gain=1.0): + ''' Inplace initialize Jittor Var by xavier_uniform. + The resulting var will have values sampled from + :math:`uniform(-a, a)` where + + .. math:: + a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}} + + Args: + shape (int or tuple of int): + shape of the return Var. + dtype (string): + dtype of the return Var, default float32. + gain (float): + an optional scaling factor. + + Example:: + + from jittor import init + from jittor import nn + a = init.xavier_uniform((2,2), gain=init.calculate_gain('relu')) + print(a) + ''' assert len(shape)>1 matsize=1 @@ -111,9 +532,58 @@ def xavier_uniform(shape, dtype, gain=1.0): return uniform(shape, dtype, -bound, bound) def xavier_uniform_(var, gain=1.0): - return var.assign(xavier_uniform(tuple(var.shape), var.dtype, gain)) + ''' Inplace initialize Jittor Var by xavier_uniform. + The resulting var will have values sampled from + :math:`uniform(-a, a)` where -def xavier_gauss(shape, dtype, gain=1.0): + .. math:: + a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}} + + Args: + var (Jittor Var): + Var to be initialized by random xavier_uniform + gain (float): + an optional scaling factor. 
+ + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.xavier_uniform_(linear.weight, init.calculate_gain('relu')) + print(linear.weight) + linear.weight.xavier_uniform_() # This is ok too + + ''' + return var.assign(xavier_uniform(tuple(var.shape), var.dtype, gain)) +Var.xavier_uniform_ = xavier_uniform_ + +def xavier_gauss(shape, dtype="float32", gain=1.0): + ''' Return Jittor Var initialized by xavier_gauss, a.k.a xavier_normal. + The resulting var will have values sampled from + :math:`gauss(-a, a)` where + + .. math:: + \text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}} + + Args: + shape (int or tuple of int): + shape of the return Var. + dtype (string): + dtype of the return Var, default float32. + gain (float): + an optional scaling factor. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.xavier_gauss_(linear.weight, init.calculate_gain('relu')) + print(linear.weight) + linear.weight.xavier_gauss_() # This is ok too + + ''' assert len(shape)>1 matsize=1 @@ -124,4 +594,74 @@ def xavier_gauss(shape, dtype, gain=1.0): return gauss(shape, dtype, 0, std) def xavier_gauss_(var, gain=1.0): + ''' Inplace initialize Jittor Var by xavier_gauss, a.k.a xavier_normal. + The resulting var will have values sampled from + :math:`gauss(-a, a)` where + + .. math:: + \text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}} + + Args: + var (Jittor Var): + Var to be initialized by random xavier_gauss + gain (float): + an optional scaling factor. + + Example:: + + from jittor import init + from jittor import nn + linear = nn.Linear(2,2) + init.xavier_gauss_(linear.weight, init.calculate_gain('relu')) + print(linear.weight) + linear.weight.xavier_gauss_() # This is ok too + + ''' return var.assign(xavier_gauss(tuple(var.shape), var.dtype, gain)) +Var.xavier_gauss_ = xavier_gauss_ + +def calculate_gain(nonlinearity, param=None): + r"""Return the recommended gain value for the given nonlinearity function. + The values are as follows: + + ================= ==================================================== + nonlinearity gain + ================= ==================================================== + Linear / Identity :math:`1` + Conv{1,2,3}D :math:`1` + Sigmoid :math:`1` + Tanh :math:`\frac{5}{3}` + ReLU :math:`\sqrt{2}` + Leaky Relu :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}` + SELU :math:`\frac{3}{4}` + ================= ==================================================== + + Args: + nonlinearity: the non-linear function (`nn.functional` name) + param: optional parameter for the non-linear function + + Examples: + >>> gain = nn.init.calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2 + + .. 
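# Quick numeric check of the gain table above (a usage sketch, assuming
# init.calculate_gain follows the listed values).
import math
from jittor import init

assert init.calculate_gain('linear') == 1
assert init.calculate_gain('tanh') == 5.0 / 3
assert init.calculate_gain('relu') == math.sqrt(2.0)
assert init.calculate_gain('leaky_relu', 0.2) == math.sqrt(2.0 / (1 + 0.2 ** 2))
assert init.calculate_gain('selu') == 3.0 / 4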
_Self-Normalizing Neural Networks: https://papers.nips.cc/paper/2017/hash/5d44ee6f2c3f71b73125876103c8f6c4-Abstract.html + """ + linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d'] + if nonlinearity in linear_fns or nonlinearity == 'sigmoid': + return 1 + elif nonlinearity == 'tanh': + return 5.0 / 3 + elif nonlinearity == 'relu': + return math.sqrt(2.0) + elif nonlinearity == 'leaky_relu': + if param is None: + negative_slope = 0.01 + elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float): + # True/False are instances of int, hence check above + negative_slope = param + else: + raise ValueError("negative_slope {} not a valid number".format(param)) + return math.sqrt(2.0 / (1 + negative_slope ** 2)) + elif nonlinearity == 'selu': + return 3.0 / 4 + else: + raise ValueError("Unsupported nonlinearity {}".format(nonlinearity)) diff --git a/python/jittor/misc.py b/python/jittor/misc.py index 5daf1dc8..aa705a28 100644 --- a/python/jittor/misc.py +++ b/python/jittor/misc.py @@ -937,7 +937,7 @@ Output:: print(out) return tree, out -def python_pass_warper(mod_func, args, kw): +def python_pass_wrapper(mod_func, args, kw): import importlib mod, func = mod_func.rsplit(".", 1) mod = importlib.import_module(mod) diff --git a/python/jittor/models/res2net.py b/python/jittor/models/res2net.py index 5dd1b665..859e9c05 100644 --- a/python/jittor/models/res2net.py +++ b/python/jittor/models/res2net.py @@ -7,12 +7,12 @@ import math model_urls = { - 'res2net50_14w_8s': 'https://cloud.tsinghua.edu.cn/f/2543e4b5646d40a1afa9/?dl=1&fname=/res2net50_14w_8s.pkl', - 'res2net50_26w_4s': 'https://cloud.tsinghua.edu.cn/f/927fead9c9884f769d88/?dl=1&fname=/res2net50_26w_4s.pkl', - 'res2net50_26w_6s': 'https://cloud.tsinghua.edu.cn/f/067875edf6ca488ba83e/?dl=1&fname=/res2net50_26w_6s.pkl', - 'res2net50_26w_8s': 'https://cloud.tsinghua.edu.cn/f/ce1230155a2c4352bf17/?dl=1&fname=/res2net50_26w_8s.pkl', - 'res2net50_48w_2s': 'https://cloud.tsinghua.edu.cn/f/b8a4df2b2cb64500b869/?dl=1&fname=/res2net50_48w_2s.pkl', - 'res2net101_26w_4s': 'https://cloud.tsinghua.edu.cn/f/b85283bf572649d288bb/?dl=1&fname=/res2net101_26w_4s.pkl', + 'res2net50_14w_8s': 'jittorhub://res2net50_14w_8s.pkl', + 'res2net50_26w_4s': 'jittorhub://res2net50_26w_4s.pkl', + 'res2net50_26w_6s': 'jittorhub://res2net50_26w_6s.pkl', + 'res2net50_26w_8s': 'jittorhub://res2net50_26w_8s.pkl', + 'res2net50_48w_2s': 'jittorhub://res2net50_48w_2s.pkl', + 'res2net101_26w_4s': 'jittorhub://res2net101_26w_4s.pkl', } diff --git a/python/jittor/nn.py b/python/jittor/nn.py index 86d9fffc..5d7fbbfe 100644 --- a/python/jittor/nn.py +++ b/python/jittor/nn.py @@ -155,22 +155,151 @@ jt.Var.__imatmul__ = lambda a,b: a.assign(matmul(a,b)) def get_init_var_rand(shape, dtype): return jt.array(np.random.normal(0.0, 1.0, shape).astype(np.float32)) -def relu(x): return jt.ternary((x>0.0), x, jt.broadcast_var(0.0, x)) -def leaky_relu(x, scale=0.01): return jt.ternary(x>0, x, x*scale) -def relu6(x): return jt.minimum(jt.maximum(x, 0.0), 6.0) -def elu(x,alpha=1.0):return jt.ternary(x>0,x,alpha*(x.exp()-1)) -def sign(x): +def relu(x): + r''' Applies the element-wise function: + + .. math:: + \text{ReLU6}(x) = \max(0,x) + + :param x: the input var + :type x: jt.Var + + Example: + >>> a = jt.randn(3) + >>> a + jt.Var([-0.38380373 1.1338731 6.128115 ], dtype=float32) + >>> nn.relu(a) + jt.Var([0. 
1.1338731 6.128115 ], dtype=float32) + ''' + return jt.ternary((x>0.0), x, jt.broadcast_var(0.0, x)) + + +def leaky_relu(x, scale=0.01): + r''' Applies the element-wise function: + + .. math:: + \text{LeakyRELU}(x) = + \begin{cases} + x, & \text{ if } x \geq 0 \\ + \text{scale} \times x, & \text{ otherwise } + \end{cases} + + :param x: the input var + :type x: jt.Var + + :param scale: the :math:`\scale` value for the leaky relu formulation. Default: 0.01 + :param scale: float, optional + + Example: + >>> a = jt.randn(3) + >>> a + jt.Var([-0.38380373 1.1338731 6.128115 ], dtype=float32) + >>> nn.leaky_relu(a) + jt.Var([-3.8380371e-03 1.1338731e+00 6.1281152e+00], dtype=float32) + ''' + return jt.ternary(x>0, x, x*scale) + +def relu6(x): + r''' Applies the element-wise function: + + .. math:: + \text{ReLU6}(x) = \min(\max(0,x), 6) + + :param x: the input var + :type x: jt.Var + + Example: + >>> a = jt.randn(3) + >>> a + jt.Var([-0.38380373 1.1338731 6.128115 ], dtype=float32) + >>> nn.relu6(a) + jt.Var([0. 1.1338731 6. ], dtype=float32) + ''' + return jt.minimum(jt.maximum(x, 0.0), 6.0) + +def elu(x: jt.Var, alpha: float = 1.0) -> jt.Var: + r''' Applies the element-wise function: + + .. math:: + \text{ELU}(x) = \begin{cases} + x, & \text{ if } x > 0\\ + \alpha * (\exp(x) - 1), & \text{ if } x \leq 0 + \end{cases} + + :param x: the input var + :type x: jt.Var + + :param alpha: the :math:`\alpha` value for the ELU formulation. Default: 1.0 + :param alpha: float, optional + + Example: + >>> a = jt.randn(3) + >>> a + jt.Var([-0.38380373 -1.1338731 2.128115 ], dtype=float32) + >>> nn.elu(a) + jt.Var([-0.31873488 -0.6782155 2.128115 ], dtype=float32) + ''' + return jt.ternary(x>0,x,alpha*(x.exp()-1)) + +def sign(x: jt.Var) -> jt.Var: + ''' returns the signs of elements of x + + :param x: the input Var + :type x: jt.Var + + Example: + >>> a = jt.float32([0.99, 0, -0.99]) + >>> nn.sign(a) + jt.Var([ 1. 0. -1.], dtype=float32) + ''' one = jt.ones(x.shape) x = jt.ternary(x>0, one, x) return jt.ternary(x<0, -one, x) def gelu(x): + r''' Applies the element-wise function: + + .. math:: \text{GELU}(x) = x * \Phi(x) + + where :math:`\Phi(x)` is the Cumulative Distribution Function for Gaussian Distribution. + + :param x: the input var + :type x: jt.Var + + Example: + >>> a = jt.randn(3) + >>> a + jt.Var([-0.38380373 -1.1338731 2.128115 ], dtype=float32) + >>> nn.gelu(a) + jt.Var([-0.134547 0.9882567 6.128115 ], dtype=float32) + ''' _sqrt2 = 1.4142135623730951 erf = jt.erf(x/_sqrt2)+1 r = erf*x*.5 return r class ELU(Module): + r''' Applies the element-wise function: + + .. math:: + \text{ELU}(x) = \begin{cases} + x, & \text{ if } x > 0\\ + \alpha * (\exp(x) - 1), & \text{ if } x \leq 0 + \end{cases} + + :param x: the input var + :type x: jt.Var + + :param alpha: the :math:`\alpha` value for the ELU formulation. Default: 1.0 + :param alpha: float, optional + + Example: + >>> a = jt.randn(3) + >>> a + jt.Var([-0.38380373 -1.1338731 2.128115 ], dtype=float32) + >>> nn.elu(a) + jt.Var([-0.31873488 -0.6782155 2.128115 ], dtype=float32) + ''' def __init__(self,alpha=1.0): self.alpha=alpha @@ -178,6 +307,31 @@ class ELU(Module): return elu(x,self.alpha) class PReLU(Module): + r''' Applies the element-wise function: + + .. math:: + \text{PReLU}(x) = + \begin{cases} + x, & \text{ if } x \geq 0 \\ + ax, & \text{ otherwise } + \end{cases} + + :param x: the input var + :type x: jt.Var + + :param num_parameters: number of :math:`a` to learn, can be either 1 or the number of channels at input. 
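# Usage sketch for the activation functions documented above; the comments
# describe the expected behaviour rather than exact printed values.
import jittor as jt
from jittor import nn

x = jt.float32([-2.0, -0.5, 0.0, 0.5, 2.0])
print(nn.relu(x))                    # negatives clamped to 0
print(nn.leaky_relu(x, scale=0.1))   # negatives scaled by 0.1
print(nn.relu6(x))                   # clamped into [0, 6]
print(nn.elu(x, alpha=1.0))          # alpha*(exp(x)-1) on the negative side
print(nn.gelu(x))                    # x * Phi(x)
print(nn.sign(x))                    # -1 / 0 / +1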
Default: 1 + :type num_parameters: int, optional + + :param init: the initial value of :math:`a`. Default: 0.25 + :param init: float, optional + + Example: + >>> a = jt.randn(3) + >>> prelu = nn.PReLU() + >>> prelu(a) + jt.Var([-0.09595093 1.1338731 6.128115 ], dtype=float32) + ''' + def __init__(self, num_parameters=1, init_=0.25): self.num_parameters = num_parameters self.weight = init.constant((num_parameters,), "float32", init_) @@ -600,6 +754,41 @@ GELU = jt.make_module(gelu) from jittor.depthwise_conv import DepthwiseConv class Conv(Module): + ''' Applies a 2D convolution over an input signal composed of several input planes. + + :param in_channels: Number of channels in the input feature map + :type in_channels: int + + :param out_channels: Number of channels in the output feature map + :type out_channels: int + + :param kernel_size: Size of the convolving kernel + :type kernel_size: int or tuple + + :param stride: Stride of the convolution. Default: 1 + :type stride: int or tuple, optional + + :param padding: Padding added to all four sides of the input. Default: 0 + :type padding: int or tuple, optional + + :param dilation: Spacing between kernel elements. Default: 1 + :type dilation: int or tuple, optional + + :param groups: Number of blocked connections from input channels to output channels. Default: 1 + :type groups: int, optional + + :param bias: If True, adds a learnable bias to the output. Default: True + :type bias: bool, optional + + Example: + + >>> conv = nn.Conv2d(24, 32, 3) + >>> conv = nn.Conv2d(24, 32, (3,3)) + >>> conv = nn.Conv2d(24, 32, 3, stride=2, padding=1) + >>> conv = nn.Conv2d(24, 32, 3, dilation=(3, 1)) + >>> input = jt.randn(4, 24, 100, 100) + >>> output = conv(input) + ''' def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True): self.in_channels = in_channels self.out_channels = out_channels @@ -695,6 +884,41 @@ class Conv(Module): Conv2d = Conv class Conv1d(Module): + ''' Applies a 1D convolution over an input signal composed of several input planes. + + :param in_channels: Number of channels in the input feature map + :type in_channels: int + + :param out_channels: Number of channels in the output feature map + :type out_channels: int + + :param kernel_size: Size of the convolving kernel + :type kernel_size: int or tuple + + :param stride: Stride of the convolution. Default: 1 + :type stride: int or tuple, optional + + :param padding: Padding added to all four sides of the input. Default: 0 + :type padding: int or tuple, optional + + :param dilation: Spacing between kernel elements. Default: 1 + :type dilation: int or tuple, optional + + :param groups: Number of blocked connections from input channels to output channels. Default: 1 + :type groups: int, optional + + :param bias: If True, adds a learnable bias to the output. Default: True + :type bias: bool, optional + + Example: + + >>> conv = nn.Conv1d(24, 32, 3) + >>> conv = nn.Conv1d(24, 32, (3,3)) + >>> conv = nn.Conv1d(24, 32, 3, stride=2, padding=1) + >>> conv = nn.Conv1d(24, 32, 3, dilation=(3, 1)) + >>> input = jt.randn(4, 24, 100) + >>> output = conv(input) + ''' def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True): self.in_channels = in_channels self.out_channels = out_channels @@ -721,6 +945,41 @@ class Conv1d(Module): return y class Conv3d(Module): + ''' Applies a 3D convolution over an input signal composed of several input planes. 
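# Sketch of the usual output-size relation for the Conv modules documented above,
# assuming standard convolution arithmetic (not quoted from the library source):
#   out = floor((in + 2*padding - dilation*(kernel-1) - 1) / stride) + 1
import jittor as jt
from jittor import nn

def sketch_conv_out(n, k, s=1, p=0, d=1):
    return (n + 2 * p - d * (k - 1) - 1) // s + 1

conv = nn.Conv2d(24, 32, 3, stride=2, padding=1)
x = jt.randn(4, 24, 100, 100)
y = conv(x)
assert y.shape[2] == sketch_conv_out(100, 3, s=2, p=1)   # 50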
+ + :param in_channels: Number of channels in the input feature map + :type in_channels: int + + :param out_channels: Number of channels in the output feature map + :type out_channels: int + + :param kernel_size: Size of the convolving kernel + :type kernel_size: int or tuple + + :param stride: Stride of the convolution. Default: 1 + :type stride: int or tuple, optional + + :param padding: Padding added to all four sides of the input. Default: 0 + :type padding: int or tuple, optional + + :param dilation: Spacing between kernel elements. Default: 1 + :type dilation: int or tuple, optional + + :param groups: Number of blocked connections from input channels to output channels. Default: 1 + :type groups: int, optional + + :param bias: If True, adds a learnable bias to the output. Default: True + :type bias: bool, optional + + Example: + + >>> conv = nn.Conv3d(24, 32, 3) + >>> conv = nn.Conv3d(24, 32, (3,3)) + >>> conv = nn.Conv3d(24, 32, 3, stride=2, padding=1) + >>> conv = nn.Conv3d(24, 32, 3, dilation=(3, 1)) + >>> input = jt.randn(4, 24, 50, 50, 50) + >>> output = conv(input) + ''' def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True): self.in_channels = in_channels self.out_channels = out_channels @@ -750,6 +1009,35 @@ class Conv3d(Module): return conv3d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups) def conv2d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1): + ''' Applies a 2D convolution over an input signal composed of several input planes. + + :param x: the input image + :type x: jt.Var + + :param weight: the convolution kernel + :type weight: jt.Var + + :param bias: the bias after convolution + :type bias: jt,Var, optional + + :param stride: Stride of the convolution. Default: 1 + :type stride: int or tuple, optional + + :param padding: Padding added to all four sides of the input. Default: 0 + :type padding: int or tuple, optional + + :param dilation: Spacing between kernel elements. Default: 1 + :type dilation: int or tuple, optional + + :param groups: Number of blocked connections from input channels to output channels. Default: 1 + :type groups: int, optional + + Example: + + >>> x = jt.randn(4, 24, 100, 100) + >>> w = jt.randn(32, 24, 3, 3) + >>> y = nn.conv2d(x, w) + ''' padding = _pair(padding) stride = _pair(stride) dilation = _pair(dilation) @@ -808,6 +1096,35 @@ def conv2d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1): return y def conv3d(x, weight, bias=None, stride=1, padding=0, dilation=1, groups=1): + ''' Applies a 3D convolution over an input signal composed of several input planes. + + :param x: the input volume + :type x: jt.Var + + :param weight: the convolution kernel + :type weight: jt.Var + + :param bias: the bias after convolution + :type bias: jt,Var, optional + + :param stride: Stride of the convolution. Default: 1 + :type stride: int or tuple, optional + + :param padding: Padding added to all four sides of the input. Default: 0 + :type padding: int or tuple, optional + + :param dilation: Spacing between kernel elements. Default: 1 + :type dilation: int or tuple, optional + + :param groups: Number of blocked connections from input channels to output channels. 
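# Usage sketch for the functional nn.conv2d documented above: a hand-built
# 3x3 box filter (illustrative weights, not library defaults).
import jittor as jt
from jittor import nn

x = jt.randn(1, 1, 8, 8)
w = jt.ones([1, 1, 3, 3]) / 9.0      # (out_channels, in_channels, kh, kw)
y = nn.conv2d(x, w, padding=1)
assert y.shape == [1, 1, 8, 8]       # same spatial size with padding=1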
Default: 1 + :type groups: int, optional + + Example: + + >>> x = jt.randn(4, 24, 50, 50, 50) + >>> w = jt.randn(32, 24, 3, 3, 3) + >>> y = nn.conv2d(x, w) + ''' padding = _triple(padding) stride = _triple(stride) dilation = _triple(dilation) @@ -1171,13 +1488,30 @@ class ReplicationPad2d(Module): ]) class Embedding(Module): + ''' A simple lookup table that stores embeddings of a fixed dictionary and size. + + :param num: size of the dictionary of embeddings + :type num: int + + :param dim: the size of each embedding vector + :type dim: int + + Example: + >>> embedding = nn.Embedding(10, 3) + >>> x = jt.int32([1, 2, 3, 3]) + >>> embedding(x) + jt.Var([[ 1.1128596 0.19169547 0.706642] + [ 1.2047412 1.9668795 0.9932192] + [ 0.14941819 0.57047683 -1.3217674] + [ 0.14941819 0.57047683 -1.3217674]], dtype=float32) + ''' def __init__(self, num, dim): self.num = num self.dim = dim self.weight = jt.init.gauss([num,dim],'float32').stop_grad() def execute(self, x): - res = self.weight[x].reshape([x.shape[0],self.dim]) + res = self.weight[x.flatten()].reshape(x.shape + [self.dim]) return res class PixelShuffle(Module): @@ -1764,30 +2098,30 @@ ModuleList = Sequential class LSTMCell(jt.Module): + ''' A long short-term memory (LSTM) cell. + + :param input_size: The number of expected features in the input + :type input_size: int + + :param hidden_size: The number of features in the hidden state + :type hidden_size: int + + :param bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. + :type bias: bool, optional + + Example: + + >>> rnn = nn.LSTMCell(10, 20) # (input_size, hidden_size) + >>> input = jt.randn(2, 3, 10) # (time_steps, batch, input_size) + >>> hx = jt.randn(3, 20) # (batch, hidden_size) + >>> cx = jt.randn(3, 20) + >>> output = [] + >>> for i in range(input.shape[0]): + hx, cx = rnn(input[i], (hx, cx)) + output.append(hx) + >>> output = jt.stack(output, dim=0) + ''' def __init__(self, input_size, hidden_size, bias=True): - ''' A long short-term memory (LSTM) cell. - - :param input_size: The number of expected features in the input - :type input_size: int - - :param hidden_size: The number of features in the hidden state - :type hidden_size: int - - :param bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. - :type bias: bool, optional - - Example: - - >>> rnn = nn.LSTMCell(10, 20) # (input_size, hidden_size) - >>> input = jt.randn(2, 3, 10) # (time_steps, batch, input_size) - >>> hx = jt.randn(3, 20) # (batch, hidden_size) - >>> cx = jt.randn(3, 20) - >>> output = [] - >>> for i in range(input.shape[0]): - hx, cx = rnn(input[i], (hx, cx)) - output.append(hx) - >>> output = jt.stack(output, dim=0) - ''' super().__init__() self.hidden_size = hidden_size @@ -1825,31 +2159,31 @@ class LSTMCell(jt.Module): class RNNCell(jt.Module): + ''' An Elman RNN cell with tanh or ReLU non-linearity. + + :param input_size: The number of expected features in the input + :type input_size: int + + :param hidden_size: The number of features in the hidden state + :type hidden_size: int + + :param bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. + :type bias: bool, optional + + :param nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'. 
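# Sketch of the Embedding lookup behaviour after the reshape change above:
# the output keeps the index shape and appends the embedding dimension.
import jittor as jt
from jittor import nn

embedding = nn.Embedding(10, 3)
idx = jt.int32([[1, 2], [3, 3]])     # a 2x2 batch of indices
out = embedding(idx)
assert out.shape == [2, 2, 3]        # x.shape + [dim]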
+ :type nonlinearity: str, optional + + Example: + + >>> rnn = nn.RNNCell(10, 20) + >>> input = jt.randn((6, 3, 10)) + >>> hx = jt.randn((3, 20)) + >>> output = [] + >>> for i in range(6): + hx = rnn(input[i], hx) + output.append(hx) + ''' def __init__(self, input_size, hidden_size, bias=True, nonlinearity = "tanh"): - ''' An Elman RNN cell with tanh or ReLU non-linearity. - - :param input_size: The number of expected features in the input - :type input_size: int - - :param hidden_size: The number of features in the hidden state - :type hidden_size: int - - :param bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. - :type bias: bool, optional - - :param nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'. - :type nonlinearity: str, optional - - Example: - - >>> rnn = nn.RNNCell(10, 20) - >>> input = jt.randn((6, 3, 10)) - >>> hx = jt.randn((3, 20)) - >>> output = [] - >>> for i in range(6): - hx = rnn(input[i], hx) - output.append(hx) - ''' super().__init__() self.hidden_size = hidden_size @@ -1884,28 +2218,28 @@ class RNNCell(jt.Module): class GRUCell(jt.Module): + ''' A gated recurrent unit (GRU) cell. + + :param input_size: The number of expected features in the input + :type input_size: int + + :param hidden_size: The number of features in the hidden state + :type hidden_size: int + + :param bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. + :type bias: bool, optional + + Example: + + >>> rnn = nn.GRUCell(10, 20) + >>> input = jt.randn((6, 3, 10)) + >>> hx = jt.randn((3, 20)) + >>> output = [] + >>> for i in range(6): + hx = rnn(input[i], hx) + output.append(hx) + ''' def __init__(self, input_size, hidden_size, bias=True): - ''' A gated recurrent unit (GRU) cell. - - :param input_size: The number of expected features in the input - :type input_size: int - - :param hidden_size: The number of features in the hidden state - :type hidden_size: int - - :param bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. 
- :type bias: bool, optional - - Example: - - >>> rnn = nn.GRUCell(10, 20) - >>> input = jt.randn((6, 3, 10)) - >>> hx = jt.randn((3, 20)) - >>> output = [] - >>> for i in range(6): - hx = rnn(input[i], hx) - output.append(hx) - ''' super().__init__() self.hidden_size = hidden_size diff --git a/python/jittor/optim.py b/python/jittor/optim.py index 6313f433..6af98f02 100644 --- a/python/jittor/optim.py +++ b/python/jittor/optim.py @@ -122,11 +122,19 @@ class Optimizer(object): # sync grads and model if in mpi if jt.in_mpi: + dep = [] + def add_dep(v): + nonlocal dep + v._add_dependency(dep) + dep = [v] + for g in grads: g.assign(g.mpi_all_reduce("mean")) + add_dep(g._input(0)) if self.n_step % self.param_sync_iter == 0: for p in params: p.assign(p.mpi_broadcast()) + add_dep(p) self.n_step += 1 # set up grads in param_groups diff --git a/python/jittor/pyjt_compiler.py b/python/jittor/pyjt_compiler.py index ced8cb03..1a7789e0 100644 --- a/python/jittor/pyjt_compiler.py +++ b/python/jittor/pyjt_compiler.py @@ -692,13 +692,20 @@ def compile_src(src, h, basename): }} catch (const std::exception& e) {{ if (!PyErr_Occurred()) {{ std::stringstream ss; - ss {error_log_code}; - PyErr_Format(PyExc_RuntimeError, - "%s\\n%s\\nFailed reason:%s", - ss.str().c_str(), - R""({decs})"", - e.what() - ); + if (check_async_executor_error(e, ss)) {{ + PyErr_Format(PyExc_RuntimeError, + "%s", + ss.str().c_str() + ); + }} else {{ + ss {error_log_code}; + PyErr_Format(PyExc_RuntimeError, + "%s\\n%s\\nFailed reason:%s", + ss.str().c_str(), + R""({decs})"", + e.what() + ); + }} }} }} {func_return_failed}; diff --git a/python/jittor/src/executor.cc b/python/jittor/src/executor.cc index 8cdfa60f..c80ec0ca 100644 --- a/python/jittor/src/executor.cc +++ b/python/jittor/src/executor.cc @@ -29,6 +29,7 @@ #include "misc/nan_checker.h" #include "memory_profiler.h" #include "utils/seh.h" +#include "utils/cache_compile.h" namespace jittor { @@ -101,6 +102,55 @@ void load_fused_op(FusedOp& fused_op, vector& fuse_ops, vector& ops, i } } +void check_op_async_error(Op* op, bool is_fused_op, const std::exception& e, jittor::Log& logf) { + vector stack; + if (is_fused_op) { + FusedOp& fused_op = *((FusedOp*)op); + logf >> "[OP TYPE]:" << "fused_op:("; + for (auto& op : fused_op.ops) + logf << op->name_ex() >> ","; + logf >> ")\n"; + logf >> "[Input]:"; + for (auto& vi : fused_op.vars) + if (vi.type == 0) logf << vi.var->dtype() >> vi.var->shape >> vi.var->name >> ","; + logf << "\n[Output]:"; + Var* ov = nullptr; + for (auto& vi : fused_op.vars) + if (vi.type == 2) { + logf << vi.var->dtype() >> vi.var->shape >> vi.var->name >> ","; + ov = vi.var; + } + if (ov) + stack = get_node_trace(ov); + } else { + logf >> "[OP TYPE]:" << op->name_ex(); + logf << "\n[Input]:"; + for (auto v : op->inputs()) + logf << v->dtype() >> v->shape >> v->name >> ","; + logf << "\n[Output]:"; + Var* ov = nullptr; + for (auto v : op->outputs()) { + logf << v->dtype() >> v->shape >> v->name >> ","; + ov = v; + } + if (ov) + stack = get_node_trace(ov); + } + logf << "\n[Async Backtrace]:"; + if (stack.size()) { + logf << "---"; + for (auto& s : stack) { + logf << "\n " << s.file_path >> ":" >> s.lineno; + if (s.module_type.size()) logf << '<' >> s.module_type >> '>'; + if (s.module_name.size() && s.module_name.find(":") == string::npos) + logf << '[' >> s.module_name >> ']'; + } + } else + logf << "not found, please set env JT_SYNC=1, trace_py_var=3"; + logf << "\n[Reason]:" << e.what(); + jittor::LogFatalVoidify() && logf; +} + void Executor::run_sync(vector 
vars, bool device_sync) { auto allocator = get_allocator(); auto temp_allocator = get_allocator(true); @@ -531,13 +581,12 @@ void Executor::run_sync(vector vars, bool device_sync) { // log jit_key and file location op->do_prepare(jkl); string jit_src_path = Op::get_filename_from_jit_key(jkl.to_cstring(), ".cc"); - LOGe << "[Error] source file location:" << jit_src_path; - if (is_fused_op) { - LOGf << "Execute fused operator(" >> rid >> '/' >> queue.size() >> ")" - << "failed:" << fused_op.ops << "\n\nReason: " >> e.what(); - } else - LOGf << "Execute operator(" >> rid >> '/' >> queue.size() >> ")" - << "failed:" << op << "\n\nReason: " >> e.what(); + jittor::Log logf(__FILELINE__, 'f', 0); + logf << "\nExecute fused operator(" >> rid >> '/' >> queue.size() >> ")" + << "failed."; + if (jit_compiler::file_exist(jit_src_path)) + logf << "\n[JIT Source]:" << jit_src_path << "\n"; + check_op_async_error(op, is_fused_op, e, logf); } } LOGvv << "All" << op_num << "ops finished, return vars:" << vars; diff --git a/python/jittor/src/mem/allocator.cc b/python/jittor/src/mem/allocator.cc index 1e425b4f..a6f1821b 100644 --- a/python/jittor/src/mem/allocator.cc +++ b/python/jittor/src/mem/allocator.cc @@ -107,7 +107,7 @@ void migrate_to_cpu(Var* var, Allocator* allocator) { if (!use_cuda_managed_allocator) { // must be a device allocator Allocation a(allocator, var->size); - checkCudaErrors(cudaMemcpy(a.ptr, var->mem_ptr, var->size, cudaMemcpyDeviceToHost)); + checkCudaErrors(cudaMemcpy(a.ptr, var->mem_ptr, var->size, cudaMemcpyDefault)); var->allocator->free(var->mem_ptr, var->size, var->allocation); var->mem_ptr = a.ptr; var->allocation = a.allocation; diff --git a/python/jittor/src/mem/mem_info.cc b/python/jittor/src/mem/mem_info.cc index 288c438e..b042ce2e 100644 --- a/python/jittor/src/mem/mem_info.cc +++ b/python/jittor/src/mem/mem_info.cc @@ -128,6 +128,7 @@ void display_memory_info(const char* fileline, bool dump_var, bool red_color) { log << "cpu&gpu:" << FloatOutput{(double)all_total, " KMG", 1024, "B"} << "gpu:" << FloatOutput{(double)gpu_total, " KMG", 1024, "B"} << "cpu:" << FloatOutput{(double)cpu_total, " KMG", 1024, "B"} >> '\n'; + size_t cpu_free = 0; #if defined(__linux__) cpu_free = get_avphys_pages() * sysconf(_SC_PAGESIZE); @@ -185,6 +186,20 @@ void display_memory_info(const char* fileline, bool dump_var, bool red_color) { } } log >> "===========================\n"; + + if (red_color) { + bool gpu_overflow = (double)gpu_total>(double)mem_info.total_cuda_ram*0.95; + bool cpu_overflow = (double)cpu_total>(double)mem_info.total_cpu_ram*0.95; + if(gpu_overflow || cpu_overflow) { + double used = gpu_overflow ? (double)gpu_total : (double)cpu_total; + double total = gpu_overflow ? (double)mem_info.total_cuda_ram : (double)mem_info.total_cpu_ram; + log.end(); + LOGf << "\n*******************\n" + >> (gpu_overflow?"GPU":"CPU") << "memory is overflow, please reduce your batch_size or data size!\nTotal:" << FloatOutput{(double)total, " KMG", 1024, "B"} << "Used:" << FloatOutput{(double)used, " KMG", 1024, "B"}; + } else + return; + } + log.end(); } diff --git a/python/jittor/src/misc/cpu_math.cc b/python/jittor/src/misc/cpu_math.cc new file mode 100644 index 00000000..331d0a0f --- /dev/null +++ b/python/jittor/src/misc/cpu_math.cc @@ -0,0 +1,57 @@ +// *************************************************************** +// Copyright (c) 2021 Jittor. All Rights Reserved. +// Maintainers: Dun Liang . 
+// This file is subject to the terms and conditions defined in +// file 'LICENSE.txt', which is part of this source code package. +// *************************************************************** +#include +#include +#include "misc/cpu_math.h" + +namespace jittor { + +#define CENTRAL_RANGE 0.7 + +template +static inline typename std::enable_if::value, T>::type +calc_erfinv(T y) { +/* Function to calculate inverse error function. Rational approximation +is used to generate an initial approximation, which is then improved to +full accuracy by two steps of Newton's method. Code is a direct +translation of the erfinv m file in matlab version 2.0. +Author: Gary L. Pavlis, Indiana University +Date: February 1996 +*/ + T x, z, num, dem; /*working variables */ + /* coefficients in rational expansion */ + T a[4]={ 0.886226899, -1.645349621, 0.914624893, -0.140543331}; + T b[4]={-2.118377725, 1.442710462, -0.329097515, 0.012229801}; + T c[4]={-1.970840454, -1.624906493, 3.429567803, 1.641345311}; + T d[2]={ 3.543889200, 1.637067800}; + T y_abs = std::abs(y); + if(y_abs > 1.0) return std::numeric_limits::quiet_NaN(); + if(y_abs == 1.0) return std::copysign(std::numeric_limits::infinity(), y); + if(y_abs <= static_cast(CENTRAL_RANGE)) { + z = y * y; + num = (((a[3]*z + a[2])*z + a[1])*z + a[0]); + dem = ((((b[3]*z + b[2])*z + b[1])*z +b[0]) * z + static_cast(1.0)); + x = y * num / dem; + } + else{ + z = std::sqrt(-std::log((static_cast(1.0)-y_abs)/static_cast(2.0))); + num = ((c[3]*z + c[2])*z + c[1]) * z + c[0]; + dem = (d[1]*z + d[0])*z + static_cast(1.0); + x = std::copysign(num, y) / dem; + } + /* Two steps of Newton-Raphson correction */ + x = x - (std::erf(x) - y) / ((static_cast(2.0)/static_cast(std::sqrt(M_PI)))*std::exp(-x*x)); + x = x - (std::erf(x) - y) / ((static_cast(2.0)/static_cast(std::sqrt(M_PI)))*std::exp(-x*x)); + + return x; +} + +float _erfinv(float y) { return calc_erfinv(y); }; +double _erfinv(double y) { return calc_erfinv(y); }; + +} + diff --git a/python/jittor/src/misc/cpu_math.h b/python/jittor/src/misc/cpu_math.h new file mode 100644 index 00000000..9558c64e --- /dev/null +++ b/python/jittor/src/misc/cpu_math.h @@ -0,0 +1,16 @@ +// *************************************************************** +// Copyright (c) 2021 Jittor. All Rights Reserved. +// Maintainers: Dun Liang . +// This file is subject to the terms and conditions defined in +// file 'LICENSE.txt', which is part of this source code package. 
+// *************************************************************** +#pragma once +#include "common.h" + +namespace jittor { + +float _erfinv(float y); +double _erfinv(double y); + +} + diff --git a/python/jittor/src/misc/cuda_flags.cc b/python/jittor/src/misc/cuda_flags.cc index 4c1c5cdc..7e58e767 100644 --- a/python/jittor/src/misc/cuda_flags.cc +++ b/python/jittor/src/misc/cuda_flags.cc @@ -22,7 +22,13 @@ void setter_use_cuda(int value) { if (value) { int count=0; cudaGetDeviceCount(&count); - CHECK(count>0) << "No device found."; + if (count == 0) { + if (getenv("CUDA_VISIBLE_DEVICES")) { + LOGf << "No device found, please unset your " + "enviroment variable 'CUDA_VISIBLE_DEVICES'"; + } else + LOGf << "No device found"; + } LOGi << "CUDA enabled."; } else { LOGv << "CUDA disabled."; diff --git a/python/jittor/src/misc/nano_string.cc b/python/jittor/src/misc/nano_string.cc index 142df6cb..b85bf861 100644 --- a/python/jittor/src/misc/nano_string.cc +++ b/python/jittor/src/misc/nano_string.cc @@ -85,7 +85,8 @@ static unordered_set unary_ops = { "cosh", "acosh", "sigmoid", - "erf" + "erf", + "erfinv" }; static unordered_set unary_float_ops = { diff --git a/python/jittor/src/misc/nano_string.h b/python/jittor/src/misc/nano_string.h index 60f1c3d5..12feb331 100644 --- a/python/jittor/src/misc/nano_string.h +++ b/python/jittor/src/misc/nano_string.h @@ -80,6 +80,7 @@ constexpr int ns_max_len = 16; m(cosh) \ m(acosh) \ m(erf) \ + m(erfinv) \ m(sigmoid) \ \ m(uniform) \ diff --git a/python/jittor/src/misc/nano_vector.h b/python/jittor/src/misc/nano_vector.h index 983d834a..ef9ccbfa 100644 --- a/python/jittor/src/misc/nano_vector.h +++ b/python/jittor/src/misc/nano_vector.h @@ -155,6 +155,7 @@ struct NanoVector { return nv; } + // @pyjt(__init__) inline NanoVector(int64 x) { push_back(x); } // @pyjt(__repr__) diff --git a/python/jittor/src/op_compiler.cc b/python/jittor/src/op_compiler.cc index a34d1356..5e9b56bf 100644 --- a/python/jittor/src/op_compiler.cc +++ b/python/jittor/src/op_compiler.cc @@ -856,14 +856,19 @@ string OpCompiler::__get_fused_src( string arg_name = op_name + "_output"; string argp_name = op_name + "_outputp"; string T = ((ArrayOp*)ops[oi])->output->dtype().to_cstring(); - fused_kernel_args += " ArrayOp* " + op_name + " = (ArrayOp*)(ops[" + S(oi) + "]);\n"; - // op_name = "((ArrayOp*)(ops[" + S(oi) + "]))"; - fused_kernel_args += " Var* " + arg_name + " = " + op_name + "->output;\n"; - fused_kernel += " auto* " + argp_name + " = " + arg_name + "->ptr<" + T + ">();\n"; - fused_kernel += " " + argp_name + "[0] = " + op_name + "->ptr<" + T + ">()[0];\n"; - fused_kernel += " int " + arg_name + "shape0 = 1;\n"; - fused_kernel += " int " + arg_name + "stride0 = 1;\n"; + fused_kernel_args += precompile({{"oi",S(oi)}, {"T", T}}, R"( + Var* op@oi@@_output = ((ArrayOp*)(ops[@oi]))->output; + @T op@oi@@_outputv = ((ArrayOp*)(ops[@oi]))->ptr<@T>()[0]; + )"); + + + fused_kernel += precompile({{"oi",S(oi)}, {"T", T}}, R"( + @T* op@oi@@_outputp = op@oi@@_output->ptr<@T>(); + op@oi@@_outputp[0] = op@oi@@_outputv; + )"); + + fused_includes += "#include \"ops/array_op.h\"\n"; op_members[oi].push_back(arg_name); diff --git a/python/jittor/src/ops/getitem_op.cc b/python/jittor/src/ops/getitem_op.cc index 46255715..e33b9d4c 100644 --- a/python/jittor/src/ops/getitem_op.cc +++ b/python/jittor/src/ops/getitem_op.cc @@ -323,7 +323,7 @@ void GetitemOp::_compile_optimize(string& src) { } if (!has_zero) { func->push_back("int no = o_shape.size();"); - func->push_back("int masks[no];"); + 
func->push_back("STACK_ALLOC(int,masks,no);"); func->push_back("int tdims[6];"); func->push_back("cuda_loop_schedule(o_shape, masks, tdims);"); func->push_back("dim3 grid_dim(tdims[3],tdims[4],tdims[5]);"); diff --git a/python/jittor/src/ops/reduce_op.cc b/python/jittor/src/ops/reduce_op.cc index dc056af5..fd4c5f29 100644 --- a/python/jittor/src/ops/reduce_op.cc +++ b/python/jittor/src/ops/reduce_op.cc @@ -34,7 +34,7 @@ unordered_set reduce_ops = { * [in] x: the input jt.Var. - * [in] dim: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. @@ -65,7 +65,7 @@ unordered_set reduce_ops = { * [in] x: the input jt.Var. - * [in] dim: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. @@ -96,7 +96,7 @@ unordered_set reduce_ops = { * [in] x: the input jt.Var. - * [in] dim: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. @@ -127,7 +127,7 @@ unordered_set reduce_ops = { * [in] x: the input jt.Var. - * [in] dim: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. @@ -158,7 +158,7 @@ unordered_set reduce_ops = { * [in] x: the input jt.Var. - * [in] dim: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. @@ -189,7 +189,7 @@ unordered_set reduce_ops = { * [in] x: the input jt.Var. - * [in] dim: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. @@ -224,7 +224,7 @@ unordered_set reduce_ops = { * [in] x: the input jt.Var. - * [in] dim: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). + * [in] dim or dims: int or tuples of ints (optional). If specified, reduce along the given the dimension(s). * [in] keepdims: bool (optional). Whether the output has ``dim`` retained or not. Defaults to be False. diff --git a/python/jittor/src/ops/unary_op.cc b/python/jittor/src/ops/unary_op.cc index b7504591..c1ef95d2 100644 --- a/python/jittor/src/ops/unary_op.cc +++ b/python/jittor/src/ops/unary_op.cc @@ -5,6 +5,7 @@ // file 'LICENSE.txt', which is part of this source code package. 
// *************************************************************** #include +#include "misc/cpu_math.h" #include "var.h" #include "ops/unary_op.h" #include "ops/unary_op_defs.h" @@ -523,6 +524,7 @@ static unordered_set unary_ops = { jt.Var([ 0.51559156 0.45739546 -0.85728306 -0.9258883 ], dtype=float32) */ "erf", + "erfinv", }; UnaryOp::UnaryOp(Var* x, NanoString op) : x(x) { @@ -659,6 +661,14 @@ VarPtr UnaryOp::grad(Var* out, Var* dout, Var* v, int v_index) { r = make_binary(r, two_div_sqrt_pi, ns_multiply); return make_binary(dout, r, ns_multiply); } + // derfinv(x) = sqrt(pi) / 2 * exp(erfinv(x)^2) + if (ns == ns_erfinv) { + auto sqrt_pi_div_two = make_number(1.7724538509055159/2, x); + auto y2 = make_binary(y, y, ns_multiply); + auto r = make_unary(y2, ns_exp); + r = make_binary(r, sqrt_pi_div_two, ns_multiply); + return make_binary(dout, r, ns_multiply); + } return nullptr; } diff --git a/python/jittor/src/ops/unary_op_defs.h b/python/jittor/src/ops/unary_op_defs.h index 3913d480..6e8a9a99 100644 --- a/python/jittor/src/ops/unary_op_defs.h +++ b/python/jittor/src/ops/unary_op_defs.h @@ -43,6 +43,7 @@ namespace jittor { #define sigmoid(T,x) ((T) (1.0f/(1.0f+::expf((::min(T(-(x)), T(@if(@strcmp(@T,float32)==0,30,300)))))))) #define erf(T,x) ((T) ::erff((x))) +#define erfinv(T,x) ((T) ::erfinvf((T)(x))) #else #define abs(T,x) std::abs(x) @@ -74,6 +75,7 @@ namespace jittor { #define sigmoid(T,x) ((T) (1.0f/(1.0f+std::exp(std::min(T(-(x)), T(@if(@strcmp(@T,float32)==0,30,300))))))) #define erf(T,x) ((T) std::erf((x))) +#define erfinv(T,x) (jittor::_erfinv(x)) #endif diff --git a/python/jittor/src/opt/pass/loop_to_func_pass.cc b/python/jittor/src/opt/pass/loop_to_func_pass.cc index 9136c513..9d43711d 100644 --- a/python/jittor/src/opt/pass/loop_to_func_pass.cc +++ b/python/jittor/src/opt/pass/loop_to_func_pass.cc @@ -56,6 +56,11 @@ void LoopToFuncPass::run() { if (d->has_attr("rvalue")) { auto& rvalue = d->attrs["rvalue"]; auto& dtype = d->attrs["dtype"]; + if (endswith(d->attrs["lvalue"], "_value") || + endswith(d->attrs["lvalue"], "_outputv")) { + args.push_back(d.get()); + continue; + } if (rvalue.find("ops") != string::npos) continue; if (dtype=="Var*") @@ -67,10 +72,6 @@ void LoopToFuncPass::run() { args.push_back(d.get()); continue; } - if (endswith(d->attrs["lvalue"], "_value")) { - args.push_back(d.get()); - continue; - } } } func->push_back(d->clone()); diff --git a/python/jittor/src/opt/pass/loop_var_analyze_pass.cc b/python/jittor/src/opt/pass/loop_var_analyze_pass.cc index 65e5408d..3973fbdd 100644 --- a/python/jittor/src/opt/pass/loop_var_analyze_pass.cc +++ b/python/jittor/src/opt/pass/loop_var_analyze_pass.cc @@ -122,6 +122,9 @@ void LoopVarAnalyzePass::run() { && (op->outputs().front()->shape.size() != max_elm_dim || std::abs(op->outputs().front()->num) != max_elm_size)) continue; + if (op->name_ex() == "array") + // array op should not be loop var + continue; Var* loop_var; if (op->type() == OpType::broadcast || op->name_ex() == "index") { loop_var = op->output(0); diff --git a/python/jittor/src/opt/pass_manager.cc b/python/jittor/src/opt/pass_manager.cc index c925649a..3ecc1d85 100644 --- a/python/jittor/src/opt/pass_manager.cc +++ b/python/jittor/src/opt/pass_manager.cc @@ -69,7 +69,7 @@ void PassManager::run_passes() { if (oc->op->flags.get(NodeFlags::_cuda)) { ir.children.back()->erase(); string type = oc->op->ops[0]->outputs().front()->dtype().to_cstring(); - ir.push_back("kernel<<<1,1>>>(op0_outputp, op0->ptr<"+type+">()[0]);"); + 
ir.push_back("kernel<<<1,1>>>(op0_outputp, op0_outputv);"); auto jt_type = type == "bool" ? type : "jittor::" + type; ir.push_back("__global__ static void kernel("+jt_type+"* xp, "+jt_type+" x) { xp[0] = x; } ", &ir.before, true); } diff --git a/python/jittor/src/pyjt/py_caller.cc b/python/jittor/src/pyjt/py_caller.cc index 79d92213..698623de 100644 --- a/python/jittor/src/pyjt/py_caller.cc +++ b/python/jittor/src/pyjt/py_caller.cc @@ -14,7 +14,7 @@ namespace jittor { string py_caller(const string& mod_func, const vector& args, const map& kw) { PyObjHolder mod(PyImport_ImportModule("jittor")); - PyObjHolder func(PyObject_GetAttrString(mod.obj, "python_pass_warper")); + PyObjHolder func(PyObject_GetAttrString(mod.obj, "python_pass_wrapper")); PyObjHolder py_name(to_py_object(mod_func)); PyObjHolder py_args(to_py_tuple(args)); PyObjHolder py_kw(to_py_object(kw)); diff --git a/python/jittor/src/pyjt/py_converter.h b/python/jittor/src/pyjt/py_converter.h index a703a53c..b304f76c 100644 --- a/python/jittor/src/pyjt/py_converter.h +++ b/python/jittor/src/pyjt/py_converter.h @@ -786,5 +786,6 @@ DEF_IS(VarSlices, T) from_py_object(PyObject* obj, vector> } } +EXTERN_LIB bool check_async_executor_error(const std::exception& e, std::ostream& os); } // jittor diff --git a/python/jittor/src/utils/jit_utils.cc b/python/jittor/src/utils/jit_utils.cc index 7db727af..34c2b64a 100644 --- a/python/jittor/src/utils/jit_utils.cc +++ b/python/jittor/src/utils/jit_utils.cc @@ -23,6 +23,34 @@ namespace jittor { +bool check_async_executor_error(const std::exception& e, std::ostream& os) { + if (!e.what()) return false; + auto s = string(e.what()); + if (s.find("executor.cc:") == string::npos) + return false; + os << s; + if (getenv("JT_SYNC") && getenv("trace_py_var")) + return true; + if (s.find("[Async Backtrace]: ---") != string::npos) + return true; + os << "\n**********\nAsync error was detected. 
" + "To locate the async backtrace and get better error report, please rerun your code with " + "two enviroment variables set:\n" + #ifdef _WIN32 + "cmd: \n" + ">>> set JT_SYNC=1\n" + ">>> set trace_py_var=3\n" + "powershell: \n" + ">>> $env:JT_SYNC=1\n" + ">>> $env:trace_py_var=3\n" + #else + ">>> export JT_SYNC=1\n" + ">>> export trace_py_var=3\n" + #endif + ; + return true; +} + SEH_HOOK; void init_subprocess() { diff --git a/python/jittor/src/utils/log.cc b/python/jittor/src/utils/log.cc index a2e05729..910e69e9 100644 --- a/python/jittor/src/utils/log.cc +++ b/python/jittor/src/utils/log.cc @@ -9,6 +9,7 @@ #include #include #include +#include #include "utils/cross_platform.h" #include "utils/log.h" #include "utils/mwsr_list.h" @@ -368,6 +369,17 @@ void setter_log_vprefix(string value) { } vprefix_map = move(new_map); } +DEFINE_FLAG_WITH_SETTER(string, log_file, "", + "log to file, mpi env will add $OMPI_COMM_WORLD_RANK suffix\n"); +void setter_log_file(string value) { + if (value.size() == 0) + return; + auto c = getenv("OMPI_COMM_WORLD_RANK"); + if (c) value += string("_") + c; + static std::ofstream out; + out = std::ofstream(value); + std::cerr.rdbuf(out.rdbuf()); +} bool check_vlog(const char* fileline, int verbose) { uint64_t phash=0; diff --git a/python/jittor/src/var_holder.h b/python/jittor/src/var_holder.h index 30e13338..b2011059 100644 --- a/python/jittor/src/var_holder.h +++ b/python/jittor/src/var_holder.h @@ -282,6 +282,33 @@ struct VarHolder { */ // @pyjt(__get__grad) int grad(); + + // @pyjt(_input) + inline VarHolder* _input(int i) { + CHECK(!var->is_finished()); + return new VarHolder(var->input()->input(i)); + } + + /* Add dependency, make var computed after vars + */ + // @pyjt(_add_dependency) + // @attrs(return_self) + inline VarHolder* _add_dependency(vector&& vars) { + vector b(vars.size()); + for (int i=0; ivar; + CHECK(!var->is_finished()); + auto a = var->input(); + var->input()->add_inputs(b); + auto edge = a->_inputs.end(); + for (int i=0; iback->index = -1; + } + return this; + } + }; // @pyjt(sync) diff --git a/python/jittor/test/test_array.py b/python/jittor/test/test_array.py index b68e3359..c4c52f69 100644 --- a/python/jittor/test/test_array.py +++ b/python/jittor/test/test_array.py @@ -193,6 +193,23 @@ class TestArray(unittest.TestCase): assert str(c.dtype) == t np.testing.assert_allclose(a, c) + def test_scalar_fuse_unary(self): + with jt.profile_scope() as rep: + a = jt.array([1]) + b = -a + a = a.clone() + b = b.clone() + jt.sync([a, b]) + assert a.data == 1 + assert b.data == -1 + assert len(rep) == 2 + + @unittest.skipIf(not jt.has_cuda, "Cuda not found") + def test_scalar_fuse_unary_cuda(self): + with jt.flag_scope(use_cuda=1): + self.test_scalar_fuse_unary() + + if __name__ == "__main__": unittest.main() \ No newline at end of file diff --git a/python/jittor/test/test_binary_op.py b/python/jittor/test/test_binary_op.py index 99104d45..6eec8c0f 100644 --- a/python/jittor/test/test_binary_op.py +++ b/python/jittor/test/test_binary_op.py @@ -157,6 +157,12 @@ class TestBinaryOp(unittest.TestCase): c = a % b nc = a.data % b.data np.testing.assert_allclose(c.data, nc.data, atol=1e-5, rtol=1e-5) + + def test_pow(self): + # win cuda 10.2 cannot pass + a = jt.random((100,)) + b = a**3 + b.sync() diff --git a/python/jittor/test/test_error_msg.py b/python/jittor/test/test_error_msg.py new file mode 100644 index 00000000..a2771206 --- /dev/null +++ b/python/jittor/test/test_error_msg.py @@ -0,0 +1,71 @@ +# 
*************************************************************** +# Copyright (c) 2021 Jittor. All Rights Reserved. +# Maintainers: +# Dun Liang . +# +# This file is subject to the terms and conditions defined in +# file 'LICENSE.txt', which is part of this source code package. +# *************************************************************** +import unittest +import jittor as jt +import numpy as np + +class TestErrorMsg(unittest.TestCase): + + def test_error_msg(self): + a = jt.array([3,2,1]) + b = jt.code(a.shape, a.dtype, [a], + cpu_header=""" + #include + @alias(a, in0) + @alias(b, out) + """, + cpu_src=""" + for (int i=0; i + @alias(a, in0) + @alias(b, out) + """, + cpu_src=""" + for (int i=0; i0) & (a<1)).all() + a = init.uniform((2,3), low=-1, high=1) + assert ((a>-1) & (a<1)).all() + + linear = nn.Linear(2,2) + init.uniform_(linear.weight) + assert (linear.weight > 0).all() + linear.weight.uniform_() + assert (linear.weight > 0).all() + + if __name__ == "__main__": unittest.main() \ No newline at end of file diff --git a/python/jittor/test/test_resnet.py b/python/jittor/test/test_resnet.py index 92bb65dd..8defb633 100644 --- a/python/jittor/test/test_resnet.py +++ b/python/jittor/test/test_resnet.py @@ -72,31 +72,31 @@ class TestResnet(unittest.TestCase): epoch_id = self.train_loader.epoch_id # train step - with jt.log_capture_scope( - log_silent=1, - log_v=1, log_vprefix="op.cc=100,exe=10", - ) as logs: - output = mnist_net(data) - loss = nn.cross_entropy_loss(output, target) - SGD.step(loss) - def callback(epoch_id, batch_id, loss, output, target): - # print train info - global prev - pred = np.argmax(output, axis=1) - acc = np.mean(target==pred) - loss_list.append(loss[0]) - acc_list.append(acc) - print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}\tAcc: {:.6f} \tTime:{:.3f}' - .format(epoch_id, batch_id, 600,1. * batch_id / 6.0, loss[0], acc, time.time()-prev)) - # prev = time.time() - jt.fetch(epoch_id, batch_id, loss, output, target, callback) + # with jt.log_capture_scope( + # log_silent=1, + # log_v=1, log_vprefix="op.cc=100,exe=10", + # ) as logs: + output = mnist_net(data) + loss = nn.cross_entropy_loss(output, target) + SGD.step(loss) + def callback(epoch_id, batch_id, loss, output, target): + # print train info + global prev + pred = np.argmax(output, axis=1) + acc = np.mean(target==pred) + loss_list.append(loss[0]) + acc_list.append(acc) + print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}\tAcc: {:.6f} \tTime:{:.3f}' + .format(epoch_id, batch_id, 600,1. 
* batch_id / 6.0, loss[0], acc, time.time()-prev)) + # prev = time.time() + jt.fetch(epoch_id, batch_id, loss, output, target, callback) - log_conv = find_log_with_re(logs, - "Jit op key (not )?found: ((mkl)|(cudnn))_conv.*") - log_matmul = find_log_with_re(logs, - "Jit op key (not )?found: ((mkl)|(cublas))_matmul.*") - if batch_id > 2: - assert len(log_conv)==59 and len(log_matmul)==6, (len(log_conv), len(log_matmul)) + # log_conv = find_log_with_re(logs, + # "Jit op key (not )?found: ((mkl)|(cudnn))_conv.*") + # log_matmul = find_log_with_re(logs, + # "Jit op key (not )?found: ((mkl)|(cublas))_matmul.*") + # if batch_id > 2: + # assert len(log_conv)==59 and len(log_matmul)==6, (len(log_conv), len(log_matmul)) mem_used = jt.flags.stat_allocator_total_alloc_byte \ -jt.flags.stat_allocator_total_free_byte diff --git a/python/jittor/test/test_transpose_op.py b/python/jittor/test/test_transpose_op.py index 1afa4bee..d2ff5b48 100644 --- a/python/jittor/test/test_transpose_op.py +++ b/python/jittor/test/test_transpose_op.py @@ -68,6 +68,10 @@ class TestTransposeOp(unittest.TestCase): assert a.permute().shape == [4,3,2] assert a.permute(0,2,1).shape == [2,4,3] + def test_transpose_3d2i(self): + a = jt.ones([2,3,4]) + assert a.transpose(0,1).shape == (3,2,4) + @unittest.skipIf(not jt.compiler.has_cuda, "No CUDA found") @jt.flag_scope(use_cuda=1) def test_cutt(self): diff --git a/python/jittor/test/test_unary_op.py b/python/jittor/test/test_unary_op.py index 5c00c2b5..24483dae 100644 --- a/python/jittor/test/test_unary_op.py +++ b/python/jittor/test/test_unary_op.py @@ -76,6 +76,25 @@ class TestUnaryOp(unittest.TestCase): da = jt.grad(b, a) assert (da.data == 1).all() + def test_erfinv(self): + from scipy import special + y = np.linspace(-1.0, 1.0, num=10) + x = special.erfinv(y) + y2 = jt.array(y) + x2 = jt.erfinv(y2) + np.testing.assert_allclose(y.data, y2.data) + + + y = np.linspace(-0.9, 0.9, num=10) + x = special.erfinv(y) + y2 = jt.array(y) + x2 = jt.erfinv(y2) + np.testing.assert_allclose(y.data, y2.data) + d = jt.grad(x2, y2) + _, (dn,) = ngrad(lambda y: special.erfinv(y).sum(), [y], 1e-8) + np.testing.assert_allclose(d.data, dn, atol=1e-6, rtol=1e-6) + + class TestUnaryOpCuda(TestUnaryOp, test_cuda(2)): pass diff --git a/python/jittor/utils/gen_pyi.py b/python/jittor/utils/gen_pyi.py new file mode 100644 index 00000000..0bf76d52 --- /dev/null +++ b/python/jittor/utils/gen_pyi.py @@ -0,0 +1,250 @@ +# *************************************************************** +# Copyright (c) 2021 Jittor. All Rights Reserved. +# Maintainers: +# Zheng-Ning Liu +# +# This file is subject to the terms and conditions defined in +# file 'LICENSE.txt', which is part of this source code package. +# *************************************************************** + +""" This file implements generation of stub files for Jittor C extensions. + +In detail, autocompletion of the following functions are supported. 
+- functions in __init__.py +- functions in jittor.core.ops +- attributes of jittor.flags +- methods of jittor.Var + +Prerequisite: +- mypy for automatic stub generation + +Usage: python3 -m jittor.utils.gen_pyi + +""" + +import os +import re +import shutil +import jittor + +def add_indent(s: str, n=1): + for _ in range(n): + s = '\t' + s.replace('\n', '\n\t', s.count('\n')-1) + return s + +def ctype_to_python(type_str): + if type_str == "bool": + return "bool" + if type_str in ["int", "uint", "int64", "uint64", "size_t"]: + return "int" + if type_str in ["float32", "float64"]: + return "float" + if type_str in ["string", "string&&", "NanoString", "char*", "const char*"]: + return "str" + if type_str in ["vector"]: + return "List[int]" + if type_str in ["vector&&", "vector&&"]: + return "List[str]" + if type_str == "VarHolder*": + return "Var" + if type_str in ["vector", "vector&&"]: + return "List[Var]" + if type_str == "NanoVector": + return "Tuple[int]" + if type_str == "vector&&": + return "List[Tuple[int]]" + if type_str in ["FetchFunc", "FetchFunc&&", "NumpyFunc&&"]: + return "Callable" + if type_str == "vector&&": + return "List[Callable]" + if type_str == "PyObject*": + return "float | int | numpy.ndarray | Var" + if type_str == "VarSlices&&": + return "slice" + if type_str in ["ArrayArgs", "ArrayArgs&&", "DataView"]: + return "numpy.ndarray" + if type_str == 'ItemData': + return "float | int | bool" + if type_str == "void": + return "" + print(f"[warning] Unknown ctype: {type_str}, do not write type hinting") + return "" + +def cval_to_python(val_str: str): + if val_str == "false": + return "False" + if val_str == "true": + return "True" + if val_str.startswith("ns_"): + return f'"{val_str[3:]}"' + if val_str == "NanoVector()": + return "()" + return val_str + + +def run_stubgen(jittor_path, cache_path): + + # for __init__.py functions + stubpath = os.path.join(cache_path, 'stubs') + stubfile = os.path.join(stubpath, "jittor", "__init__.pyi") + os.system(f"stubgen -m jittor -o {stubpath} -q") + with open(stubfile) as f: + mypy_content = f.read() + + f = open(stubfile, "w") + # Remove the follow type redirection + unused_content = ["ori_int = int\n", + "ori_float = float\n", + "ori_bool = bool\n", + "int = int32\n", + "float = float32\n", + "double = float64\n", + "\nflags: Any\n"] + for unused in unused_content: + mypy_content = mypy_content.replace(unused, "") + f.write(mypy_content) + + shutil.move(stubfile, os.path.join(jittor_path, "__init__.pyi")) + shutil.rmtree(stubpath) + shutil.rmtree(os.path.expanduser(".mypy_cache")) + +def gen_ops_stub(jittor_path): + f = open(os.path.join(jittor_path, "__init__.pyi"), "a") + f.write("from typing import List, Tuple, Callable, overload\n") + f.write("import numpy\n") + + var_hint = "class Var:\n\t'''Variable that stores multi-dimensional data.'''\n" + var_methods = set() + + def decl_to_param_hints(decl): + param_decl = re.findall(r".+ [a-zA-Z_0-9]+\((.*)\)", decl)[0] + if not param_decl.strip(): + return [] + param_hints = [] + for param_str in param_decl.split(','): + if "=" in param_str: + template = r"\s*(.+)\s+([a-zA-Z_0-9]+)\s*=\s*(.+)" + param_type, param_name, param_val = re.findall(template, param_str)[0] + param_type = ctype_to_python(param_type) + param_val = cval_to_python(param_val) + else: + param_type, param_name = param_str.strip().rsplit(' ', maxsplit=1) + param_type = ctype_to_python(param_type) + param_val = "" + + hint = param_name + if param_type: + hint += ": " + param_type + if param_val: + hint += "=" + param_val 
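# Usage sketch for the ctype_to_python / cval_to_python helpers defined earlier
# in this file; the expected strings follow the mappings visible above
# (templated C++ types are omitted here). Assumes the module is importable.
from jittor.utils.gen_pyi import ctype_to_python, cval_to_python

assert ctype_to_python("bool") == "bool"
assert ctype_to_python("float32") == "float"
assert ctype_to_python("VarHolder*") == "Var"
assert ctype_to_python("NanoVector") == "Tuple[int]"
assert cval_to_python("false") == "False"
assert cval_to_python("ns_float32") == '"float32"'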
+ param_hints.append(hint) + return param_hints + + def generate_var_hint(decorators, return_type, param_hints, docstring): + hint = add_indent(decorators) if decorators else "" + hint += f"\tdef {func_name}(" + hint += ", ".join(['self'] + param_hints) + ")" + hint += f"-> {return_type}" if return_type else "" + hint += ":" + if docstring: + hint += add_indent(f"\n'''{docstring}'''\n", 2) + "\t\t...\n" + else: + hint += f" ...\n" + return hint + + for func_name, func in jittor.ops.__dict__.items(): + if func_name.startswith("__"): + continue + # Exclude a function that overrides the builtin bool: + # def bool(x: Var) -> Var: ... + # It will confuse the IDE. So we ignore this function in pyi. + if func_name == "bool": + continue + + docstring = func.__doc__[:func.__doc__.find("Declaration:")] + docstring = docstring.replace("'''", '"""').strip() + declarations = re.findall(r"Declaration:\n(.+)\n", func.__doc__) + + for decl in declarations: + decorators = "@overload\n" if len(declarations) > 1 else "" + return_type = ctype_to_python(decl.split(' ', maxsplit=1)[0]) + param_hints = decl_to_param_hints(decl) + + func_text = decorators + func_text += f"def {func_name}" + func_text += "(" + ", ".join(param_hints) + ")" + func_text += f"-> {return_type}" if return_type else "" + func_text += ":\n" + if docstring: + func_text += add_indent(f"'''{docstring}'''\n") + "\t...\n" + else: + func_text += f" ...\n" + + f.write(func_text) + + if not "Var" in param_hints[0]: + continue + var_methods.add(func_name) + var_hint += generate_var_hint(decorators, return_type, param_hints[1:], docstring) + + for func_name, func in jittor.Var.__dict__.items(): + if func_name.startswith("__") or func_name in var_methods: + continue + if func_name in ["int", "float", "double", "bool", "long"]: + continue + if func.__doc__ is None: + continue + docstring = func.__doc__[:func.__doc__.find("Declaration:")] + docstring = docstring.replace("'''", '"""').strip() + declarations = re.findall(r"Declaration:\n(.+)\n", func.__doc__) + + for decl in declarations: + decl = decl.replace("inline ", "") + decorators = "@overload\n" if len(declarations) > 1 else "" + return_type = re.findall(r"(.+) [a-zA-Z_0-9]+\(.*\)", decl)[0].split()[-1] + return_type = ctype_to_python(return_type) + param_hints = decl_to_param_hints(decl) + + var_hint += generate_var_hint(decorators, return_type, param_hints, docstring) + + f.write(var_hint) + f.close() + +def gen_flags_stub(jittor_path): + f = open(os.path.join(jittor_path, "__init__.pyi"), "a") + f.write("class Flags:\n") + f.write("\t'''A set of flags to configure jittor running behaviors'''\n") + + for attr_name, attr in jittor.Flags.__dict__.items(): + if attr_name.startswith("__"): + continue + docstring = attr.__doc__ + docstring = attr.__doc__[:attr.__doc__.find("Declaration:")] + docbody = re.findall("\(type.+default.+\):(.+)", docstring)[0].strip() + docbody += "." 
if not docbody.endswith('.') else "" + attr_type, attr_val = re.findall(r"\(type:(.+), default:(.+)\)", docstring)[0] + attr_type = ctype_to_python(attr_type) + attr_type = attr_type if attr_type else "Any" + f.write(f"\t{attr_name}: {attr_type}\n") + f.write(f"\t'''{docbody} Default: {attr_val}'''\n") + + f.write("flags: Flags\n") + f.write("'''Jittor running time flags instance'''\n") + f.close() + +def get_pyi(jittor_path=None, cache_path=None): + if jittor_path is None: + jittor_path = jittor.flags.jittor_path + if cache_path is None: + import jittor_utils + cache_path = jittor_utils.cache_path + + run_stubgen(jittor_path, cache_path) + gen_ops_stub(jittor_path) + gen_flags_stub(jittor_path) + + print(f"Generated stubfile: {os.path.join(jittor_path, '__init__.pyi')}") + + +if __name__ == "__main__": + get_pyi() \ No newline at end of file diff --git a/python/jittor_utils/__init__.py b/python/jittor_utils/__init__.py index 48d75fef..69a5e108 100644 --- a/python/jittor_utils/__init__.py +++ b/python/jittor_utils/__init__.py @@ -23,7 +23,7 @@ import urllib.request if platform.system() == 'Darwin': mp.set_start_method('fork') -class LogWarper: +class Logwrapper: def __init__(self): self.log_silent = int(os.environ.get("log_silent", "0")) self.log_v = int(os.environ.get("log_v", "0")) @@ -237,12 +237,76 @@ def download(url, filename): urllib.request.urlretrieve(url, filename) LOG.v("Download finished") +def get_jittor_version(): + path = os.path.dirname(__file__) + with open(os.path.join(path, "../jittor/__init__.py"), "r", encoding='utf8') as fh: + for line in fh: + if line.startswith('__version__'): + version = line.split("'")[1] + break + else: + raise RuntimeError("Unable to find version string.") + return version + +def get_str_hash(s): + import hashlib + md5 = hashlib.md5() + md5.update(s.encode()) + return md5.hexdigest() + +def get_cpu_version(): + v = platform.processor() + try: + if os.name == 'nt': + import winreg + key_name = r"Hardware\Description\System\CentralProcessor\0" + field_name = "ProcessorNameString" + key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_name) + value = winreg.QueryValueEx(key, field_name)[0] + winreg.CloseKey(key) + v = value + elif platform.system() == "Darwin": + r, s = sp.getstatusoutput("sysctl -a sysctl machdep.cpu.brand_string") + if r==0: + v = s.split(":")[-1].strip() + else: + with open("/proc/cpuinfo", 'r') as f: + for l in f: + if l.startswith("model name"): + v = l.split(':')[-1].strip() + break + except: + pass + return v + +def short(s): + ss = "" + for c in s: + if str.isidentifier(c) or str.isnumeric(c) \ + or str.isalpha(c) or c in '.-+': + ss += c + if len(ss)>14: + return ss[:14]+'x'+get_str_hash(ss)[:2] + return ss + def find_cache_path(): from pathlib import Path path = str(Path.home()) - dirs = [".cache", "jittor", os.path.basename(cc_path)] - if os.environ.get("debug")=="1": - dirs[-1] += "_debug" + # jittor version key + jtv = "jt"+get_jittor_version().rsplit('.', 1)[0] + # cc version key + ccv = cc_type+get_version(cc_path)[1:-1] \ + if cc_type != "cl" else cc_type + # os version key + osv = platform.platform() + platform.node() + if len(osv)>14: + osv = osv[:14] + 'x'+get_str_hash(osv)[:2] + # py version + pyv = "py"+platform.python_version() + # cpu version + cpuv = get_cpu_version() + dirs = [".cache", "jittor", jtv, ccv, pyv, osv, cpuv] + dirs = list(map(short, dirs)) cache_name = "default" try: if "cache_name" in os.environ: @@ -260,18 +324,14 @@ def find_cache_path(): for c in " (){}": cache_name = cache_name.replace(c, 
"_") except: pass + if os.environ.get("debug")=="1": + dirs[-1] += "_debug" for name in os.path.normpath(cache_name).split(os.path.sep): - dirs.insert(-1, name) + dirs.append(name) os.environ["cache_name"] = cache_name LOG.v("cache_name: ", cache_name) - for d in dirs: - path = os.path.join(path, d) - if not os.path.isdir(path): - try: - os.mkdir(path) - except: - pass - assert os.path.isdir(path) + path = os.path.join(path, *dirs) + os.makedirs(path, exist_ok=True) if path not in sys.path: sys.path.append(path) return path @@ -422,7 +482,7 @@ def get_total_mem(): is_in_ipynb = in_ipynb() cc = None -LOG = LogWarper() +LOG = Logwrapper() check_msvc_install = False msvc_path = "" diff --git a/python/jittor_utils/class/motd b/python/jittor_utils/class/motd new file mode 100644 index 00000000..75bb8568 --- /dev/null +++ b/python/jittor_utils/class/motd @@ -0,0 +1,20 @@ +★★★★★★★★★★★★★★★★★★★★★ +Welcome to use Jittor +Please put the file under /root directory +★★★★★★★★★★★★★★★★★★★★★ +欢迎使用Jittor +请将文件放置在/root目录下 +本docker已经安装好cuda环境 +相关链接: +* [Jittor官网](https://cg.cs.tsinghua.edu.cn/jittor/) +* [Jittor教程](https://cg.cs.tsinghua.edu.cn/jittor/tutorial/) +* [Jittor模型库](https://cg.cs.tsinghua.edu.cn/jittor/resources/) +* [Jittor文档](https://cg.cs.tsinghua.edu.cn/jittor/assets/docs/index.html) +* [Github](https://github.com/jittor/jittor), [Gitee](https://gitee.com/jittor/jittor) +* [Jittor 论坛](https://discuss.jittor.org/) +* 即时通信: QQ Group(761222083) + +欢迎大家star,fork并在QQ群或者论坛向我们提出宝贵的意见和建议。 + +注意:请不要开启无密码保护的jupyter notebook或vscode server +★★★★★★★★★★★★★★★★★★★★★ diff --git a/python/jittor_utils/class/setup.py b/python/jittor_utils/class/setup.py new file mode 100644 index 00000000..5c575f12 --- /dev/null +++ b/python/jittor_utils/class/setup.py @@ -0,0 +1,16 @@ +import sys +import os +command = sys.argv[1] +if (command == 'ssh'): + port = sys.argv[2] + data = open("/etc/ssh/sshd_config", "r").readlines() + data[12] = 'Port ' + port + '\nPermitRootLogin yes\n' + f = open("/etc/ssh/sshd_config", "w") + f.writelines(data) + f.close() + os.system("service ssh restart") +elif (command == 'passwd'): + passwd = sys.argv[2] + os.system("echo root:"+passwd+" | chpasswd") +else: + print('command error') diff --git a/python/jittor_utils/class/setup_env.py b/python/jittor_utils/class/setup_env.py new file mode 100644 index 00000000..3a68f991 --- /dev/null +++ b/python/jittor_utils/class/setup_env.py @@ -0,0 +1,137 @@ +# *************************************************************** +# Copyright (c) 2021 Jittor. All Rights Reserved. +# Maintainers: +# Guoye Yang <498731903@qq.com> +# Dun Liang . +# +# +# This file is subject to the terms and conditions defined in +# file 'LICENSE.txt', which is part of this source code package. +# *************************************************************** + +''' +example: + +export class_home=/mnt/disk/cjld/class_nn +mkdir -p $class_home +docker pull jittor/jittor-cuda +python3.7 -m jittor_utils.class.setup_env setup 4 +python3.7 -m jittor_utils.class.setup_env start 4 +python3.7 -m jittor_utils.class.setup_env report +python3.7 -m jittor_utils.class.setup_env restart 4 +python3.7 -m jittor_utils.class.setup_env stop +''' +# export class_home +# setup [n] // setup for n users. including build user paths, user_info.txt and docker imgs. !!!WILL RESET SUDENT_FILES!!! +# start [n_gpu] // run n docker CONTAINERs with n_gpu GPUs. +# stop // stop n docker CONTAINERs +# restart [n_gpu] // restart n docker CONTAINERs with n_gpu GPUs. 
+import sys +import os +import json as js +import random + +class_home = os.environ["class_home"] +student_files_dir = class_home + "/student_files" +student_files_bk_dir = class_home + "/student_files_bak" +cwd = os.path.dirname(__file__) + +def run_cmd(cmd): + print("[CMD]:", cmd) + ret = os.system(cmd) + if ret: + print("[CMD] return", ret) + return ret + +def generate_random_str(randomlength): + random_str = '' + base_str = 'ABCDEFGHIGKLMNOPQRSTUVWXYZabcdefghigklmnopqrstuvwxyz0123456789' + length = len(base_str) - 1 + for i in range(randomlength): + random_str += base_str[random.randint(0, length)] + return random_str + +def setup(n): + if os.path.exists(student_files_dir): + if os.path.exists(student_files_bk_dir): + run_cmd(f"rm -rf {student_files_bk_dir}") + run_cmd(f"mv {student_files_dir} {student_files_bk_dir}") + os.makedirs(student_files_dir) + user_info = [] + for i in range(n): # 0 for root + port = 20000 + i + passwd = generate_random_str(8) + name = 'stu_'+str(i) + path = os.path.abspath(os.path.join(student_files_dir, name)) + info = {'port': port, + 'passwd': passwd, + 'name': name, + 'path': path} + user_info.append(info) + student_files_src = class_home + "/student_files_src" + if os.path.isdir(student_files_src): + run_cmd(f"cp -r {student_files_src} {path}") + else: + run_cmd('mkdir -p ' + path) + js.dump(user_info, open(student_files_dir+"/user_info.json", "w")) + +def start(n): + assert os.path.exists(student_files_dir+'/user_info.json') + user_info = js.load(open(student_files_dir+'/user_info.json', 'r')) + for i in range(len(user_info)): + id = i % n + u = user_info[i] + print('START', i, '/', len(user_info)) + assert 0 == run_cmd(f'docker run -itd --shm-size=8g --network host --name {u["name"]} -v {u["path"]}:/root --gpus "device={id}" jittor/jittor-cuda bash') + # assert 0 == run_cmd(f'docker exec -it {u["name"]} bash -c \'apt update && apt install openssh-server -y\'') + assert 0 == run_cmd(f'docker cp {cwd}/setup.py {u["name"]}:/etc/ssh/setup.py') + assert 0 == run_cmd(f'docker cp {cwd}/motd {u["name"]}:/etc/motd') + assert 0 == run_cmd(f'docker exec -it {u["name"]} python3.7 /etc/ssh/setup.py passwd {u["passwd"]}') + assert 0 == run_cmd(f'docker exec -it {u["name"]} python3.7 /etc/ssh/setup.py ssh {u["port"]}') + assert 0 == run_cmd(f'docker exec -it {u["name"]} python3.7 -m pip install jittor -U') + assert 0 == run_cmd(f'docker exec -it {u["name"]} python3.7 -m jittor.test.test_core') + +def stop(): + assert os.path.exists(student_files_dir+'/user_info.json') + user_info = js.load(open(student_files_dir+'/user_info.json', 'r')) + for i in range(len(user_info)): + u = user_info[i] + print('STOP', i, '/', len(user_info)) + run_cmd(f'docker rm -f {u["name"]}') + +def report(): + assert os.path.exists(student_files_dir+'/user_info.json') + user_info = js.load(open(student_files_dir+'/user_info.json', 'r')) + hostname = open("/etc/hostname", 'r').read().strip() + ".randonl.me" + for i in range(len(user_info)): + u = user_info[i] + print(f"ssh -p {u['port']} root@{hostname} # passwd: {u['passwd']}") + +def restart(n): + stop() + start(n) + +args = sys.argv[1:] +if (args[0] == 'setup'): + assert(len(args) == 2) + assert(type(eval(args[1])) == int) + n = int(args[1]) + assert(n < 999) + setup(n) +elif (args[0] == 'start'): + assert(len(args) == 2) + assert(type(eval(args[1])) == int) + n = int(args[1]) + start(n) +elif (args[0] == 'stop'): + stop() +elif (args[0] == 'restart'): + assert(len(args) == 2) + assert(type(eval(args[1])) == int) + n = int(args[1]) + 
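+    # stop all user containers, then start them again spread across n GPUs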
restart(n) +elif (args[0] == 'report'): + report() +else: + assert(False) + diff --git a/python/jittor_utils/clean_cache.py b/python/jittor_utils/clean_cache.py new file mode 100644 index 00000000..72702707 --- /dev/null +++ b/python/jittor_utils/clean_cache.py @@ -0,0 +1,55 @@ +# *************************************************************** +# Copyright (c) 2021 Jittor. All Rights Reserved. +# Maintainers: Dun Liang . +# This file is subject to the terms and conditions defined in +# file 'LICENSE.txt', which is part of this source code package. +# *************************************************************** +import os, sys, shutil +from pathlib import Path +import glob + +cache_path = os.path.join(str(Path.home()), ".cache", "jittor") + +def callback(func, path, exc_info): + print(f"remove \"{path}\" failed.") + +def rmtree(path): + if os.path.isdir(path): + print(f"remove \"{path}\" recursive.") + shutil.rmtree(path, onerror=callback) + +def clean_all(): + rmtree(cache_path) + +def clean_core(): + rmtree(cache_path+"/default") + rmtree(cache_path+"/master") + fs = glob.glob(cache_path+"/jt*") + for f in fs: rmtree(f) + +def clean_cuda(): + rmtree(cache_path+"/jtcuda") + rmtree(cache_path+"/cutt") + rmtree(cache_path+"/cub") + rmtree(cache_path+"/nccl") + +def clean_dataset(): + rmtree(cache_path+"/dataset") + +def print_help(): + msg = "|".join(keys) + print(f"Usage: {sys.executable} -m jittor_utils.clean_cache [{msg}]") + exit() + + +keys = [ k[6:] for k in globals() if k.startswith("clean_") ] + +if __name__ == "__main__": + if len(sys.argv)==1: + print_help() + else: + for k in sys.argv[1:]: + if k not in keys: + print_help() + func = globals()["clean_"+k] + func() \ No newline at end of file diff --git a/python/jittor_utils/install_cuda.py b/python/jittor_utils/install_cuda.py index bcd32432..4d28e69f 100644 --- a/python/jittor_utils/install_cuda.py +++ b/python/jittor_utils/install_cuda.py @@ -57,6 +57,12 @@ def install_cuda(): if cuda_driver_version >= [11,4]: cuda_tgz = "cuda11.4_cudnn8_win.zip" md5 = "06eed370d0d44bb2cc57809343911187" + elif cuda_driver_version >= [11,2]: + cuda_tgz = "cuda11.2_cudnn8_win.zip" + md5 = "b5543822c21bc460c1a414af47754556" + elif cuda_driver_version >= [11,]: + cuda_tgz = "cuda11.0_cudnn8_win.zip" + md5 = "7a248df76ee5e79623236b0560f8d1fd" elif cuda_driver_version >= [10,]: cuda_tgz = "cuda10.2_cudnn7_win.zip" md5 = "7dd9963833a91371299a2ba58779dd71" diff --git a/python/jittor_utils/lock.py b/python/jittor_utils/lock.py index f5befbca..50b31e94 100644 --- a/python/jittor_utils/lock.py +++ b/python/jittor_utils/lock.py @@ -2,9 +2,14 @@ try: import fcntl except ImportError: fcntl = None - import win32file - import pywintypes - _OVERLAPPED = pywintypes.OVERLAPPED() + try: + import win32file + import pywintypes + _OVERLAPPED = pywintypes.OVERLAPPED() + except: + LOG.f("""pywin32 package not found, please install it. 
+If conda is used, please install with command: +>>> conda install pywin32""") import os from jittor_utils import cache_path, LOG diff --git a/python/jittor_utils/misc.py b/python/jittor_utils/misc.py index efe843ae..2f21e537 100644 --- a/python/jittor_utils/misc.py +++ b/python/jittor_utils/misc.py @@ -47,10 +47,17 @@ def download_url_to_local(url, filename, root_folder, md5): return else: print('Downloading ' + url + ' to ' + file_path) - urllib.request.urlretrieve( - url, file_path, - reporthook=_progress() - ) + try: + urllib.request.urlretrieve( + url, file_path, + reporthook=_progress() + ) + except Exception as e: + msg = f"{e}\nDownload File failed, url: {url}, path: {file_path}" + print(msg) + if os.path.isfile(file_path): + os.remove(file_path) + raise RuntimeError(msg) if not check_file_exist(file_path, md5): raise RuntimeError("File downloads failed.") diff --git a/python/jittor_utils/pip_publish.py b/python/jittor_utils/pip_publish.py new file mode 100644 index 00000000..80a45dce --- /dev/null +++ b/python/jittor_utils/pip_publish.py @@ -0,0 +1,34 @@ +import os +import glob +import shutil +import sys + +home_path = os.path.join(os.path.dirname(__file__), "..", "..") +home_path = os.path.abspath(home_path) + +def callback(func, path, exc_info): + print(f"remove \"{path}\" failed.") + +def rmtree(path): + if os.path.isdir(path): + print(f"remove \"{path}\" recursive.") + shutil.rmtree(path, onerror=callback) + +def remove_tmpfile(): + dist_file = home_path+"/dist" + egg_file = glob.glob(home_path+"/**/*egg-info") + rmtree(dist_file) + for e in egg_file: + rmtree(e) + +def run_cmd(cmd): + print("[CMD]", cmd) + assert os.system(cmd)==0 + +os.chdir(home_path) +remove_tmpfile() + +run_cmd(f"{sys.executable} ./setup.py sdist") +run_cmd(f"{sys.executable} -m twine upload dist/*") + +remove_tmpfile() \ No newline at end of file diff --git a/python/jittor_utils/query_cuda_cc.py b/python/jittor_utils/query_cuda_cc.py new file mode 100644 index 00000000..75205fe4 --- /dev/null +++ b/python/jittor_utils/query_cuda_cc.py @@ -0,0 +1,25 @@ +import ctypes +import os +if "CUDA_VISIBLE_DEVICES" in os.environ: + del os.environ["CUDA_VISIBLE_DEVICES"] +if os.name == 'nt': + cuda_driver = ctypes.CDLL("nvcuda") +else: + cuda_driver = ctypes.CDLL("libcuda.so") +driver_version = ctypes.c_int() +r = cuda_driver.cuDriverGetVersion(ctypes.byref(driver_version)) +assert r == 0 +v = driver_version.value + +dcount = ctypes.c_int() +cuda_driver.cuInit(0) +r = cuda_driver.cuDeviceGetCount(ctypes.byref(dcount)) + +for i in range(dcount.value): + dev = ctypes.c_void_p() + major = ctypes.c_int() + minor = ctypes.c_int() + assert 0 == cuda_driver.cuDeviceGet(ctypes.byref(dev), i) + assert 0 == cuda_driver.cuDeviceGetAttribute(ctypes.byref(major), 75, dev) + assert 0 == cuda_driver.cuDeviceGetAttribute(ctypes.byref(minor), 76, dev) + print(major.value*10+minor.value) diff --git a/setup.py b/setup.py index 0fd29a87..741165d2 100644 --- a/setup.py +++ b/setup.py @@ -58,7 +58,7 @@ setuptools.setup( python_requires='>=3.7', packages=["jittor", "jittor.test", "jittor.models", "jittor.utils", "jittor_utils"], - package_dir={'': os.path.join(path, 'python')}, + package_dir={'': 'python'}, package_data={'': ['*', '*/*', '*/*/*','*/*/*/*','*/*/*/*/*','*/*/*/*/*/*']}, # include_package_data=True, install_requires=[