# CUDA Study Steps
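
**NOTE**

```
These are my notes from learning CUDA as a beginner. I am surprised and glad that this small project has been useful to others; it is a first for me. (I am usually the one being helped :D)

Given my limited time, ability, and interests, this project is no longer maintained. I hope it can still serve as a stepping stone into GPU programming.

One more thing:

In 2012 NVIDIA proposed the TEAMS construct for accelerators for OpenMP 4.0, and accelerator support was released in 2013. Compared with native CUDA development, the OpenMP framework offers a more portable GPU acceleration path (at some cost in performance and flexibility).
```
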
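To make the OpenMP remark in the note concrete, here is a minimal sketch of a SAXPY loop offloaded with the `target teams` family of directives. The function name `saxpy` and its signature are illustrative assumptions, not part of this project. It also assumes a compiler with OpenMP 4.5 offload support; without one (or without `-fopenmp`), the pragma is ignored or falls back to the host and the loop runs on the CPU.

```cpp
#include <vector>

// Hypothetical helper for illustration: computes y = a * x + y.
// Assumes x and y have the same length.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const float* xp = x.data();
    float* yp = y.data();
    const int n = static_cast<int>(y.size());

    // "target" moves the region to the accelerator, "teams distribute"
    // splits iterations across teams (roughly, thread blocks), and
    // "parallel for" splits each team's share across its threads.
    // The map clauses copy x to the device and y in both directions.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The same source builds for CPU and GPU, which is the portability the note refers to. The exact offload flags beyond `-fopenmp` depend on the toolchain.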
Study notes on CUDA GPU programming, based on the book 《CUDA 编程——基础与实践》 (CUDA Programming: Basics and Practice) by 樊哲勇 (Fan Zheyong).

Chapters:

1. [GPU hardware and CUDA development tools](./capter1/ReadMe.md)
2. [Thread organization in CUDA](./capter2/ReadMe.md)
3. [Basic framework of a simple CUDA program](./capter3/ReadMe.md)
4. [Error checking in CUDA programs](./capter4/ReadMe.md)
5. [Keys to GPU acceleration](./capter5/ReadMe.md)
6. [CUDA memory organization](./capter6/ReadMe.md)
7. [Proper use of global memory](./capter7/ReadMe.md)
8. [Proper use of shared memory](./capter8/ReadMe.md)
9. [Proper use of atomic functions](./capter9/ReadMe.md)
10. [Warp-level primitives and cooperative groups](./capter10/ReadMe.md)
11. [CUDA streams](./capter11/ReadMe.md)
12. [Programming with unified memory]()
13. [Molecular dynamics model](./capter13/ReadMe.md)
14. [CUDA standard libraries](./capter14/ReadMe.md)

## Official CUDA Documentation

[CUDA C++ Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)

[CUDA C++ Best Practices Guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)

[CUDA Runtime API Reference](https://docs.nvidia.com/cuda/cuda-runtime-api/index.html)

[CUDA Math API Reference](https://docs.nvidia.com/cuda/cuda-math-api/index.html)

## CUDA Programming Examples

[CUDA Samples](https://github.com/NVIDIA/cuda-samples)

+ Simple Reference

  Basic CUDA samples for beginners, illustrating key concepts of using CUDA and the CUDA runtime APIs.

+ Utilities Reference

  Sample programs that show how to query device capabilities and measure GPU/CPU bandwidth.

+ Graphics Reference

  Graphics samples that demonstrate interoperability between CUDA, OpenGL, and DirectX.

+ Imaging Reference

  Image processing, compression, and data analysis.

+ Finance Reference

  Parallel processing for financial computation.

+ Simulations Reference

  Simulation algorithms implemented with CUDA.

+ Advanced Reference

  Advanced algorithms implemented with CUDA.

+ Cudalibraries Reference

  Samples that show how to use the CUDA libraries (NPP, cuBLAS, cuFFT, cuSPARSE, and cuRAND).

## CUDA Performance Benchmarks

[CUDA Benchmarks](https://github.com/ekondis/mixbench)

+ Four types of experiments are executed, each combined with global memory accesses:
  + Single precision Flops (multiply-additions)
  + Double precision Flops (multiply-additions)
  + Half precision Flops (multiply-additions)
  + Integer multiply-addition operations

+ Building is now based on CMake files. Each implementation resides in a separate folder:
  + CUDA implementation: `mixbench-cuda`
  + OpenCL implementation: `mixbench-opencl`
  + HIP implementation: `mixbench-hip`
  + SYCL implementation: `mixbench-sycl`

The generated results look like this:

```
mixbench/read-only (v0.03-2-gbccfd71)
------------------------ Device specifications ------------------------
Device: GeForce RTX 2070
CUDA driver version: 10.20
GPU clock rate: 1620 MHz
Memory clock rate: 3500 MHz
Memory bus width: 256 bits
WarpSize: 32
L2 cache size: 4096 KB
Total global mem: 7979 MB
ECC enabled: No
Compute Capability: 7.5
Total SPs: 2304 (36 MPs x 64 SPs/MP)
Compute throughput: 7464.96 GFlops (theoretical single precision FMAs)
Memory bandwidth: 448.06 GB/sec
-----------------------------------------------------------------------
Total GPU memory 8366784512, free 7941521408
Buffer size: 256MB
Trade-off type: compute with global memory (block strided)
Elements per thread: 8
Thread fusion degree: 4
----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
0, 0.250, 0.32, 104.42, 417.68, 0.125, 0.63, 53.04, 424.35, 0.500, 0.32, 211.41, 422.81, 0.250, 0.32, 105.58, 422.30
1, 0.750, 0.32, 316.34, 421.79, 0.375, 0.63, 158.69, 423.18, 1.500, 0.32, 634.22, 422.81, 0.750, 0.32, 317.30, 423.07
2, 1.250, 0.32, 528.46, 422.77, 0.625, 0.78, 215.91, 345.45, 2.500, 0.32, 1055.97, 422.39, 1.250, 0.32, 528.57, 422.86
3, 1.750, 0.32, 738.81, 422.17, 0.875, 1.08, 218.17, 249.34, 3.500, 0.32, 1478.95, 422.56, 1.750, 0.32, 740.59, 423.20
4, 2.250, 0.32, 951.33, 422.81, 1.125, 1.38, 219.57, 195.17, 4.500, 0.32, 1902.66, 422.81, 2.250, 0.32, 950.66, 422.51
5, 2.750, 0.32, 1162.74, 422.81, 1.375, 1.67, 220.38, 160.28, 5.500, 0.32, 2328.52, 423.37, 2.750, 0.32, 1162.74, 422.81
6, 3.250, 0.32, 1374.56, 422.94, 1.625, 1.97, 220.99, 135.99, 6.500, 0.32, 2756.62, 424.10, 3.250, 0.32, 1375.81, 423.32
7, 3.750, 0.32, 1592.45, 424.65, 1.875, 2.27, 221.38, 118.07, 7.500, 0.32, 3169.50, 422.60, 3.750, 0.32, 1585.55, 422.81
8, 4.250, 0.32, 1796.95, 422.81, 2.125, 2.57, 221.71, 104.33, 8.500, 0.32, 3587.76, 422.09, 4.250, 0.37, 1545.63, 363.68
9, 4.750, 0.32, 2006.34, 422.39, 2.375, 2.87, 221.85, 93.41, 9.500, 0.32, 3995.38, 420.57, 4.750, 0.32, 1998.29, 420.69
10, 5.250, 0.32, 2209.52, 420.86, 2.625, 3.17, 222.02, 84.58, 10.500, 0.32, 4439.54, 422.81, 5.250, 0.32, 2220.44, 422.94
11, 5.750, 0.32, 2434.12, 423.32, 2.875, 3.47, 222.17, 77.28, 11.500, 0.32, 4855.01, 422.17, 5.750, 0.32, 2426.77, 422.05
12, 6.250, 0.32, 2638.06, 422.09, 3.125, 3.78, 222.18, 71.10, 12.500, 0.32, 5227.20, 418.18, 6.250, 0.38, 2202.15, 352.34
13, 6.750, 0.32, 2841.95, 421.03, 3.375, 4.08, 222.30, 65.87, 13.500, 0.32, 5712.58, 423.15, 6.750, 0.32, 2850.54, 422.30
14, 7.250, 0.32, 3065.39, 422.81, 3.625, 4.37, 222.45, 61.36, 14.500, 0.32, 6135.74, 423.15, 7.250, 0.32, 3065.08, 422.77
15, 7.750, 0.33, 3143.40, 405.60, 3.875, 4.67, 222.57, 57.44, 15.500, 0.32, 6546.34, 422.34, 7.750, 0.32, 3268.89, 421.79
16, 8.250, 0.32, 3482.59, 422.13, 4.125, 4.98, 222.57, 53.96, 16.500, 0.32, 6957.48, 421.67, 8.250, 0.39, 2803.68, 339.84
17, 8.750, 0.32, 3693.66, 422.13, 4.375, 5.28, 222.53, 50.86, 17.500, 0.32, 7396.24, 422.64, 8.750, 0.32, 3694.77, 422.26
18, 9.250, 0.32, 3901.58, 421.79, 4.625, 5.58, 222.58, 48.12, 18.500, 0.32, 7786.72, 420.90, 9.250, 0.32, 3897.66, 421.37
20, 10.250, 0.32, 4312.53, 420.73, 5.125, 6.18, 222.66, 43.45, 20.500, 0.32, 8640.66, 421.50, 10.250, 0.41, 3374.54, 329.22
22, 11.250, 0.32, 4729.94, 420.44, 5.625, 6.78, 222.74, 39.60, 22.500, 0.32, 9452.31, 420.10, 11.250, 0.32, 4734.21, 420.82
24, 12.250, 0.32, 5148.83, 420.31, 6.125, 7.36, 223.51, 36.49, 24.500, 0.32,10346.40, 422.30, 12.250, 0.42, 3900.12, 318.38
28, 14.250, 0.32, 6009.94, 421.75, 7.125, 8.53, 224.23, 31.47, 28.500, 0.32,11975.32, 420.19, 14.250, 0.44, 4368.11, 306.53
32, 16.250, 0.32, 6795.36, 418.18, 8.125, 9.72, 224.31, 27.61, 32.500, 0.32,13605.64, 418.64, 16.250, 0.45, 4797.12, 295.21
40, 20.250, 0.34, 7899.43, 390.10, 10.125, 12.11, 224.50, 22.17, 40.500, 0.33,16371.37, 404.23, 20.250, 0.50, 5464.85, 269.87
48, 24.250, 0.41, 8029.04, 331.09, 12.125, 14.49, 224.58, 18.52, 48.500, 0.40,16468.89, 339.56, 24.250, 0.54, 5986.22, 246.85
56, 28.250, 0.47, 8114.58, 287.24, 14.125, 16.88, 224.65, 15.90, 56.500, 0.46,16443.12, 291.03, 28.250, 0.60, 6342.42, 224.51
64, 32.250, 0.53, 8154.47, 252.85, 16.125, 19.26, 224.72, 13.94, 64.500, 0.52,16536.22, 256.38, 32.250, 0.66, 6591.93, 204.40
80, 40.250, 0.66, 8242.80, 204.79, 20.125, 24.03, 224.79, 11.17, 80.500, 0.65,16644.88, 206.77, 40.250, 0.78, 6909.54, 171.67
96, 48.250, 0.78, 8321.35, 172.46, 24.125, 28.80, 224.85, 9.32, 96.500, 0.78,16685.23, 172.90, 48.250, 0.91, 7108.62, 147.33
128, 64.250, 1.03, 8337.22, 129.76, 32.125, 38.34, 224.91, 7.00, 128.500, 1.03,16775.65, 130.55, 64.250, 1.18, 7295.18, 113.54
192, 96.250, 1.54, 8414.49, 87.42, 48.125, 57.42, 224.97, 4.67, 192.500, 1.53,16847.93, 87.52, 96.250, 1.74, 7431.64, 77.21
256, 128.250, 2.06, 8362.01, 65.20, 64.125, 76.50, 225.02, 3.51, 256.500, 2.06,16693.65, 65.08, 128.250, 2.30, 7477.75, 58.31
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
```