add cuda samples and cuda benchmarks

This commit is contained in:
qinzhaoyu 2022-03-11 14:55:55 +08:00
parent c5337cdf43
commit d6d28ed7e0
2 changed files with 101 additions and 2 deletions

100
ReadMe.md
View File

@ -20,10 +20,108 @@ CUDA gpu 编程学习,基于 《CUDA 编程——基础与实践》(樊哲
14. [CUDA 标准库](./capter14/ReadMe.md) 14. [CUDA 标准库](./capter14/ReadMe.md)
CUDA 官方文档: ## CUDA 官方文档
[CUDA c++编程指南](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) [CUDA c++编程指南](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
[CUDA c++最佳实践指南](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html) [CUDA c++最佳实践指南](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
[CUDA 运行时API手册](https://docs.nvidia.com/cuda/cuda-runtime-api/index.html) [CUDA 运行时API手册](https://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
[CUDA 数学函数库API手册](https://docs.nvidia.com/cuda/cuda-math-api/index.html) [CUDA 数学函数库API手册](https://docs.nvidia.com/cuda/cuda-math-api/index.html)
## CUDA 编程案例
[CUDA Samples](https://github.com/NVIDIA/cuda-samples)
+ Simple Reference
基础CUDA示例适用于初学者 反映了运用CUDA和CUDA runtime APIs的一些基本概念.
+ Utilities Reference
演示如何查询设备能力和衡量GPU/CPU 带宽的实例程序。
+ Graphics Reference
图形化示例展现的是 CUDA, OpenGL, DirectX 之间的互通性。
+ Imaging Reference
图像处理,压缩,和数据分析。
+ Finance Reference
金融计算的并行处理。
+ Simulations Reference
展现一些运用CUDA的模拟算法。
+ Advanced Reference
用CUDA实现的一些先进的算法。
+ Cudalibraries Reference
这类示例主要告诉我们该如何使用CUDA各种函数库(NPP, CUBLAS, CUFFT,CUSPARSE, and CURAND)。
## CUDA 性能测试
[CUDA Bechmarks](https://github.com/ekondis/mixbench)
+ Four types of experiments are executed combined with global memory accesses:
Single precision Flops (multiply-additions)
Double precision Flops (multiply-additions)
Half precision Flops (multiply-additions)
Integer multiply-addition operations
+ Building is based now on CMake files. Each implementation resides in a separate folder:
CUDA implementation: mixbench-cuda
OpenCL implementation: mixbench-opencl
HIP implementation: mixbench-hip
SYCL implementation: mixbench-sycl
生成的测试结果类似:
```
mixbench/read-only (v0.03-2-gbccfd71)
------------------------ Device specifications ------------------------
Device: GeForce RTX 2070
CUDA driver version: 10.20
GPU clock rate: 1620 MHz
Memory clock rate: 3500 MHz
Memory bus width: 256 bits
WarpSize: 32
L2 cache size: 4096 KB
Total global mem: 7979 MB
ECC enabled: No
Compute Capability: 7.5
Total SPs: 2304 (36 MPs x 64 SPs/MP)
Compute throughput: 7464.96 GFlops (theoretical single precision FMAs)
Memory bandwidth: 448.06 GB/sec
-----------------------------------------------------------------------
Total GPU memory 8366784512, free 7941521408
Buffer size: 256MB
Trade-off type: compute with global memory (block strided)
Elements per thread: 8
Thread fusion degree: 4
----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
0, 0.250, 0.32, 104.42, 417.68, 0.125, 0.63, 53.04, 424.35, 0.500, 0.32, 211.41, 422.81, 0.250, 0.32, 105.58, 422.30
1, 0.750, 0.32, 316.34, 421.79, 0.375, 0.63, 158.69, 423.18, 1.500, 0.32, 634.22, 422.81, 0.750, 0.32, 317.30, 423.07
2, 1.250, 0.32, 528.46, 422.77, 0.625, 0.78, 215.91, 345.45, 2.500, 0.32, 1055.97, 422.39, 1.250, 0.32, 528.57, 422.86
3, 1.750, 0.32, 738.81, 422.17, 0.875, 1.08, 218.17, 249.34, 3.500, 0.32, 1478.95, 422.56, 1.750, 0.32, 740.59, 423.20
4, 2.250, 0.32, 951.33, 422.81, 1.125, 1.38, 219.57, 195.17, 4.500, 0.32, 1902.66, 422.81, 2.250, 0.32, 950.66, 422.51
5, 2.750, 0.32, 1162.74, 422.81, 1.375, 1.67, 220.38, 160.28, 5.500, 0.32, 2328.52, 423.37, 2.750, 0.32, 1162.74, 422.81
6, 3.250, 0.32, 1374.56, 422.94, 1.625, 1.97, 220.99, 135.99, 6.500, 0.32, 2756.62, 424.10, 3.250, 0.32, 1375.81, 423.32
7, 3.750, 0.32, 1592.45, 424.65, 1.875, 2.27, 221.38, 118.07, 7.500, 0.32, 3169.50, 422.60, 3.750, 0.32, 1585.55, 422.81
8, 4.250, 0.32, 1796.95, 422.81, 2.125, 2.57, 221.71, 104.33, 8.500, 0.32, 3587.76, 422.09, 4.250, 0.37, 1545.63, 363.68
9, 4.750, 0.32, 2006.34, 422.39, 2.375, 2.87, 221.85, 93.41, 9.500, 0.32, 3995.38, 420.57, 4.750, 0.32, 1998.29, 420.69
10, 5.250, 0.32, 2209.52, 420.86, 2.625, 3.17, 222.02, 84.58, 10.500, 0.32, 4439.54, 422.81, 5.250, 0.32, 2220.44, 422.94
11, 5.750, 0.32, 2434.12, 423.32, 2.875, 3.47, 222.17, 77.28, 11.500, 0.32, 4855.01, 422.17, 5.750, 0.32, 2426.77, 422.05
12, 6.250, 0.32, 2638.06, 422.09, 3.125, 3.78, 222.18, 71.10, 12.500, 0.32, 5227.20, 418.18, 6.250, 0.38, 2202.15, 352.34
13, 6.750, 0.32, 2841.95, 421.03, 3.375, 4.08, 222.30, 65.87, 13.500, 0.32, 5712.58, 423.15, 6.750, 0.32, 2850.54, 422.30
14, 7.250, 0.32, 3065.39, 422.81, 3.625, 4.37, 222.45, 61.36, 14.500, 0.32, 6135.74, 423.15, 7.250, 0.32, 3065.08, 422.77
15, 7.750, 0.33, 3143.40, 405.60, 3.875, 4.67, 222.57, 57.44, 15.500, 0.32, 6546.34, 422.34, 7.750, 0.32, 3268.89, 421.79
16, 8.250, 0.32, 3482.59, 422.13, 4.125, 4.98, 222.57, 53.96, 16.500, 0.32, 6957.48, 421.67, 8.250, 0.39, 2803.68, 339.84
17, 8.750, 0.32, 3693.66, 422.13, 4.375, 5.28, 222.53, 50.86, 17.500, 0.32, 7396.24, 422.64, 8.750, 0.32, 3694.77, 422.26
18, 9.250, 0.32, 3901.58, 421.79, 4.625, 5.58, 222.58, 48.12, 18.500, 0.32, 7786.72, 420.90, 9.250, 0.32, 3897.66, 421.37
20, 10.250, 0.32, 4312.53, 420.73, 5.125, 6.18, 222.66, 43.45, 20.500, 0.32, 8640.66, 421.50, 10.250, 0.41, 3374.54, 329.22
22, 11.250, 0.32, 4729.94, 420.44, 5.625, 6.78, 222.74, 39.60, 22.500, 0.32, 9452.31, 420.10, 11.250, 0.32, 4734.21, 420.82
24, 12.250, 0.32, 5148.83, 420.31, 6.125, 7.36, 223.51, 36.49, 24.500, 0.32,10346.40, 422.30, 12.250, 0.42, 3900.12, 318.38
28, 14.250, 0.32, 6009.94, 421.75, 7.125, 8.53, 224.23, 31.47, 28.500, 0.32,11975.32, 420.19, 14.250, 0.44, 4368.11, 306.53
32, 16.250, 0.32, 6795.36, 418.18, 8.125, 9.72, 224.31, 27.61, 32.500, 0.32,13605.64, 418.64, 16.250, 0.45, 4797.12, 295.21
40, 20.250, 0.34, 7899.43, 390.10, 10.125, 12.11, 224.50, 22.17, 40.500, 0.33,16371.37, 404.23, 20.250, 0.50, 5464.85, 269.87
48, 24.250, 0.41, 8029.04, 331.09, 12.125, 14.49, 224.58, 18.52, 48.500, 0.40,16468.89, 339.56, 24.250, 0.54, 5986.22, 246.85
56, 28.250, 0.47, 8114.58, 287.24, 14.125, 16.88, 224.65, 15.90, 56.500, 0.46,16443.12, 291.03, 28.250, 0.60, 6342.42, 224.51
64, 32.250, 0.53, 8154.47, 252.85, 16.125, 19.26, 224.72, 13.94, 64.500, 0.52,16536.22, 256.38, 32.250, 0.66, 6591.93, 204.40
80, 40.250, 0.66, 8242.80, 204.79, 20.125, 24.03, 224.79, 11.17, 80.500, 0.65,16644.88, 206.77, 40.250, 0.78, 6909.54, 171.67
96, 48.250, 0.78, 8321.35, 172.46, 24.125, 28.80, 224.85, 9.32, 96.500, 0.78,16685.23, 172.90, 48.250, 0.91, 7108.62, 147.33
128, 64.250, 1.03, 8337.22, 129.76, 32.125, 38.34, 224.91, 7.00, 128.500, 1.03,16775.65, 130.55, 64.250, 1.18, 7295.18, 113.54
192, 96.250, 1.54, 8414.49, 87.42, 48.125, 57.42, 224.97, 4.67, 192.500, 1.53,16847.93, 87.52, 96.250, 1.74, 7431.64, 77.21
256, 128.250, 2.06, 8362.01, 65.20, 64.125, 76.50, 225.02, 3.51, 256.500, 2.06,16693.65, 65.08, 128.250, 2.30, 7477.75, 58.31
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

View File

@ -68,7 +68,8 @@ CUDA 基于 CUDA 事件的计时方法:
>> nvprof ./bin/clock >> nvprof ./bin/clock
(没有输出结果)。 如果没有输出结果,需要将`nvprof`的目录包含到环境环境变量中不支持7.5 以上计算能力的显卡)。
推荐采用一代性能分析工具: [Nvidia Nsight Systems](https://developer.nvidia.com/zh-cn/nsight-systems).
------ ------