nccl-tests

Commit Graph

Author	SHA1	Message	Date
David Addison	fae7cb4727	Merge pull request #316 from martin-belanger/print-program-name Print the name of the program being executed before and after test output	2025-07-24 14:58:54 -07:00
David Addison	6edafa0a9c	Add extra reserved space during maxBytes calculation. Also, don't allow minBytes > maxBytes	2025-07-23 16:19:37 -07:00
David Addison	def2d3689c	Minor fix to Makefile. Move comments to separate lines	2025-07-23 16:04:30 -07:00
David Addison	97ee098516	Add Turing (SM75) support to CUDA 13.0 builds	2025-06-04 17:54:58 -07:00
David Addison	e7c8825b0b	Wrap ncclCommWindowRegister() calls within ncclGroup	2025-06-03 10:36:53 -07:00
Martin Belanger	dafb70408d	Print the name of the program being executed. One thing missing from the stdout of each performance test is the name of the test that is actually being run. This patch adds 2 new messages to the stdout. At the beginning of the execution of a test (e.g. sendrecv_perf) we will now see this message: "Collective test starting: sendrecv_perf". And at the end, we will now see this: "Collective test concluded: sendrecv_perf". This is needed when running several tests consecutively and we're trying to parse the stdout to collect the results. For example, using a Python script to parse the stdout, one could retrieve the results for each test and plot them on a graph. This patch makes it easier to implement such a script. Signed-off-by: Martin Belanger <martin.belanger@dell.com>	2025-06-03 11:43:02 -04:00
David Addison	5290298ab6	Reinstate Pascal support for CUDA 12.8+ builds	2025-06-02 09:29:52 -07:00
David Addison	8bc16f4e01	Need to drop Volta (sm_70) support from CUDA 13.0	2025-05-30 18:04:25 -07:00
David Addison	0c60e6a8e4	Fix formatting errors in README.md	2025-05-30 17:43:30 -07:00
David Addison	a5c539e68b	Add support for Symmetric Memory Registration. From NCCL 2.27.x we can now use the Symmetric Memory APIs (-R 2)	2025-05-30 17:31:34 -07:00
David Addison	e041d901e6	Re-add sm_70 support for CUDA 12.8+ and 13.0 builds	2025-05-07 10:30:59 -07:00
David Addison	1021260ca9	Make verifiable a DSO and add NAME_SUFFIX support. Build option DSO=1 generates libverifiable.so, which can be used to reduce the combined binary size. Build option NAME_SUFFIX can be used to add a suffix to all generated binaries, e.g. NAME_SUFFIX=_mpi. Added new make target: clean_intermediates	2025-04-23 17:07:24 -07:00
David Addison	501a149d57	Add support for FP8 datatypes. Added new datatypes: f8e4m3, f8e5m2. Only supported on H100+ architectures and NCCL versions >= 2.24.0	2025-04-18 19:20:59 -07:00
David Addison	b4300cc79d	Add PCI domain and device ID for GPU device BDF display	2025-02-28 13:25:51 -08:00
Sylvain Jeaugey	903918fc54	Add NCCL_TESTS_SPLIT documentation in the README	2025-02-06 14:10:07 +01:00
Junyu Ma	a89cf07fe8	Perftests: Introduce NCCL_TESTS_SPLIT env. `NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators. It will be overridden by `NCCL_TESTS_SPLIT_MASK`. Examples: NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node. NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7. NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system. NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72. You can also use: "%" "&" "\|" "/" for short. Extra spaces in the middle will be automatically ignored. Not case-sensitive. The following are all equivalent: NCCL_TESTS_SPLIT="%0x7" NCCL_TESTS_SPLIT="%0b111" NCCL_TESTS_SPLIT="AND 7" NCCL_TESTS_SPLIT="and 0x7"	2025-02-04 15:18:09 -08:00
David Addison	cb6a46fdd6	Update CUDA gencodes. Add support for Blackwell sm100 and sm120 from CUDA 12.8. Add support for Hopper sm90 from CUDA 12.0	2025-01-25 17:32:16 -08:00
John Bachan	29f4114f02	Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes. This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.	2024-12-18 11:20:28 -08:00
Sylvain Jeaugey	8dfeab9eb9	Merge pull request #259 from NVIDIA/fix-ncclstringtotype Future-proof ncclstringtotype	2024-10-24 10:28:02 -07:00
Kamil Iskra	34d6d53910	Future-proof ncclstringtotype. Ensure that ncclstringtotype iterates only over data types known to nccl-tests (as indicated by test_typenum), not over a potentially larger set of all NCCL types.	2024-10-24 09:21:37 -07:00
David Addison	9d26b8422b	Merge pull request #226 from netgroup/master improve parsing of stepbytes (increment size) argument	2024-07-30 14:58:54 -07:00
David Addison	0d86b5a6e7	Added some missing command line options to README.md. Also updated single and multi-node examples.	2024-07-30 14:50:45 -07:00
David Addison	d2d40cc824	Added -N,--run_cycles option	2024-07-25 22:00:23 -07:00
David Addison	3a3f790efd	Merge pull request #240 from OrenLeung/patch-1 doc: add all2all factor	2024-07-25 22:00:06 -07:00
Oren	c6eb15875f	doc: add all2all factor	2024-07-24 22:55:00 -04:00
Stefano Salsano	746549b28d	improve parsing of stepbytes (increment size) argument	2024-06-14 11:28:55 +02:00
Kaiming Ouyang	d028efcf35	Change ncclCommRegister size to maxBytes in serial comm init	2024-06-06 06:54:48 -07:00
Giuseppe Congiu	a1efb427e7	Add -R option to register user buffers	2024-06-03 01:04:58 -07:00
David Addison	c6afef0b6f	Added missing MPI_Comm_free() call before MPI_Finalize()	2024-02-05 08:53:54 -08:00
David Addison	1292b25553	Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue	2023-10-12 16:53:32 -07:00
David Addison	6c46206a47	Make the -c option be a datacheck iteration count parameter. Default is 1	2023-09-13 14:03:38 -07:00
Sylvain Jeaugey	1a5f551ffd	Merge pull request #146 from yangxingwu/master makefile: remove extra space	2023-06-06 11:58:24 +02:00
yangxingwu	52ea1b2148	makefile: remove extra space	2023-06-06 09:47:50 +00:00
Sylvain Jeaugey	e98ef24bc0	Merge pull request #135 from aavbsouza/fix_nvcc_variable_handling fix handling of variable NVCC.	2023-03-27 11:14:10 +02:00
alan.souza	7ccda3c97b	fix handling of variable NVCC. Permit overriding the variable using environment variables	2023-03-25 16:56:16 -03:00
David Addison	e76e36e9a9	Merge pull request #134 from flx42/patch-1 Update README.md to fix -i default increment value.	2023-03-23 09:53:15 -07:00
Felix Abecassis	17d0a42d5a	Update README.md	2023-03-23 09:05:41 -07:00
Sylvain Jeaugey	2cbb968101	Update README.md. Improve MPI example to avoid confusing the number of processes with the total number of GPUs. https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369	2023-01-03 08:47:43 +01:00
David Addison	0b4c4cb99f	Add boot_id to the hostname hash due to collisions on Azure. Fixes #60	2022-12-12 01:16:46 -08:00
Jithin Jose	0aeba157db	Use DJB2a hash algorithm in getHostHash()	2022-12-12 01:16:38 -08:00
David Addison	24fcf64ed1	Call cudaFreeHost() on wrongPerGpu not cudaFree()	2022-11-22 11:18:37 -08:00
David Addison	3bd2bd292b	Add fflush(stdout) before perf output	2022-11-22 11:16:47 -08:00
Sylvain Jeaugey	365b92a1ea	Fix build on RHEL7 with GCC 4.8. Add -std=c++11 to CXXFLAGS. Fixes #116.	2022-10-12 01:24:14 -07:00
Sylvain Jeaugey	d313d20a26	Update NCCL tests	2022-09-23 01:13:29 -07:00
David Addison	749573f2d6	Fix preprocessor version check for ncclGetLastError(). ncclGetLastError() was added in NCCL 2.13.0	2022-09-07 16:10:41 -07:00
David Addison	afa4c56b6a	Fix an issue with the last commit when data checking is disabled	2022-09-07 11:23:49 -07:00
David Addison	a0a14911ee	Display N/A for error count in AlltoAll in-place test. AlltoAll does not support in-place buffers	2022-09-06 13:17:15 -07:00
John Bachan	bc5f7cfb0a	Changed top-level Makefile behavior so that BUILDDIR is interpreted as relative to the top-level directory. This is done by abspath'ing it before passing it to subdirectory Makefiles. The old behavior had two cases: with and without BUILDDIR being set by the user. With BUILDDIR not set, the build dir would be named "build" in the top-level directory. If BUILDDIR was set, then the build dir would be placed at "src/${BUILDDIR}". The new behavior is simpler: if BUILDDIR is not set then it defaults to "build", and the directory holding the final build is always at just "${BUILDDIR}" in the top level.	2022-08-23 10:08:49 -07:00
John Bachan	51af5572bf	Resync with NCCL 2.13: * Added "verifiable", a suite of kernels for generating and verifying reduction input and output arrays in a bit-precise way. * Data corruption errors now reported in number of wrong elements instead of max deviation. * Use ncclGetLastError. * Don't run hypercube on non-powers of 2 ranks. * Fix to hypercube data verification. * Use "thread local" as the default CUDA capture mode. * Replaced pthread_yield() with sched_yield(). * Bugfix to the cpu-side barrier/allreduce implementations.	2022-08-22 17:51:06 -07:00
David Addison	8274cb47b6	Merge pull request #96 from NVIDIA/nersc-linkage-fix Add option to statically link cudart	2022-05-26 16:54:44 -07:00

100 Commits, All Branches