llama.cpp

Commit Graph

Author	SHA1	Message	Date
bandoti	19f65187cb	cmake: add ggml find package (#11369 ) * Add initial ggml cmake package * Add build numbers to ggml find-package * Expand variables with GGML_ prefix * Guard against adding to cache variable twice * Add git to msys2 workflow * Handle ggml-cpu-* variants * Link ggml/ggml-base libraries to their targets * Replace main-cmake-pkg with simple-cmake-pkg * Interface features require c_std_90 * Fix typo * Removed unnecessary bracket from status message * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-26 12:07:48 -04:00
Georgi Gerganov	00c24acb2a	ci : fix line breaks on windows builds (#11409 ) * ci : fix line breaks on windows builds * cont : another try * ci : fix powershell line breaks	2025-01-25 13:36:48 +02:00
jiahao su	466ea66f33	CANN: Add Ascend CANN build ci (#10217 ) * CANN: Add Ascend CANN build ci * Update build.yml * Modify cann image version * Update build.yml * Change to run on x86 system * Update build.yml * Update build.yml * Modify format error * Update build.yml * Add 'Ascend NPU' label restrictions * Exclude non PR event Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org> * Update build.yml --------- Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>	2025-01-25 00:26:01 +01:00
Georgi Gerganov	9755129c27	release : pack /lib in the packages (#11392 ) * release : pack /lib and /include in the packages * cmake : put libs in /bin * TMP : push artifacts * Revert "TMP : push artifacts" This reverts commit `4decf2c4df`. * ci : fix HIP cmake compiler options to be on first line * ci : restore the original HIP commands * ci : change ubuntu build from latest to 20.04 * ci : try to fix macos build rpaths * ci : remove obsolete MacOS build * TMP : push artifacts * ci : change back to ubuntu latest * ci : macos set build rpath to "@loader_path" * ci : fix typo * ci : change ubuntu package to 22.04 * Revert "TMP : push artifacts" This reverts commit `537b09e70f`.	2025-01-24 18:41:30 +02:00
Eric Curtin	a1649cc13f	Adding linenoise.cpp to llama-run (#11252 ) This is a fork of linenoise that is C++17 compatible. I intend on adding it to llama-run so we can do things like traverse prompt history via the up and down arrows: https://github.com/ericcurtin/linenoise.cpp Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-18 14:42:31 +00:00
Georgi Gerganov	4dd34ff831	cmake : add sanitizer flags for llama.cpp (#11279 ) * cmake : add sanitizer flags for llama.cpp ggml-ci * tests : fix compile warnings ggml-ci * cmake : move sanitizer flags to llama_add_compile_flags ggml-ci * cmake : move llama.cpp compile flags to top level lists ggml-ci * cmake : apply only sanitizer flags at top level ggml-ci * tests : fix gguf context use in same_tensor_data * gguf-test: tensor data comparison * dummy : trigger ggml-ci * unicode : silence gcc warnings ggml-ci * ci : use sanitizer builds only in Debug mode ggml-ci * cmake : add status messages [no ci] --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-01-18 16:18:15 +02:00
Xuan Son Nguyen	f7cd13301c	ci : use actions from ggml-org (#11140 )	2025-01-08 16:09:20 +01:00
Georgi Gerganov	0d52a69e4b	ci : fix cmake option (#11125 )	2025-01-08 11:29:34 +02:00
Georgi Gerganov	f66f582927	llama : refactor `src/llama.cpp` (#10902 ) * llama : scatter llama.cpp into multiple modules (wip) * llama : control-vector -> adapter * llama : arch * llama : mmap ggml-ci * ci : remove BUILD_SHARED_LIBS=OFF ggml-ci * llama : arch (cont) ggml-ci * llama : chat ggml-ci * llama : model ggml-ci * llama : hparams ggml-ci * llama : adapter ggml-ci * examples : fix ggml-ci * rebase ggml-ci * minor * llama : kv cache ggml-ci * llama : impl ggml-ci * llama : batch ggml-ci * cont ggml-ci * llama : context ggml-ci * minor * llama : context (cont) ggml-ci * llama : model loader ggml-ci * common : update lora ggml-ci * llama : quant ggml-ci * llama : quant (cont) ggml-ci * minor [no ci]	2025-01-03 10:18:53 +02:00
Eve	7b1ec53f56	vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809 ) * ensure mul mat shaders work on systems with subgroup size less than 32 more fixes add test * only s_warptile_mmq needs to be run with 32 threads or more	2024-12-17 06:52:55 +01:00
lhez	a76c56fa1a	Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693 ) * [cl][adreno] Add Adreno GPU support Add new OpenCL backend to support Adreno GPUs --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com> * [cl][ci] Add workflow for CL * [cl][adreno] Fix memory leak for non SMALL_ALLOC path * opencl: integrate backend dyn.load interface and fix compiler and format warnings * opencl: remove small-alloc support and fix build errors for non-opencl platforms * opencl: fixed merge conflict (MUSA added twice in cmake) * opencl-ci: use RUNNER_TEMP instead of github.workspace * opencl: fix embed tool invocation with python3 * opencl: CI workflow fixes * opencl: Clean up small-alloc in CMake files * opencl: cleanup ggml-opencl2 header file * opencl: use ulong for offsets and strides in ADD kernel * opencl: use cl_ulong for all offsets * opencl: use cl_ulong for sizes and strides * opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)` * opencl: rename backend `opencl2` -> `opencl` * opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl` * opencl: make OpenCL required, remove redundant lib and inc directories * `ggml-base`, `..` and `.` are added by `ggml_add_backend_library` * opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl` * opencl: remove copyright marker since main license already covers * opencl: replace some more OPENCL2 leftovers * opencl: remove limits on `tensor_extra` * opencl: use pools for `tensor_extra` * opencl: fix compiler warnings with GCC and Clang Still getting the warning about clCreateCmdQueue being obsolete. Will fix that separately. * opencl: fail gracefully if opencl devices are not available Also for unsupported GPUs. 
* opencl: fix MSVC builds (string length error) * opencl: check for various requirements, allow deprecated API * opencl: update log message for unsupported GPUs --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2024-12-13 12:23:52 -08:00
Diego Devesa	43ed389a3f	llama : use cmake for swift build (#10525 ) * llama : use cmake for swift build * swift : <> -> "" * ci : remove make * ci : disable ios build * Revert "swift : <> -> """ This reverts commit `d39ffd9556`. * ci : try fix ios build * ci : cont * ci : cont --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-08 13:14:54 +02:00
Georgi Gerganov	8648c52101	make : deprecate (#10514 ) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570 ) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Diego Devesa	e90688edd0	ci : fix tag name in cuda and hip releases (#10566 )	2024-11-28 15:58:54 +01:00
Diego Devesa	46c69e0e75	ci : faster CUDA toolkit installation method and use ccache (#10537 ) * ci : faster CUDA toolkit installation method and use ccache * remove fetch-depth * only pack CUDA runtime on master	2024-11-27 11:03:25 +01:00
Diego Devesa	c9b00a70b0	ci : fix cuda releases (#10532 )	2024-11-26 22:12:10 +01:00
Neo Zhang Jianyu	0bbd2262a3	restore the condition to build & update package when merging (#10507 ) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-26 21:43:47 +08:00
Diego Devesa	c6807b3f28	ci : add ubuntu cuda build, build with one arch on windows (#10456 )	2024-11-26 13:05:07 +01:00
Neo Zhang Jianyu	5a8987793f	[SYCL] Fix building Win package for oneAPI 2025.0 update (#10483 ) * fix build package for 2025.0 * debug * debug * fix * rm debug --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-25 17:31:10 +08:00
蕭澧邦	6dfcfef078	ci: Update oneAPI runtime dll packaging (#10428 ) This is the minimum runtime dll dependencies for oneAPI 2025.0	2024-11-22 10:44:08 +01:00
R0CKSTAR	f0204a0ec7	ci: build test musa with cmake (#10298 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-15 12:47:25 +01:00
Romain Biessy	5a54af4d4f	sycl: Use syclcompat::dp4a (#10267 ) * sycl: Use syclcompat::dp4a * Using the syclcompat version allows the compiler to optimize the operation with a native function * Update news section * Update CI Windows oneAPI version to 2025.0 * Reword doc * Call syclcompat::dp4a inside dpct::dp4a This reverts commit `90cb61d692`.	2024-11-15 11:09:12 +08:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256 ) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00
Georgi Gerganov	ec450d3bbf	metal : opt-in compile flag for BF16 (#10218 ) * metal : opt-in compile flag for BF16 ggml-ci * ci : use BF16 ggml-ci * swift : switch back to v12 * metal : has_float -> use_float ggml-ci * metal : fix BF16 check in MSL ggml-ci	2024-11-08 21:59:46 +02:00
Eve	3407364776	Q6_K AVX improvements (#10118 ) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86	2024-11-04 23:06:31 +01:00
Xuan Son Nguyen	f3fdcfaa79	ci : fine-grant permission (#9710 )	2024-10-04 11:47:19 +02:00
serhii-nakon	6f1d9d71f4	Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641 ) * Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS * Set ROCM_DOCKER_ARCH as a string, because otherwise it builds incorrectly and causes an OOM exit code	2024-09-30 20:57:12 +02:00
Neo Zhang Jianyu	95bc82fbc0	[SYCL] add missed dll file in package (#9577 ) * update oneapi to 2024.2 * use 2024.1 --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-09-26 17:38:31 +08:00
Huang Qi	e948a7da7a	CI: Provide prebuilt windows binary for hip (#9467 )	2024-09-21 02:39:41 +02:00
Georgi Gerganov	6262d13e0b	common : reimplement logging (#9418 ) https://github.com/ggerganov/llama.cpp/pull/9418	2024-09-15 20:46:12 +03:00
Huang Qi	4dc4f5f14a	ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329 )	2024-09-12 14:28:43 +03:00
Trivikram Kamat	3c26a1644d	ci : bump actions/checkout to v4 (#9377 )	2024-09-12 14:27:45 +03:00
awatuna	32b2ec88bc	Update build.yml (#9184 ) build rpc-server for windows cuda	2024-09-06 00:34:36 +02:00
Radoslav Gerganov	1f67436c5e	ci : enable RPC in all of the released builds (#9006 ) ref: #8912	2024-08-12 19:17:03 +03:00
Johannes Gäßler	6eeaeba126	cmake: use 1 more thread for non-ggml in CI (#8740 )	2024-07-28 22:32:44 +02:00
Johannes Gäßler	69c487f4ed	CUDA: MMQ code deduplication + iquant support (#8495 ) * CUDA: MMQ code deduplication + iquant support * 1 less parallel job for CI build	2024-07-20 22:25:26 +02:00
bandoti	17eb6aa8a9	vulkan : cmake integration (#8119 ) * Add Vulkan to CMake pkg * Add Sycl to CMake pkg * Add OpenMP to CMake pkg * Split generated shader file into separate translation unit * Add CMake target for Vulkan shaders * Update README.md * Add make target for Vulkan shaders * Use pkg-config to locate vulkan library * Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow * Clean up tabs * Move sudo to apt-key invocation * Forward GGML_EXTRA_LIBS to CMake config pkg * Update vulkan obj file paths * Add shaderc to nix pkg * Add python3 to Vulkan nix build * Link against ggml in cmake pkg * Remove Python dependency from Vulkan build * code review changes * Remove trailing newline * Add cflags from pkg-config to fix w64devkit build * Update README.md * Remove trailing whitespace * Update README.md * Remove trailing whitespace * Fix doc heading * Make glslc required Vulkan component * remove clblast from nix pkg	2024-07-13 18:12:39 +02:00
Olivier Chafik	8748d8ac6f	json: attempt to skip slow tests when running under emulator (#8189 )	2024-06-28 18:02:05 +01:00
loonerin	558f44bf83	CI: fix release build (Ubuntu+Mac) (#8170 ) * CI: fix release build (Ubuntu) PR #8006 changes defaults to build shared libs. However, CI for releases expects static builds. * CI: fix release build (Mac) --------- Co-authored-by: loonerin <loonerin@users.noreply.github.com>	2024-06-27 21:01:23 +02:00
slaren	ae5d0f4b89	ci : publish new docker images only when the files change (#8142 )	2024-06-26 21:59:28 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
olexiyb	f8ec8877b7	ci : fix macos x86 build (#7940 ) In order to use old `macos-latest` we should use `macos-12` Potentially will fix: https://github.com/ggerganov/llama.cpp/issues/6975	2024-06-14 20:28:34 +03:00
Olivier Chafik	1c641e6aac	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 ) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`. 
* add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00
slaren	c2ce6c47e4	fix CUDA CI by using a windows-2019 image (#7861 ) * try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019	2024-06-11 08:59:20 +03:00
Georgi Gerganov	554c247caf	ggml : remove OpenCL (#7735 ) ggml-ci	2024-06-04 21:23:20 +03:00
Masaya, Kato	a5735e4426	ggml : use OpenMP as a thread pool (#7606 ) * ggml: Added OpenMP for multi-threads processing * ggml : Limit the number of threads used to avoid deadlock * update shared state n_threads in parallel region * clear numa affinity for main thread even with openmp * enable openmp by default * fix msvc build * disable openmp on macos * ci : disable openmp with thread sanitizer * Update ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-03 17:14:15 +02:00
slaren	d359f30921	llama : remove MPI backend (#7395 )	2024-05-20 01:17:03 +02:00
Georgi Gerganov	059031b8c4	ci : re-enable sanitizer runs (#7358 ) * Revert "ci : temporary disable sanitizer builds (#6128)" This reverts commit `4f6d1337ca`. * ci : trigger	2024-05-18 18:55:54 +03:00
Gavin Zhao	82ca83db3c	ROCm: use native CMake HIP support (#5966 ) Supersedes #4024 and #4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes: 1. The environment variable `HIPCXX` or CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably this shouldn't be `hipcc`, but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was controlled by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. Note that since native CMake HIP support is not yet available on Windows, on Windows we fall back to the old behavior. 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controlled by `GPU_TARGETS`. 3. Updated the Nix recipe to account for these new changes. 4. The GPU targets to build against in the Nix recipe are now consistent with the supported GPU targets in nixpkgs. 5. Added CI checks for HIP on both Linux and Windows. On Linux, we test both the new and old behavior. The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang. ~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`. Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support. Signed-off-by: Gavin Zhao <git@gzgz.dev>	2024-05-17 17:03:03 +02:00
Max Krasnyansky	172b78210a	ci: fix bin/Release path for windows-arm64 builds (#7317 ) Switch to Ninja Multi-Config CMake generator to resurrect bin/Release path that broke artifact packaging in CI.	2024-05-16 15:36:43 +10:00
Max Krasnyansky	13ad16af12	Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191 ) * logging: add proper checks for clang to avoid errors and warnings with VA_ARGS * build: add CMake Presets and toolchain files for Windows ARM64 * matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings * ci: add support for optimized Windows ARM64 builds with MSVC and LLVM * matmul-int8: fixed typos in q8_0_q8_0 matmuls Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * matmul-int8: remove unnecessary casts in q8_0_q8_0 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-16 12:47:36 +10:00
Radoslav Gerganov	5e31828d3e	ggml : add RPC backend (#6829 ) * ggml : add RPC backend The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc). * set TCP_NODELAY * add CI workflows * Address review comments * fix warning * implement llama_max_devices() for RPC * Address review comments * Address review comments * wrap sockfd into a struct * implement get_alignment and get_max_size * add get_device_memory * fix warning * win32 support * add README * readme : trim trailing whitespace * Address review comments * win32 fix * Address review comments * fix compile warnings on macos	2024-05-14 14:27:19 +03:00
Neo Zhang	cbf75894d2	[SYCL] Add oneapi runtime dll files to win release package (#7241 ) * add oneapi running time dlls to release package * fix path * fix path * fix path * fix path * fix path --------- Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:04:29 +08:00
Neo Zhang	0d5cef78ae	[SYCL] update CI with oneapi 2024.1 (#7235 ) Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:02:55 +08:00
Przemysław Pawełczyk	ca7f29f568	ci : add building in MSYS2 environments (Windows) (#6967 )	2024-04-29 15:59:47 +03:00
loonerin	0e4802b2ec	ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748 )	2024-04-19 19:03:35 +02:00
Georgi Gerganov	9ed2737acc	ci : disable Metal for macOS-latest-cmake-x64 (#6628 )	2024-04-12 11:15:05 +03:00
Hugo Roussel	1bbdaf6ecd	ci: download artifacts to release directory (#6612 ) When action download-artifact was updated to v4, the default download path changed. This fixes binaries not being uploaded to releases.	2024-04-11 19:52:21 +02:00
Pierrick Hymbert	b804b1ef77	eval-callback: Example how to use eval callback for debugging (#6576 ) * gguf-debug: Example how to use ggml callback for debugging * gguf-debug: no mutex, verify type, fix stride. * llama: cv eval: move cb eval field in common gpt_params * ggml_debug: use common gpt_params to pass cb eval. Fix get tensor SIGV random. * ggml_debug: ci: add tests * ggml_debug: EOL in CMakeLists.txt * ggml_debug: Remove unused param n_batch, no batching here * ggml_debug: fix trailing spaces * ggml_debug: fix trailing spaces * common: fix cb_eval and user data not initialized * ci: build revert label * ggml_debug: add main test label * doc: add a model: add a link to ggml-debug * ggml-debug: add to make toolchain * ggml-debug: tests add the main label * ggml-debug: ci add test curl label * common: allow the warmup to be disabled in llama_init_from_gpt_params * ci: add curl test * ggml-debug: better tensor type support * gitignore : ggml-debug * ggml-debug: printing also the sum of each tensor * ggml-debug: remove block size * eval-callback: renamed from ggml-debug * eval-callback: fix make toolchain --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-11 14:51:07 +02:00
Minsoo Cheong	7dda1b727e	ci: exempt master branch workflows from getting cancelled (#6486 ) * ci: exempt master branch workflows from getting cancelled * apply to bench.yml	2024-04-04 18:30:53 +02:00
Ewout ter Hoeven	c666ba26c3	build CI: Name artifacts (#6482 ) Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP. It might be possible to further simplify the packing step (in future PRs).	2024-04-04 17:08:55 +02:00
Ewout ter Hoeven	9f62c0173d	ci : update checkout, setup-python and upload-artifact to latest (#6456 ) * CI: Update actions/checkout to v4 * CI: Update actions/setup-python to v5 * CI: Update actions/upload-artifact to v4	2024-04-03 21:01:13 +03:00
Neo Zhang Jianyu	a4f569e8a3	[SYCL] fix no file in win rel (#6314 )	2024-03-27 09:47:06 +08:00
slaren	280345968d	cuda : rename build flag to LLAMA_CUDA (#6299 )	2024-03-26 01:16:01 +01:00
Neo Zhang Jianyu	d03224ac98	Support build win release for SYCL (#6241 ) * support release win * fix value * fix value * fix value * fix error * fix error * fix format	2024-03-24 09:44:01 +08:00
fraxy-v	92397d87a4	convert-llama2c-to-ggml : enable conversion of GQA models (#6237 ) * convert-llama2c-to-ggml: enable conversion of multiqueries, #5608 * add test in build action * Update build.yml * Update build.yml * Update build.yml * gg patch	2024-03-22 20:49:06 +02:00
Minsoo Cheong	ee804f6223	ci: apply concurrency limit for github workflows (#6243 )	2024-03-22 19:15:06 +02:00
Olivier Chafik	f77a8ffd3b	tests : conditional python & node json schema tests (#6207 ) * json: only attempt python & node schema conversion tests if their bins are present Tests introduced in https://github.com/ggerganov/llama.cpp/pull/5978 disabled in https://github.com/ggerganov/llama.cpp/pull/6198 * json: orange warnings when tests skipped * json: ensure py/js schema conv tested on ubuntu-focal-make * json: print env vars in test	2024-03-22 15:09:07 +02:00
Vaibhav Srivastav	b2075fd6a5	ci : add CURL flag for the mac builds (#6214 )	2024-03-22 09:53:43 +02:00
Vaibhav Srivastav	1943c01981	ci : fix indentation error (#6195 )	2024-03-21 11:30:40 +02:00
Vaibhav Srivastav	5e43ba8742	build : add mac pre-build binaries (#6182 ) * Initial commit - add mac prebuilds. * forward contribution credits for building the workflow. * minor : remove trailing whitespaces --------- Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-21 11:13:12 +02:00
Georgi Gerganov	4f6d1337ca	ci : temporary disable sanitizer builds (#6128 )	2024-03-18 13:45:27 +02:00
Pierrick Hymbert	d01b3c4c32	common: llama_load_model_from_url using --model-url (#6098 ) * common: llama_load_model_from_url with libcurl dependency Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-17 19:12:37 +01:00
Georgi Gerganov	381da2d9f0	metal : build metallib + fix embed path (#6015 ) * metal : build metallib + fix embed path ggml-ci * metal : fix embed build + update library load logic ggml-ci * metal : fix embedded library build ggml-ci * ci : fix iOS builds to use embedded library	2024-03-14 11:55:23 +02:00
Michael Podvitskiy	3202361c5b	ggml, ci : Windows ARM runner and build fixes (#5979 ) * windows arm ci * fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64 * fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned` * fix `error C2065: '__fp16': undeclared identifier`	2024-03-11 11:28:51 +02:00
Eve	6ea0f010ff	ci : add Ubuntu 22 Vulkan CI run (#5789 )	2024-03-01 10:54:53 +02:00
Radosław Gryta	abbabc5e51	ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711 ) * [ggml-quants] Provide ggml_vqtbl1q_u8 for 64bit compatibility vqtbl1q_u8 is not part of arm v7 neon library * [android-example] Remove abi filter after arm v7a fix * [github-workflows] Do not skip Android armeabi-v7a build	2024-02-25 20:43:00 +02:00
Ananta Bastola	6e4e973b26	ci : add an option to fail on compile warning (#3952 ) * feat(ci): add an option to fail on compile warning * Update CMakeLists.txt * minor : fix compile warnings ggml-ci * ggml : fix unreachable code warnings ggml-ci * ci : disable fatal warnings for windows, ios and tvos * ggml : fix strncpy warning * ci : disable fatal warnings for MPI build * ci : add fatal warnings to ggml-ci ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-17 23:03:14 +02:00
Abhilash Majumder	6e99f2a04f	Fix f16_sycl cpy call from Arc (#5411 ) * fix f16_sycl cpy call * rm old logic * add fp16 build CI * use macro * format fix	2024-02-08 22:39:10 +05:30
Eve	1cfb5372cf	Fix broken Vulkan Cmake (properly) (#5230 ) * build vulkan as object * vulkan ci	2024-01-31 20:21:55 +01:00
Neo Zhang Jianyu	01684139c3	support SYCL backend windows build (#5208 ) * support SYCL backend windows build * add windows build in CI * add for win build CI * correct install oneMKL * fix install issue * fix ci * fix install cmd * fix install cmd * fix install cmd * fix install cmd * fix install cmd * fix win build * fix win build * fix win build * restore other CI part * restore as base * rm no new line * fix no new line issue, add -j * fix grammar issue * allow to trigger manually, fix format issue * fix format * add newline * fix format * fix format * fix format issue --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-01-31 08:08:07 +05:30
Jared Van Bortel	fbf1ddec69	Nomic Vulkan backend (#4456 ) Signed-off-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: niansa <anton-sa@web.de> Co-authored-by: Adam Treat <treat.adam@gmail.com> Co-authored-by: Aaron Miller <apage43@ninjawhale.com> Co-authored-by: ToKiNoBug <tokinobug@163.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-01-29 15:50:50 -05:00
Abhilash Majumder	0f648573dd	ggml : add unified SYCL backend for Intel GPUs (#2690 ) * first update for migration * update init_cublas * add debug functio, commit all help code * step 1 * step 2 * step3 add fp16, slower 31->28 * add GGML_LIST_DEVICE function * step 5 format device and print * step6, enhance error check, remove CUDA macro, enhance device id to fix none-zero id issue * support main device is non-zero * step7 add debug for code path, rm log * step 8, rename all macro & func from cuda by sycl * fix error of select non-zero device, format device list * ren ggml-sycl.hpp -> ggml-sycl.h * clear CMAKE to rm unused lib and options * correct queue: rm dtct:get_queue * add print tensor function to debug * fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481 * summary dpct definition in one header file to replace folder:dpct * refactor device log * mv dpct definition from folder dpct to ggml-sycl.h * update readme, refactor build script * fix build with sycl * set nthread=1 when sycl, increase performance * add run script, comment debug code * add ls-sycl-device tool * add ls-sycl-device, rm unused files * rm rear space * dos2unix * Update README_sycl.md * fix return type * remove sycl version from include path * restore rm code to fix hang issue * add syc and link for sycl readme * rm original sycl code before refactor * fix code err * add know issue for pvc hang issue * enable SYCL_F16 support * align pr4766 * check for sycl blas, better performance * cleanup 1 * remove extra endif * add build&run script, clean CMakefile, update guide by review comments * rename macro to intel hardware * editor config format * format fixes * format fixes * editor format fix * Remove unused headers * skip build sycl tool for other code path * replace tab by space * fix blas matmul function * fix mac build * restore hip dependency * fix conflict * ren as review comments * mv internal function to .cpp file * export funciton print_sycl_devices(), mv class dpct definition to source file * update CI/action for sycl code, fix CI error of repeat/dup * fix action ID format issue * rm unused strategy * enable llama_f16 in ci * fix conflict * fix build break on MacOS, due to CI of MacOS depend on external ggml, instead of internal ggml * fix ci cases for unsupported data type * revert unrelated changed in cuda cmake remove useless nommq fix typo of GGML_USE_CLBLAS_SYCL * revert hip cmake changes * fix indent * add prefix in func name * revert no mmq * rm cpu blas duplicate * fix no_new_line * fix src1->type==F16 bug. * pass batch offset for F16 src1 * fix batch error * fix wrong code * revert sycl checking in test-sampling * pass void as arguments of ggml_backend_sycl_print_sycl_devices * remove extra blank line in test-sampling * revert setting n_threads in sycl * implement std::isinf for icpx with fast math. * Update ci/run.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/sycl/run-llama2.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/sycl/run-llama2.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add copyright and MIT license declare * update the cmd example --------- Co-authored-by: jianyuzh <jianyu.zhang@intel.com> Co-authored-by: luoyu-intel <yu.luo@intel.com> Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-28 17:56:23 +02:00
crasm	413e7b0559	ci : add model tests + script wrapper (#4586 ) * scripts : add lib.sh and lib_test.sh * scripts : stub out new ci-run.sh script * scripts : switch to PascalCase for functions This looks a little odd at first, but I find it very useful as a convention to know if a command is part of our code vs a builtin. * scripts : add some fancy conversion from snake_case to PascalCase * Add venv to ci/run.sh * Revert scripts work * scripts : add wrapper script for local use of ci/run.sh * Simplify .gitignore for tests, clang-tidy fixes * Label all ctest tests * ci : ctest uses -L main * Attempt at writing ctest_with_model * Update test-model-load-cancel * ci : add ctest_with_model for debug and release ggml-ci * Fix gg_get_model function ggml-ci * got stuck on CMake * Add get_model.cpp to tests/CMakeLists.txt ggml-ci * Fix README.md output for ctest_with_model ggml-ci * workflows : use `-L main` for all ctest ggml-ci * Fixes * GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE * Always show warning rather than failing if model file variable is not set * scripts : update usage text for ci-run.sh	2024-01-26 14:18:00 +02:00
bobqianic	57744932c6	ci : fix Windows CI by updating Intel SDE version (#5053 )	2024-01-22 10:55:05 +02:00
Neuman Vong	862f5e41ab	android : introduce starter project example (#4926 ) * Introduce starter project for Android Based on examples/llama.swiftui. * Add github workflow * Set NDK version * Only build arm64-v8a in CI * Sync bench code * Rename CI prop to skip-armeabi-v7a * Remove unused tests	2024-01-16 15:47:34 +02:00
Someone Serge	7adedecbe3	workflows: nix-ci: init; build flake outputs	2023-12-31 13:14:58 -08:00
Georgi Gerganov	fe680e3d10	sync : ggml (new ops, tests, backend, etc.) (#4359 ) * sync : ggml (part 1) * sync : ggml (part 2, CUDA) * sync : ggml (part 3, Metal) * ggml : build fixes ggml-ci * cuda : restore lost changes * cuda : restore lost changes (StableLM rope) * cmake : enable separable compilation for CUDA ggml-ci * ggml-cuda : remove device side dequantize * Revert "cmake : enable separable compilation for CUDA" This reverts commit `09e35d04b1`. * cuda : remove assert for rope * tests : add test-backend-ops * ggml : fix bug in ggml_concat * ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()` * ci : try to fix macOS * ggml-backend : remove backend self-registration * ci : disable Metal for macOS cmake build ggml-ci * metal : fix "supports family" call * metal : fix assert * metal : print resource path ggml-ci --------- Co-authored-by: slaren <slarengh@gmail.com>	2023-12-07 22:26:54 +02:00
Bailey Chittle	bb03290c17	examples : iOS example with swift ui (#4159 ) * copy to llama.cpp as subdir * attempt enabling metal, fails * ggml metal compiles! * Update README.md * initial conversion to new format, utf8 errors? * bug fixes, but now has an invalid memory access :( * added O3, now has insufficient memory access * begin sync with master * update to match latest code, new errors * fixed it! * fix for loop conditionals, increase result size * fix current workflow errors * attempt a llama.swiftui workflow * Update .github/workflows/build.yml Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-27 16:56:52 +02:00
Eve	a7fac013cf	ci : use intel sde when ci cpu doesn't support avx512 (#3949 )	2023-11-05 09:46:44 +02:00
Zane Shannon	24ba3d829e	examples : add batched.swift + improve CI for swift (#3562 )	2023-10-11 06:14:05 -05:00
Georgi Gerganov	94e502dfb7	ci : enable on obj-c changes + fix metal build (#3540 )	2023-10-08 11:24:50 +03:00
Jhen-Jie Hong	04b2f4386e	ci : fix xcodebuild destinations (#3491 ) * ci : fix xcodebuild destinations * ci : add .swift to paths	2023-10-06 13:36:43 +03:00
Jhen-Jie Hong	0745384449	ci : add swift build via xcodebuild (#3482 )	2023-10-05 16:56:21 +03:00
Eve	017efe899d	cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273 ) * fix LLAMA_NATIVE * syntax * alternate implementation * my eyes must be getting bad... * set cmake LLAMA_NATIVE=ON by default * march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc * revert `8283237` and only allow LLAMA_NATIVE on x86 like the Makefile * remove -DLLAMA_MPI=ON --------- Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>	2023-10-03 19:53:15 +03:00
Eve	0512d66670	ci : multithreaded builds (#3311 ) * mac and linux threads * windows * Update build.yml * Update build.yml * Update build.yml * automatically get thread count * windows syntax * try to fix freebsd * Update build.yml * Update build.yml * Update build.yml	2023-09-28 22:31:04 +03:00
Georgi Gerganov	2619109ad5	ci : disable freeBSD builds due to lack of VMs (#3381 )	2023-09-28 19:36:36 +03:00
Alon	a40f2b656f	CI: FreeBSD fix (#3258 ) * - freebsd ci: use qemu	2023-09-20 14:06:36 +02:00
Erik Scholz	7ddf185537	ci : switch cudatoolkit install on windows to networked (#3236 )	2023-09-18 02:21:47 +02:00

192 Commits