Commit Graph

255 Commits

Author | SHA1 | Message | Date
Johannes Gäßler bbd0f91779
server-bench: make seed choice configurable (#14929)
* server-bench: make seed choice configurable

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix error formatting

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-29 10:40:50 +02:00
Georgi Gerganov 1f45f2890e sync : ggml 2025-07-28 08:15:01 +03:00
Aman Gupta 446595b9b3
Docs: add instructions for adding backends (#14889) 2025-07-27 09:36:43 +08:00
Georgi Gerganov 2df255da3c sync : ggml
ggml-ci
2025-07-24 20:27:23 +03:00
Georgi Gerganov b17230917c sync : ggml 2025-07-19 11:46:50 +03:00
Johannes Gäßler 5cae766541
scripts: synthetic prompt mode for server-bench.py (#14695) 2025-07-16 09:33:28 +02:00
Johannes Gäßler 494c5899cb
scripts: benchmark for HTTP server throughput (#14668)
* scripts: benchmark for HTTP server throughput

* fix server connection reset
2025-07-14 13:14:30 +02:00
Georgi Gerganov 8eff95544e sync : ggml 2025-07-12 16:13:27 +03:00
Georgi Gerganov 215535701d sync : ggml
ggml-ci
2025-07-12 14:25:44 +03:00
Aman Gupta 11ee0fea2a
Docs: script to auto-generate ggml operations docs (#14598)
* Docs: script to auto-generate ggml operations docs

* Review: formatting changes + change github action

* Use built-in types instead of typing

* docs : add BLAS and Metal ops

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-10 23:29:01 +08:00
Georgi Gerganov d4cdd9c1c3
ggml : remove kompute backend (#14501)
ggml-ci
2025-07-03 07:48:32 +03:00
Georgi Gerganov e17991c466 sync : ggml
ggml-ci
2025-07-02 20:08:45 +03:00
Georgi Gerganov f61c05d4b1 sync : ggml
ggml-ci
2025-07-01 11:06:39 +03:00
Vedran Miletić e9b6350e61
scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
Georgi Gerganov 06cbedfca1 sync : ggml
ggml-ci
2025-06-20 21:02:47 +03:00
Georgi Gerganov d03172cc79 sync : ggml
ggml-ci
2025-06-18 09:59:21 +03:00
Aman Gupta 2e42be42bd
compare-llama-bench: add option to plot (#14169)
* compare-llama-bench: add option to plot

* Address review comments: convert case + add type hints

* Add matplotlib to requirements

* fix tests

* Improve comment and fix assert condition for test

* Add back default test_name, add --plot_log_scale

* use log_scale regardless of x_values
2025-06-14 10:34:20 +02:00
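
The plotting option added in the commit above lends itself to quick before/after comparisons. A minimal sketch, assuming llama-bench results were collected into an SQLite database on each commit; the `--plot`/`--plot_log_scale` flags come from the commit message, while the baseline/compare flags and plot output path are assumptions (check `scripts/compare-llama-bench.py --help` for the actual interface):

```console
# run on each commit to append results to the same database
$ ./build/bin/llama-bench -o sql | sqlite3 llama-bench.sqlite
# then compare and plot (flag names for baseline/compare/output are illustrative)
$ python3 scripts/compare-llama-bench.py -b <baseline-sha> -c <compare-sha> \
    --plot perf.png --plot_log_scale
```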
Georgi Gerganov ae92c1855b sync : ggml
ggml-ci
2025-06-10 18:39:33 +03:00
Georgi Gerganov b8e2194efc sync : ggml
ggml-ci
2025-06-10 09:21:56 +03:00
Georgi Gerganov f3a4b1659c sync : ggml
ggml-ci
2025-06-01 13:43:57 +03:00
Georgi Gerganov 53f925074d
sync : vendor (#13901)
* sync : vendor

ggml-ci

* cont : fix httplib version

ggml-ci

* cont : fix lint

* cont : fix lint

* vendor : move to common folder /vendor

ggml-ci

* cont : fix lint

* cont : move httplib to /vendor + use json_fwd.hpp

ggml-ci

* cont : fix server build

ggml-ci

* cont : add missing headers

ggml-ci

* cont : header clean-up

ggml-ci
2025-05-30 16:25:45 +03:00
Georgi Gerganov 1c49c70d07 sync : ggml 2025-05-27 18:05:33 +03:00
Georgi Gerganov a26c4cc11e
scripts : add option to compare commits in Debug (#13806)
* scripts : add option to compare commits in Debug

* cont : reuse existing CMAKE_OPTS
2025-05-26 22:24:01 +03:00
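
For context, `scripts/compare-commits.sh` is the wrapper this commit extends: it builds two commits, runs `llama-bench` on each, and feeds the results to `compare-llama-bench.py`. A minimal sketch of the usual invocation; the Debug toggle added here is configured inside the script, and its exact switch is not shown in this log:

```console
# build and benchmark two commits, then diff the results
$ ./scripts/compare-commits.sh <commit1> <commit2> [additional llama-bench arguments]
```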
Olivier Chafik f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
* add common_json w/ support for truncated json healing

* add common_chat_msg_diff

* partial common_chat_parse

* refactor parser w/ optionals

* server: wire chat diffs in stream mode

* fix trigger of thinking models (must happen after thoughts are closed)

* fix functionary v3.2 raw python!

* rename: common_chat_syntax (now contains format)

* rm common_regex.at_start

* don't return empty <think></think>

* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)

* fix QwQ 32B tool call parsing after thoughts (hermes2)

* better logs for grammar triggers

* consume spaces after parse_json_tool_calls

* fix required tool calls w/ thinking models that have pre-opened thinking tags

* fix thinking model's initial trigger + test qwq's template

* run most test_tool_call tests in stream + non-stream modes

* make functionary v3.2 parsing more strict (differentiate first match from others)

* send final diff from server, to close off raw python arguments

* support partial content streaming in Generic mode

* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)

* Update function-calling.md

* Update tool_bench.py

* chat-parser: remove input from exception (llm output may contain PII)

---------

Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
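
To observe the streamed tool-call diffs this commit introduces, the server must be started with `--jinja` and queried with `"stream": true` plus a `tools` array. A minimal sketch against the OpenAI-compatible endpoint; the model path, port, and tool schema are illustrative placeholders:

```console
# start the server with Jinja chat templates enabled (default port 8080)
$ llama-server -m model.gguf --jinja &
# request a streamed completion with a tool definition; tool-call deltas
# arrive incrementally in the response chunks
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "stream": true,
      "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
      "tools": [{
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
          }
        }
      }]
    }'
```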
Georgi Gerganov d30cb5a7fa sync : ggml
ggml-ci
2025-05-19 13:29:56 +03:00
Sigbjørn Skjæret be1d4a13db
scripts : fix compare-llama-bench.py show parameter (#13514) 2025-05-14 08:41:01 +02:00
Sigbjørn Skjæret bf79371120
scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) 2025-05-13 15:31:12 +02:00
Georgi Gerganov 1e2809bc4b sync : ggml 2025-05-13 14:02:28 +03:00
Sigbjørn Skjæret 09232370fc
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) 2025-05-11 16:20:39 +02:00
Georgi Gerganov d879433824 sync : ggml
ggml-ci
2025-05-07 17:28:36 +03:00
Diego Devesa 1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Georgi Gerganov b34443923c
sync : ggml (#13268)
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)

* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)

* review: remove src_x/y < 0 checks; add performance tests

* sync : ggml

ggml-ci

* vulkan : fix lint (#0)

---------

Co-authored-by: Acly <aclysia@gmail.com>
2025-05-02 20:54:30 +03:00
Georgi Gerganov b1dd4d08e8 sync : ggml
ggml-ci
2025-05-01 20:15:34 +03:00
Georgi Gerganov 8d33d740c3 sync : ggml 2025-05-01 10:00:39 +03:00
Johannes Gäßler 19e899ce21
scripts: n_depth for compare-llama-bench [no ci] (#13201) 2025-04-29 23:32:04 +02:00
Georgi Gerganov 63b4911494 sync : ggml
ggml-ci
2025-04-24 17:32:47 +03:00
Georgi Gerganov 526739b879 sync : ggml
ggml-ci
2025-04-14 09:26:15 +03:00
Georgi Gerganov 47ba87d0a4 sync : ggml 2025-04-11 00:17:47 +03:00
Georgi Gerganov eb420e1148 sync : ggml
ggml-ci
2025-04-11 00:17:47 +03:00
Georgi Gerganov e4bf72d631 scripts : fix sync-ggml-am.sh 2025-04-11 00:17:47 +03:00
Georgi Gerganov a4e46e28f9 sync : ggml
ggml-ci
2025-04-07 18:44:17 +03:00
Georgi Gerganov 0114a32da0 sync : ggml
ggml-ci
2025-03-31 15:07:32 +03:00
Georgi Gerganov d3f1f0acfb sync : ggml
ggml-ci
2025-03-30 08:33:31 +03:00
Georgi Gerganov 029c693fdc sync : ggml
ggml-ci
2025-03-27 10:09:29 +02:00
Georgi Gerganov 771d84371c scripts : update sync + fix cmake merge
ggml-ci
2025-03-27 10:09:29 +02:00
Georgi Gerganov df0665a483 sync : ggml
ggml-ci
2025-03-27 09:04:38 +02:00
Georgi Gerganov 102ac1891d sync : ggml
ggml-ci
2025-03-07 14:49:44 +02:00
Olivier Chafik 669912d9a5
`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if matches at beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
Daniel Bevenius a057897ad4
llama : add xcframework build script (#11996)
* llama : add xcframework build script

This commit adds a script to build an XCFramework for Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
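
A minimal sketch of using the build script referenced in this commit, assuming it is run from the repository root and that the generated `llama.xcframework` (output path may vary) is then added to an Xcode project such as the bundled example:

```console
# build the XCFramework for the supported Apple platforms
$ ./build-xcframework.sh
# open the example project that consumes the framework
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```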
Georgi Gerganov dfd6b2c0be sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00