llama.cpp/common
compilade 90083283ec
imatrix : use GGUF to store importance matrices (#9400)
* imatrix : allow processing multiple chunks per batch

* perplexity : simplify filling the batch

* imatrix : fix segfault when using a single chunk per batch

* imatrix : use GGUF to store imatrix data

* imatrix : fix conversion problems

* imatrix : use FMA and sort tensor names

* py : add requirements for legacy imatrix convert script

* perplexity : revert changes

* py : include imatrix converter requirements in toplevel requirements

* imatrix : avoid using designated initializers in C++

* imatrix : remove unused n_entries

* imatrix : allow loading mis-ordered tensors

Sums and counts tensors no longer need to be consecutive.

* imatrix : more sanity checks when loading multiple imatrix files

* imatrix : use ggml_format_name instead of std::string concatenation

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* quantize : use unused imatrix chunk_size with LLAMA_TRACE

* common : use GGUF for imatrix output by default

* imatrix : two-way conversion between old format and GGUF

* convert : remove imatrix to gguf python script

* imatrix : use the function name in more error messages

* imatrix : don't use FMA explicitly

This should make comparisons between the formats easier
because this matches the behavior of the previous version.

* imatrix : avoid returning from void function save_imatrix

* imatrix : support 3d tensors with MUL_MAT

* quantize : fix dataset name loading from gguf imatrix

* common : move string_remove_suffix from quantize and imatrix

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* imatrix : add warning when legacy format is written

* imatrix : warn when writing partial data, to help guess dataset coverage

Also make the legacy format store partial data
by using neutral values for missing data.
This matches what is done at read-time for the new format,
and so should get the same quality in case the old format is still used.

* imatrix : avoid loading model to convert or combine imatrix

* imatrix : avoid using imatrix.dat in README

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-19 12:51:22 -04:00
..
CMakeLists.txt cmake : do not search for curl libraries by ourselves (#14613) 2025-07-10 15:29:05 +03:00
arg.cpp llama : add high-throughput mode (#14363) 2025-07-16 16:35:42 +03:00
arg.h common : add common_remote_get_content (#13123) 2025-04-26 22:58:12 +02:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser.cpp llama-chat : Do not throw when tool parsing fails (#14012) 2025-06-14 17:25:15 +01:00
chat-parser.h llama-chat : Do not throw when tool parsing fails (#14012) 2025-06-14 17:25:15 +01:00
chat.cpp server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196) 2025-06-29 20:02:53 +02:00
chat.h server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196) 2025-06-29 20:02:53 +02:00
common.cpp imatrix : use GGUF to store importance matrices (#9400) 2025-07-19 12:51:22 -04:00
common.h imatrix : use GGUF to store importance matrices (#9400) 2025-07-19 12:51:22 -04:00
console.cpp console : utf-8 fix for windows stdin (#9690) 2024-09-30 11:23:42 +03:00
console.h gguf : new file format with flexible meta data (beta) (#2398) 2023-08-21 23:07:43 +03:00
json-partial.cpp sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-partial.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp common : use std::string_view now that we target c++17 (#14319) 2025-06-22 08:37:43 +03:00
json-schema-to-grammar.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
llguidance.cpp llguidance : set tokenizer slices to default (#13424) 2025-05-10 17:19:52 +02:00
log.cpp Fix: Compile failure due to Microsoft STL breaking change (#11836) 2025-02-12 21:36:11 +01:00
log.h cleanup: fix compile warnings associated with gnu_printf (#11811) 2025-02-12 10:06:53 -04:00
ngram-cache.cpp ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
ngram-cache.h llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) 2025-05-25 01:48:08 +01:00
sampling.h sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
speculative.cpp llama : deprecate llama_kv_self_ API (#14030) 2025-06-06 14:11:15 +03:00
speculative.h speculative : update default params (#11954) 2025-02-19 13:29:42 +02:00