Comparing ba932dfb50...a5b1943912 - llama.cpp

nwpie/llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2026-04-23 16:37:33 +03:00)

Author	SHA1	Message	Date
Francis Couture-Harpin	a5b1943912	ggml-quants : fix some edge cases in make_qkxh_nl_quants	2025-03-23 17:59:37 -04:00
Francis Couture-Harpin	8b8b88f3de	ggml-quants : restore Q2_K use of make_qp_quants. Weirdly, it seems like in practice replacing this instance is not better. This is probably because of its interaction with make_qkx3_quants.	2025-03-22 18:47:56 -04:00
Francis Couture-Harpin	a41139723d	Merge branch 'master' into compilade/optimal-rounding	2025-03-22 15:05:11 -04:00
Francis Couture-Harpin	af23abd3cb	ggml-quants : remove slower qsort-based cumulative search	2025-03-22 12:08:42 -04:00
Francis Couture-Harpin	3e4b675c9f	ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization	2025-03-22 12:03:37 -04:00
Francis Couture-Harpin	f86b8ff210	ggml-quants : use qkxh in more places	2025-03-21 14:05:58 -04:00
Francis Couture-Harpin	3be115100f	ggml-quants : use a max-heap for linear quants like Q3_K. Slightly faster than the previous method.	2025-03-20 19:21:45 -04:00
Francis Couture-Harpin	30ad9c2873	ggml-quants : faster exhaustive IQ4_NL rounding with k_heap	2025-03-15 12:57:44 -04:00
Francis Couture-Harpin	0c9e442489	ggml-quants : remove some commented code	2025-03-15 10:29:47 -04:00
Francis Couture-Harpin	f27c1afc40	ggml-quants : improve TQ2_0 imatrix	2025-03-07 12:54:56 -05:00
Francis Couture-Harpin	6f7fe74946	ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0	2025-02-21 18:47:09 -05:00
Francis Couture-Harpin	d0060fc498	ggml-quants : better and faster make_qkxs_quants	2025-02-21 15:11:21 -05:00
Francis Couture-Harpin	dd6b8408c9	ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K	2025-02-21 13:49:18 -05:00

1 changed file with 804 additions and 93 deletions

ggml/src/ggml-quants.c

File diff suppressed because it is too large.