Compare commits

13 Commits

Author SHA1 Message Date
Francis Couture-Harpin
a5b1943912 ggml-quants : fix some edge cases in make_qkxh_nl_quants 2025-03-23 17:59:37 -04:00
Francis Couture-Harpin
8b8b88f3de ggml-quants : restore Q2_K use of make_qp_quants
Weirdly, it seems like in practice replacing this instance is not better.
This is probably because of its interaction with make_qkx3_quants.
2025-03-22 18:47:56 -04:00
Francis Couture-Harpin
a41139723d Merge branch 'master' into compilade/optimal-rounding 2025-03-22 15:05:11 -04:00
Francis Couture-Harpin
af23abd3cb ggml-quants : remove slower qsort-based cumulative search 2025-03-22 12:08:42 -04:00
Francis Couture-Harpin
3e4b675c9f ggml-quants : use a max-heap for TQ1_0 and TQ2_0 quantization 2025-03-22 12:03:37 -04:00
Francis Couture-Harpin
f86b8ff210 ggml-quants : use qkxh in more places 2025-03-21 14:05:58 -04:00
Francis Couture-Harpin
3be115100f ggml-quants : use a max-heap for linear quants like Q3_K
Slightly faster than the previous method.
2025-03-20 19:21:45 -04:00
Francis Couture-Harpin
30ad9c2873 ggml-quants : faster exhaustive IQ4_NL rounding with k_heap 2025-03-15 12:57:44 -04:00
Francis Couture-Harpin
0c9e442489 ggml-quants : remove some commented code 2025-03-15 10:29:47 -04:00
Francis Couture-Harpin
f27c1afc40 ggml-quants : improve TQ2_0 imatrix 2025-03-07 12:54:56 -05:00
Francis Couture-Harpin
6f7fe74946 ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0 2025-02-21 18:47:09 -05:00
Francis Couture-Harpin
d0060fc498 ggml-quants : better and faster make_qkxs_quants 2025-02-21 15:11:21 -05:00
Francis Couture-Harpin
dd6b8408c9 ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K 2025-02-21 13:49:18 -05:00

File diff suppressed because it is too large.