Commit Graph

  • affe132f53 use row_split when Br >= 4, change reductions to use shared memory if row_split == 1 0cc4m/vulkan-fa-scalar-opt Ruben Ortlam 2026-02-05 12:51:59 +01:00
  • b828e18c75 docker : fix vulkan build (#19352) master Sigbjørn Skjæret 2026-02-05 11:10:39 +01:00
  • a4ea7a188f vendor : update BoringSSL to 0.20260204.0 (#19333) Adrien Gallouët 2026-02-05 09:53:35 +01:00
  • fd56915a9d cont : minor gg/metal-fa-mask-zero-opt Georgi Gerganov 2026-02-05 10:11:27 +02:00
  • 7a4f97d196 metal : add diag (#19330) b7946 Georgi Gerganov 2026-02-05 10:08:45 +02:00
  • a498c75ad1 vulkan: fix GPU deduplication logic. (#19222) b7945 Oleksandr Kuvshynov 2026-02-05 03:06:59 -05:00
  • 3409ab842d vulkan: Set k_load_shmem to false when K is too large (#19301) b7944 Jeff Bolz 2026-02-05 01:48:33 -06:00
  • c342c3b93d vulkan: fix non-contig rope (#19299) b7943 Jeff Bolz 2026-02-05 01:38:59 -06:00
  • af252d0758 metal : add missing includes (#19348) b7942 will-lms 2026-02-05 01:05:09 -05:00
  • 11fb327bf3 vendor : add missing llama_add_compile_flags (#19322) b7941 Sigbjørn Skjæret 2026-02-05 02:27:38 +01:00
  • e6e934c5ea vendor: update cpp-httplib version (#19313) b7940 Aaron Teo 2026-02-05 05:15:03 +08:00
  • b536eb0233 codeowners : add danbev for examples/debug (#19332) Daniel Bevenius 2026-02-04 20:20:40 +01:00
  • e0c93af2a0 debug: make common_debug_print_tensor readable (#19331) b7938 Xuan-Son Nguyen 2026-02-04 17:55:31 +01:00
  • 4815a66990 metal : skip loading all-zero mask Georgi Gerganov 2026-02-04 16:54:26 +02:00
  • 423bee462b ci : fix sanitize workflow to enable ggml sanitizers too (#19323) Georgi Gerganov 2026-02-04 15:12:03 +02:00
  • 8abcc70a74 model: (qwen3next) correct vectorized key_gdiff calculation (#19324) b7936 Xuan-Son Nguyen 2026-02-04 13:09:58 +01:00
  • 46c3bb1691 spec : check if the target context is compatible for spec decoding gg/spec-disable-for-recurrent Georgi Gerganov 2026-02-04 13:20:35 +02:00
  • 1f8d0c848b Revert "llama : add llama_memory_can_rm_suffix()" Georgi Gerganov 2026-02-04 13:11:50 +02:00
  • 8479be0ee5 cont : try fix python init gg/ci-server-add-metal Georgi Gerganov 2026-02-04 12:55:19 +02:00
  • eaba92c3dc tests : add non-cont, inplace rope tests (#19296) b7935 Georgi Gerganov 2026-02-04 12:45:21 +02:00
  • 6ab881b7c3 model-conversion : add tensor-info.py utility (#18954) Daniel Bevenius 2026-02-04 10:40:53 +01:00
  • d838c22bb3 spec : fix the check-rate logic of ngram-simple (#19261) b7933 Georgi Gerganov 2026-02-04 10:39:53 +02:00
  • 1213a03564 qwen3next : fix chunking gg/qwen3next-fix-chunking Georgi Gerganov 2026-02-04 10:06:38 +02:00
  • 25f40ca65f completion : simplify batch (embd) processing (#19286) b7932 Daniel Bevenius 2026-02-04 05:43:28 +01:00
  • 015deb9048 ggml-virtgpu: make the code thread safe (#19204) b7931 Kevin Pouget 2026-02-04 03:46:18 +01:00
  • 2ceda3f662 ggml-cpu: use LUT for converting e8->f32 scales on x86 (#19288) b7930 Aman Gupta 2026-02-04 09:43:29 +08:00
  • 44008ce8f9 metal : add solve_tri (#19302) b7929 Georgi Gerganov 2026-02-03 23:43:14 +02:00
  • 6a9bf2f788 ci : add sanitizer runs for server (#19291) b7928 Georgi Gerganov 2026-02-03 22:41:20 +02:00
  • faa1bc26ee sampling : delegate input allocation to the scheduler (#19266) b7927 Georgi Gerganov 2026-02-03 22:16:16 +02:00
  • 32b17abdb0 vulkan: disable coopmat1 fa on Nvidia Turing (#19290) b7926 Ruben Ortlam 2026-02-03 17:37:32 +01:00
  • 8bece2eb20 CUDA: use mmvq for mul-mat-id for small batch sizes (#18958) b7925 Aman Gupta 2026-02-03 23:31:23 +08:00
  • a6fd8ca1fe models : remove unnecessary cont in openelm (#19289) b7924 Sigbjørn Skjæret 2026-02-03 14:20:57 +01:00
  • 617ec152e1 ci : add metal server workflows Georgi Gerganov 2026-02-03 14:15:45 +02:00
  • c55bce4159 metal : minor cleanup (#19251) b7923 Georgi Gerganov 2026-02-03 13:43:29 +02:00
  • 7f58ccaed4 graph : compute backend samplers only if needed Georgi Gerganov 2026-02-03 13:00:43 +02:00
  • 1f1e57f2bf CUDA: Fix loop unrolling for BW in mul_mat_q_stream_k_fixup (#19053) b7922 Oliver Simons 2026-02-03 11:33:14 +01:00
  • 343d285b98 split rows inside of subgroups for faster synchronization Ruben Ortlam 2026-02-03 09:08:35 +01:00
  • d4a3bac6be vulkan: allow using fp16 in coopmat1 flash attention shader Ruben Ortlam 2026-02-03 07:59:55 +01:00
  • e9a859db3c ggml: added cleanups in ggml_quantize_free (#19278) b7921 George 2026-02-03 08:43:39 +02:00
  • 41e3f02647 cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until investigated (#19227) b7920 Gaurav Garg 2026-02-03 12:11:02 +05:30
  • 1efb5f7ae1 vocab: add Falcon-H1-Tiny-Coder FIM tokens (#19249) b7919 Alexey Dubrov 2026-02-03 09:31:01 +03:00
  • aeb827a3cc spec : simplify time measurement using common_time_meas (#19262) b7918 Georgi Gerganov 2026-02-03 08:20:15 +02:00
  • 91ea44e89b opencl: refactor some ops, concat, repeat, tanh and scale (#19226) b7917 lhez 2026-02-02 15:54:43 -08:00
  • 3754239e43 eval : support multiple dataset runs gg/scripts-eval Georgi Gerganov 2026-02-02 22:34:25 +02:00
  • 0dfcd3b607 jinja : add missing 'in' test to template engine (#19004) (#19239) Sid Mohan 2026-02-02 12:00:55 -08:00
  • 07a7412a3b mtmd: add min/max pixels gguf metadata (#19273) Xuan-Son Nguyen 2026-02-02 20:59:06 +01:00
  • c965abbe6e sim : fix answer matching Georgi Gerganov 2026-02-02 19:45:04 +02:00
  • 9f682fb640 ggml-cpu: FA split across kv for faster TG (#19209) Aman Gupta 2026-02-03 01:19:55 +08:00
  • 98e9eabbf4 test : fix path Georgi Gerganov 2026-02-02 19:13:37 +02:00
  • a3fa035822 server: print actual model name in "model not found" error (#19117) b7913 Matthieu Coudron 2026-02-02 16:55:27 +01:00
  • 15818ac44c ci: add test-backend-ops test for CPU (#19268) Aman Gupta 2026-02-02 22:40:28 +08:00
  • d30e59b62a llama : add llama_memory_can_rm_suffix() Georgi Gerganov 2026-02-02 15:18:01 +02:00
  • bf38346d13 Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nvidia & AMD GPU is unavailable: download/installation channels are out of work. (#19246) b7911 Neo Zhang 2026-02-02 21:06:21 +08:00
  • 4d5e972673 sycl: implement GGML_OP_TOP_K (#19242) b7910 Tamar 2026-02-02 15:05:51 +02:00
  • 6fdddb4987 metal : support virtual devices (#18919) b7909 Georgi Gerganov 2026-02-02 14:29:44 +02:00
  • e0d4d45666 sampling : delegate input allocation to the scheduler Georgi Gerganov 2026-02-02 14:22:59 +02:00
  • 6156ae5111 model-conversion : add debug option to conversion script (#19265) Daniel Bevenius 2026-02-02 11:29:57 +01:00
  • 59377a6c87 ggml-backend: fix async set/get fallback sync (#19179) b7907 Johannes Gäßler 2026-02-02 10:00:05 +01:00
  • 1239267cc4 authors : update (#19263) Georgi Gerganov 2026-02-02 08:51:25 +02:00
  • 7a4ca3cbd9 docs : Minor cleanups (#19252) b7905 Christian Kastner 2026-02-02 07:38:55 +01:00
  • b4d05a3d2f spec : various improvements to ngram-map + docs (#19253) b7904 Sascha Rogmann 2026-02-02 07:26:58 +01:00
  • 2dc3ce2166 Remove pipeline cache mutexes (#19195) b7903 Nikhil Jain 2026-02-01 18:47:29 -08:00
  • 3bc8d2cf23 Bump cmake max version (needed for Windows on Snapdragon builds) (#19188) b7902 Max Krasnyansky 2026-02-01 14:13:38 -08:00
  • 8a98ba4582 nix: fix allowUnfreePredicate for packages with multiple licenses (#19237) Alexis Williams 2026-02-01 12:10:48 -08:00
  • 2634ed207a create test.sh to enhance the parameters for testing, update the guide, rm useless script (#19243) Neo Zhang 2026-02-01 18:24:00 +08:00
  • f61e6af1cf eval : add prompts Georgi Gerganov 2026-01-31 22:37:57 +02:00
  • bb58f1e67d eval : print progress Georgi Gerganov 2026-01-31 19:33:37 +02:00
  • b7786174b6 examples: add task summary table to llama-eval-new.py Georgi Gerganov 2026-01-31 18:58:27 +02:00
  • 41ea26144e nix: fix nix develop .#python-scripts (#19218) b7899 Matthieu Coudron 2026-01-31 17:01:46 +01:00
  • fc541d0532 docs: update llama-eval-discussion.md with threading and model parameter updates Georgi Gerganov 2026-01-31 16:58:36 +02:00
  • ce6d66b0c4 examples: add threading support and model parameter to llama-eval-new.py Georgi Gerganov 2026-01-31 16:56:56 +02:00
  • 1e79722596 docs: update llama-eval-discussion.md with session work summary Georgi Gerganov 2026-01-31 16:41:55 +02:00
  • fbccf28275 examples: use cached dataset path in simulator to avoid HF Hub requests Georgi Gerganov 2026-01-31 16:39:51 +02:00
  • 43d9ba7c93 examples: use cached dataset path to avoid HF Hub requests Georgi Gerganov 2026-01-31 16:38:46 +02:00
  • c00cd35d92 examples: remove HF_HUB_OFFLINE to allow dataset download Georgi Gerganov 2026-01-31 16:33:45 +02:00
  • eb55a20d58 examples: use HF_HUB_OFFLINE to avoid HF Hub warnings Georgi Gerganov 2026-01-31 16:32:39 +02:00
  • 12fe3d2f34 examples: implement flexible grader system for answer validation Georgi Gerganov 2026-01-31 16:31:46 +02:00
  • 316f043a04 docs: remove README.md from llama-eval Georgi Gerganov 2026-01-31 16:17:43 +02:00
  • b441963b11 examples: add simplified llama-eval-new.py for AIME evaluation Georgi Gerganov 2026-01-31 16:17:06 +02:00
  • 1dcc180095 docs: update llama-eval-discussion.md with session work summary Georgi Gerganov 2026-01-31 15:49:43 +02:00
  • f3582a6630 examples: refactor test-simulator.sh for better readability Georgi Gerganov 2026-01-31 15:45:47 +02:00
  • 4a6e59c363 examples: add llama-server simulator for testing eval scripts Georgi Gerganov 2026-01-31 15:37:31 +02:00
  • 5b01d8575d examples : add compare-mlx gg/compare-mlx Georgi Gerganov 2025-08-30 16:08:00 +03:00
  • 89f10baad5 ggml-hexagon: flash-attention and reduce-sum optimizations (#19141) b7898 nullname 2026-01-31 13:14:20 +08:00
  • 3dd95914d0 quantize: add option --tensor-type-file to llama-quantize (#18572) b7897 EugeoSynthesisThirtyTwo 2026-01-31 04:39:21 +01:00
  • ec6c7421e4 mtmd: support MiniCPM-o 4.5(vision only) (#19211) b7896 tc-mb 2026-01-31 06:19:30 +08:00
  • 1488339138 lookup, lookahead: fix crash when n_ctx not specified (#18729) b7895 Daniele Pinna 2026-01-30 21:10:24 +01:00
  • 4927795810 ngram-mod : fix build [no ci] (#19216) Georgi Gerganov 2026-01-30 21:27:27 +02:00
  • 971facc38e opencl: add optimized q8_0 mm kernel for adreno (#18871) shaofeiqi 2026-01-30 10:19:27 -08:00
  • d9a2a4bcaa sync : ggml Georgi Gerganov 2026-01-30 16:27:14 +02:00
  • dfd6106c84 cuda : fix compile warnings (whisper/0) Georgi Gerganov 2026-01-30 15:56:15 +02:00
  • bbada8bfb9 server : wrap around the "id_slot" parameter (#19207) Georgi Gerganov 2026-01-30 19:46:10 +02:00
  • 13f3ebfae1 Correctly fetch q8_1 quantize pipeline in test as needed by 8a3519b (#19194) Simon Redman 2026-01-30 11:27:16 -05:00
  • dabaa2e77a spec : add ngram-mod (#19164) Georgi Gerganov 2026-01-30 18:21:48 +02:00
  • 2e916f996a jinja : add unordered_map include to value.h [no ci] (#19205) Marcello Seri 2026-01-30 16:09:44 +01:00
  • f3bc98890c memory : clarify comments for r_l and s_l tensors [no ci] (#19203) Daniel Bevenius 2026-01-30 15:18:41 +01:00
  • c3b87cebff tests : add GQA=20 FA test (#19095) b7885 Georgi Gerganov 2026-01-30 13:52:57 +02:00
  • 0562503154 convert : add missing return statement for GraniteMoeModel (#19202) Daniel Bevenius 2026-01-30 11:12:53 +01:00
  • 83bcdf7217 memory : remove unused tmp_buf (#19199) b7883 Daniel Bevenius 2026-01-30 10:37:06 +01:00
  • b316895ff9 docs: Add LlamaLib to UI projects (#19181) Antonis Makropoulos 2026-01-30 08:54:28 +02:00