Compare commits

26 Commits

Author SHA1 Message Date
0cc4m
722f9defe9 vulkan: intel mmv fix attempt 2025-11-23 10:13:19 +01:00
Jeff Bolz
54d83bbe85 vulkan: remove a couple unnecessary switches (#17419) 2025-11-23 06:29:40 +01:00
Adrien Gallouët
4949ac0f18 ci : switch to BoringSSL on Server workflow (#17441)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-22 21:38:19 +01:00
Masato Nakasaka
3f3a4fb9c3 Revive MUL_MAT_ID to perf testing (#17397) 2025-11-22 10:55:43 +01:00
yulo
028f93ef98 HIP: RDNA4 tensor core support for MMF (#17077)
* mmf for rdna4

* align the padding for rdna4

* forbid mul_mat_f for rdna4

* fix as comment

* remove device kernels

* add constexpr for early return

* update based on review comment

* change based on the review comment

* pass compile error

* keep code consistency

---------

Co-authored-by: zhang hui <you@example.com>
2025-11-22 00:03:24 +01:00
lhez
8e9ddba610 opencl: refine condition for kqv mm (#17392) 2025-11-21 14:34:48 -08:00
ubergarm
23bc779a6e model : detect GigaChat3-10B-A1.8B as deepseek lite (#17420)
* Detect GigaChat3-10B-A1.8B as deepseek lite

Hardcodes checking number of layers to detect if lite version of deepseek.

* Add comment identifying deepseek lite variants

deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
2025-11-21 14:51:38 +01:00
Adrien Gallouët
28175f857d cmake : add option to build and link BoringSSL (#17205)
* cmake: add option to build and link BoringSSL

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : fix typo

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : disable boringssl test and asm by default

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : skip bssl

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : disable fips

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : fix cmake --install

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* ci : use boringssl for windows and mac

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-21 11:46:45 +01:00
Adrien Gallouët
9cc4080441 ci : start using OpenSSL (#17235)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-21 11:45:00 +01:00
Jeff Bolz
f1ffbba68e vulkan: disable async for older Intel devices (#17369)
* vulkan: disable async for older Intel devices

* update detection logic

* use name string for detection
2025-11-21 09:58:17 +01:00
Raul Torres
2370665e56 CANN: Refactor evaluate_and_capture_cann_graph (#17333)
* CANN: Refactor `evaluate_and_capture_cann_graph`

**Description of the problem**

* `matched_graph` is obtained even if graph mode is disabled.
* End of graph capture and graph replay are unnecessarily placed in different `if` blocks.

**Proposed solution**

* Obtain `matched_graph` only if graph mode is enabled.
* Place end of graph capture and graph replay inside the same `if` block.
* Unify graph related comments.

* Remove trailing whitespace
2025-11-21 16:23:29 +08:00
nullname
21d31e0810 ggml-hexagon: fix swiglu failure at test-backend-ops (#17344)
* refactor: use hvx_vec_exp_fp32_guard_inf for overflow handling in hvx_exp_f32

* feat: add fast sigmoid function with overflow guard for fp32

* refactor: replace hvx_vec_inverse_fp32 with hvx_vec_inverse_fp32_guard_inf for improved overflow handling

* feat: enhance hvx_add_scalar_f32 with overflow handling using infinity guard

* wip

* add HVX_Vector_Alias

wip

* wip

* fix: improve handling of src1 tensor in glu_swiglu_fp32_per_thread function

* fix nc

* wip

* wip

* handle nan at inverse

* wip

* fix neg

* wip

* rename

* fix hvx_vec_inverse_fp32_guard_inf to handle infinity and NaN cases correctly

* wip

* fix hvx_vec_inverse_fp32_guard_inf to handle NaN cases correctly

* wip

* wip

* wip

* fix output sign
2025-11-20 15:45:05 -08:00
Daniel Han
dd0f321941 readme : add Unsloth exporting to GGUF in tools (#17411) 2025-11-20 20:07:36 +01:00
Xuan-Son Nguyen
054a45c3d3 grammar: fix regression caused by #17381 (#17412)
* grammar: fix regression caused by #17381

* more readable
2025-11-20 18:35:10 +01:00
Aleksander Grygier
4c91f2633f Improved file naming & structure for UI components (#17405)
* refactor: Component files naming & structure

* chore: update webui build output

* refactor: Dialog titles + components naming

* chore: update webui build output

* refactor: Imports

* chore: update webui build output
2025-11-20 14:07:31 +01:00
Piotr Wilkin (ilintar)
92c0b387a9 grammar : fix integer overflow (#17381)
* Fix DoS / integer overflow

* Remove optional, use INT64_MAX instead as placeholder value (it's technically -1, so it fits :)

* White space

* Actually, since it's unsigned, use UINT64_MAX
2025-11-20 14:47:04 +02:00
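
The commit message above mentions an integer overflow fix and the use of `UINT64_MAX` as a placeholder value. The actual patch is not shown on this page, so the following is only a generic sketch of the idea, not the code from #17381: an overflow-checked decimal parse into an unsigned 64-bit value, with `UINT64_MAX` reserved as a "no limit" sentinel. The names `NO_LIMIT` and `parse_u64_checked` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sentinel meaning "no upper bound was given".
static const uint64_t NO_LIMIT = UINT64_MAX;

// Parse a non-empty string of decimal digits into `out`.
// Returns false on empty input, on a non-digit character, or if the
// value would not fit into uint64_t (instead of silently wrapping).
static bool parse_u64_checked(const std::string & s, uint64_t & out) {
    if (s.empty()) {
        return false;
    }
    uint64_t v = 0;
    for (char c : s) {
        if (c < '0' || c > '9') {
            return false;
        }
        const uint64_t d = (uint64_t) (c - '0');
        // v * 10 + d fits only if v <= (UINT64_MAX - d) / 10
        if (v > (UINT64_MAX - d) / 10) {
            return false;
        }
        v = v * 10 + d;
    }
    out = v;
    return true;
}
```

A caller would initialize its bound to `NO_LIMIT` and overwrite it only when `parse_u64_checked` succeeds. One design caveat of any sentinel scheme: an input that parses to exactly `UINT64_MAX` is indistinguishable from "no limit", so a stricter caller may want to reject that value.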
Georgi Gerganov
2286a360ff sync : ggml 2025-11-20 14:10:44 +02:00
YangLe
1d321e592b metal : fix compile on macos 11 (whisper/3533) 2025-11-20 14:10:44 +02:00
Georgi Gerganov
196f5083ef common : more accurate sampling timing (#17382)
* common : more accurate sampling timing

* eval-callback : minor fixes

* cont : add time_meas impl

* cont : fix log msg [no ci]

* cont : fix multiple definitions of time_meas

* llama-cli : exclude chat template init from time measurement

* cont : print percentage of unaccounted time

* cont : do not reset timings
2025-11-20 13:40:10 +02:00
o7si
5088b435d4 convert : fix TypeError when loading base model remotely in convert_lora_to_gguf (#17385)
* fix: TypeError when loading base model remotely in convert_lora_to_gguf

* refactor: simplify base model loading using cache_dir from HuggingFace

* Update convert_lora_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* feat: add remote_hf_model_id to trigger lazy mode in LoRA converter

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-20 12:30:12 +01:00
Piotr Wilkin (ilintar)
845f200b28 ggml : Fix transposed SOLVE_TRI result (#17323)
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-20 12:58:21 +02:00
Scott Fudally
a7784a8b1d DGX Spark: UMA support (#17368)
* DGX Spark: UMA support

* Updates from PR feedback

* More PR feedback cleanup

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Remove trailing whitespace

* Update ggml/src/ggml-cuda/ggml-cuda.cu

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-20 12:32:02 +02:00
Adrien Gallouët
79bb743512 ggml : remove useless and error-prone variadic macros (#17399)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-20 11:18:27 +01:00
sudhiarm
3ae282a06f kleidiai: fix zero-size array declaration (#17240) 2025-11-20 11:45:49 +02:00
ixgbe
5be353ec4a ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (#17314)
* ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

* fix comment

* fix comment 2

---------

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-20 08:09:18 +02:00
Giuseppe Scrivano
7d77f07325 vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (#17319)
* vulkan: initialize array

* vulkan: implement ADD1

* vulkan: implement ARANGE

* vulkan: implement FILL

* vulkan: implement SOFTPLUS

* vulkan: implement STEP

* vulkan: implement ROUND

* vulkan: implement CEIL

* vulkan: implement FLOOR

* vulkan: implement TRUNC

* docs: update Vulkan ops

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-19 17:29:45 +01:00
82 changed files with 2209 additions and 1465 deletions

View File

@@ -69,13 +69,6 @@ jobs:
key: macOS-latest-cmake-arm64
evict-old-files: 1d
- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update
brew install curl
- name: Build
id: cmake_build
run: |
@@ -83,6 +76,8 @@ jobs:
cmake -B build \
-DCMAKE_BUILD_RPATH="@loader_path" \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_CURL=OFF \
-DLLAMA_BUILD_BORINGSSL=ON \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=OFF \
-DGGML_METAL_SHADER_DEBUG=ON \
@@ -110,13 +105,6 @@ jobs:
key: macOS-latest-cmake-x64
evict-old-files: 1d
- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update
brew install curl
- name: Build
id: cmake_build
run: |
@@ -126,6 +114,8 @@ jobs:
cmake -B build \
-DCMAKE_BUILD_RPATH="@loader_path" \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_CURL=OFF \
-DLLAMA_BUILD_BORINGSSL=ON \
-DGGML_METAL=OFF \
-DGGML_RPC=ON \
-DCMAKE_OSX_DEPLOYMENT_TARGET=13.3
@@ -151,13 +141,6 @@ jobs:
key: macOS-latest-cmake-arm64-webgpu
evict-old-files: 1d
- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update
brew install curl
- name: Dawn Dependency
id: dawn-depends
run: |
@@ -217,7 +200,7 @@ jobs:
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
python3 python3-pip python3-dev \
libjpeg-dev build-essential libcurl4-openssl-dev \
libjpeg-dev build-essential libssl-dev \
git-lfs
- name: Python Dependencies
@@ -238,6 +221,8 @@ jobs:
id: cmake_build
run: |
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_FATAL_WARNINGS=ON \
-DGGML_RPC=ON
cmake --build build --config Release -j $(nproc)
@@ -294,13 +279,15 @@ jobs:
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential libcurl4-openssl-dev
sudo apt-get install build-essential libssl-dev
- name: Build
id: cmake_build
if: ${{ matrix.sanitizer != 'THREAD' }}
run: |
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }}
@@ -311,6 +298,8 @@ jobs:
if: ${{ matrix.sanitizer == 'THREAD' }}
run: |
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
@@ -335,7 +324,7 @@ jobs:
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential libcurl4-openssl-dev
sudo apt-get install build-essential libssl-dev
- name: Build
id: cmake_build
@@ -343,6 +332,8 @@ jobs:
mkdir build
cd build
cmake .. \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_LLGUIDANCE=ON
cmake --build . --config Release -j $(nproc)
@@ -373,12 +364,14 @@ jobs:
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential libcurl4-openssl-dev
sudo apt-get install build-essential libssl-dev
- name: Build
id: cmake_build
run: |
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DGGML_RPC=ON
cmake --build build --config Release -j $(nproc)
@@ -405,12 +398,14 @@ jobs:
- name: Dependencies
id: depends
run: |
sudo apt-get install -y glslc libvulkan-dev libcurl4-openssl-dev
sudo apt-get install -y glslc libvulkan-dev libssl-dev
- name: Configure
id: cmake_configure
run: |
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DGGML_BACKEND_DL=ON \
-DGGML_CPU_ALL_VARIANTS=ON \
@@ -440,7 +435,7 @@ jobs:
run: |
sudo add-apt-repository -y ppa:kisak/kisak-mesa
sudo apt-get update -y
sudo apt-get install -y build-essential mesa-vulkan-drivers libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libcurl4-openssl-dev
sudo apt-get install -y build-essential mesa-vulkan-drivers libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libssl-dev
- name: Get latest Vulkan SDK version
id: vulkan_sdk_version
@@ -466,6 +461,8 @@ jobs:
run: |
source ./vulkan_sdk/setup-env.sh
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DGGML_VULKAN=ON
cmake --build build --config Release -j $(nproc)
@@ -497,7 +494,7 @@ jobs:
run: |
sudo add-apt-repository -y ppa:kisak/kisak-mesa
sudo apt-get update -y
sudo apt-get install -y build-essential mesa-vulkan-drivers libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libcurl4-openssl-dev
sudo apt-get install -y build-essential mesa-vulkan-drivers libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libssl-dev
- name: Get latest Vulkan SDK version
id: vulkan_sdk_version
@@ -537,7 +534,10 @@ jobs:
id: cmake_build
run: |
export Dawn_DIR=dawn/lib64/cmake/Dawn
cmake -B build -DGGML_WEBGPU=ON
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DGGML_WEBGPU=ON
cmake --build build --config Release -j $(nproc)
- name: Test
@@ -560,7 +560,7 @@ jobs:
id: depends
run: |
sudo apt-get update
sudo apt-get install -y build-essential git cmake rocblas-dev hipblas-dev libcurl4-openssl-dev rocwmma-dev
sudo apt-get install -y build-essential git cmake rocblas-dev hipblas-dev libssl-dev rocwmma-dev
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
@@ -572,6 +572,8 @@ jobs:
id: cmake_build
run: |
cmake -B build -S . \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DCMAKE_HIP_COMPILER="$(hipconfig -l)/clang" \
-DGGML_HIP_ROCWMMA_FATTN=ON \
-DGGML_HIP=ON
@@ -590,7 +592,7 @@ jobs:
id: depends
run: |
apt-get update
apt-get install -y build-essential git cmake libcurl4-openssl-dev
apt-get install -y build-essential git cmake libssl-dev
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
@@ -602,6 +604,8 @@ jobs:
id: cmake_build
run: |
cmake -B build -S . \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DGGML_MUSA=ON
cmake --build build --config Release -j $(nproc)
@@ -626,7 +630,7 @@ jobs:
shell: bash
run: |
sudo apt update
sudo apt install intel-oneapi-compiler-dpcpp-cpp libcurl4-openssl-dev
sudo apt install intel-oneapi-compiler-dpcpp-cpp libssl-dev
- name: install oneAPI MKL library
shell: bash
@@ -648,6 +652,8 @@ jobs:
run: |
source /opt/intel/oneapi/setvars.sh
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx
@@ -674,7 +680,7 @@ jobs:
shell: bash
run: |
sudo apt update
sudo apt install intel-oneapi-compiler-dpcpp-cpp libcurl4-openssl-dev
sudo apt install intel-oneapi-compiler-dpcpp-cpp libssl-dev
- name: install oneAPI MKL library
shell: bash
@@ -696,6 +702,8 @@ jobs:
run: |
source /opt/intel/oneapi/setvars.sh
cmake -B build \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
@@ -722,12 +730,6 @@ jobs:
key: macOS-latest-cmake-ios
evict-old-files: 1d
- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update
- name: Build
id: cmake_build
run: |
@@ -759,12 +761,6 @@ jobs:
key: macOS-latest-cmake-tvos
evict-old-files: 1d
- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update
- name: Build
id: cmake_build
run: |
@@ -790,12 +786,6 @@ jobs:
id: checkout
uses: actions/checkout@v4
- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update
- name: Build
id: cmake_build
run: |
@@ -838,12 +828,6 @@ jobs:
name: llama-xcframework
path: build-apple/llama.xcframework/
- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update
- name: Build llama.cpp with CMake
id: cmake_build
run: |
@@ -995,21 +979,12 @@ jobs:
-DCMAKE_INSTALL_PREFIX="$env:RUNNER_TEMP/opencl-arm64-release"
cmake --build build-arm64-release --target install --config release
- name: libCURL
id: get_libcurl
uses: ./.github/actions/windows-setup-curl
with:
architecture: ${{ matrix.arch == 'x64' && 'win64' || 'win64a' }}
- name: Build
id: cmake_build
env:
CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
run: |
cmake -S . -B build ${{ matrix.defines }} `
-DCURL_LIBRARY="$env:CURL_PATH/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:CURL_PATH/include"
-DLLAMA_CURL=OFF -DLLAMA_BUILD_BORINGSSL=ON
cmake --build build --config Release -j ${env:NUMBER_OF_PROCESSORS}
cp $env:CURL_PATH/bin/libcurl-*.dll build/bin/Release
- name: Add libopenblas.dll
id: add_libopenblas_dll
@@ -1053,7 +1028,7 @@ jobs:
DEBIAN_FRONTEND: noninteractive
run: |
apt update
apt install -y cmake build-essential ninja-build libgomp1 git libcurl4-openssl-dev
apt install -y cmake build-essential ninja-build libgomp1 git libssl-dev
- name: ccache
uses: ggml-org/ccache-action@v1.2.16
@@ -1064,10 +1039,12 @@ jobs:
- name: Build with CMake
run: |
cmake -S . -B build -G Ninja \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_FATAL_WARNINGS=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CUDA_ARCHITECTURES=89-real \
-DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined \
-DLLAMA_FATAL_WARNINGS=ON \
-DGGML_NATIVE=OFF \
-DGGML_CUDA=ON
cmake --build build
@@ -1101,25 +1078,20 @@ jobs:
run: |
choco install ninja
- name: libCURL
id: get_libcurl
uses: ./.github/actions/windows-setup-curl
- name: Build
id: cmake_build
shell: cmd
env:
CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
run: |
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
cmake -S . -B build -G "Ninja Multi-Config" ^
-DLLAMA_BUILD_SERVER=ON ^
-DLLAMA_CURL=OFF ^
-DLLAMA_BUILD_BORINGSSL=ON ^
-DGGML_NATIVE=OFF ^
-DGGML_BACKEND_DL=ON ^
-DGGML_CPU_ALL_VARIANTS=ON ^
-DGGML_CUDA=ON ^
-DGGML_RPC=ON ^
-DCURL_LIBRARY="%CURL_PATH%/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="%CURL_PATH%/include"
-DGGML_RPC=ON
set /A NINJA_JOBS=%NUMBER_OF_PROCESSORS%-1
cmake --build build --config Release -j %NINJA_JOBS% -t ggml
cmake --build build --config Release
@@ -1151,7 +1123,7 @@ jobs:
run: |
scripts/install-oneapi.bat $WINDOWS_BASEKIT_URL $WINDOWS_DPCPP_MKL
# TODO: add libcurl support ; we will also need to modify win-build-sycl.bat to accept user-specified args
# TODO: add ssl support ; we will also need to modify win-build-sycl.bat to accept user-specified args
- name: Build
id: cmake_build
@@ -1208,14 +1180,8 @@ jobs:
key: ${{ github.job }}
evict-old-files: 1d
- name: libCURL
id: get_libcurl
uses: ./.github/actions/windows-setup-curl
- name: Build
id: cmake_build
env:
CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
run: |
$env:HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path)
$env:CMAKE_PREFIX_PATH="${env:HIP_PATH}"
@@ -1224,11 +1190,12 @@ jobs:
-DCMAKE_CXX_COMPILER="${env:HIP_PATH}\bin\clang++.exe" `
-DCMAKE_CXX_FLAGS="-I$($PWD.Path.Replace('\', '/'))/opt/rocm-${{ env.ROCM_VERSION }}/include/" `
-DCMAKE_BUILD_TYPE=Release `
-DLLAMA_CURL=OFF `
-DLLAMA_BUILD_BORINGSSL=ON `
-DROCM_DIR="${env:HIP_PATH}" `
-DGGML_HIP=ON `
-DGGML_HIP_ROCWMMA_FATTN=ON `
-DGGML_RPC=ON `
-DCURL_LIBRARY="$env:CURL_PATH/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:CURL_PATH/include"
-DGGML_RPC=ON
cmake --build build -j ${env:NUMBER_OF_PROCESSORS}
ios-xcode-build:

View File

@@ -56,7 +56,7 @@ jobs:
curl \
wget \
language-pack-en \
libcurl4-openssl-dev
libssl-dev
- name: Clone
id: checkout
@@ -242,7 +242,7 @@ jobs:
curl \
wget \
language-pack-en \
libcurl4-openssl-dev
libssl-dev
- name: Clone
id: checkout
@@ -283,6 +283,8 @@ jobs:
run: |
cmake -B build \
-DGGML_NATIVE=OFF \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_BUILD_SERVER=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON \
@@ -295,6 +297,8 @@ jobs:
run: |
cmake -B build \
-DGGML_NATIVE=OFF \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_BUILD_SERVER=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON ;
@@ -306,6 +310,8 @@ jobs:
run: |
cmake -B build \
-DGGML_NATIVE=OFF \
-DLLAMA_CURL=OFF \
-DLLAMA_OPENSSL=ON \
-DLLAMA_BUILD_SERVER=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }} ;
cmake --build build --config ${{ matrix.build_type }} -j $(nproc) --target llama-server
@@ -345,16 +351,10 @@ jobs:
fetch-depth: 0
ref: ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha || github.head_ref || github.ref_name }}
- name: libCURL
id: get_libcurl
uses: ./.github/actions/windows-setup-curl
- name: Build
id: cmake_build
env:
CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
run: |
cmake -B build -DCURL_LIBRARY="$env:CURL_PATH/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:CURL_PATH/include"
cmake -B build -DLLAMA_CURL=OFF -DLLAMA_BUILD_BORINGSSL=ON
cmake --build build --config Release -j ${env:NUMBER_OF_PROCESSORS} --target llama-server
- name: Python setup
@@ -368,13 +368,6 @@ jobs:
run: |
pip install -r tools/server/tests/requirements.txt
- name: Copy Libcurl
id: prepare_libcurl
env:
CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
run: |
cp $env:CURL_PATH/bin/libcurl-x64.dll ./build/bin/Release/libcurl-x64.dll
- name: Tests
id: server_integration_tests
if: ${{ !matrix.disabled_on_pr || !github.event.pull_request }}

View File

@@ -242,6 +242,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
- [crashr/gppm](https://github.com/crashr/gppm) launch llama.cpp instances utilizing NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
- [gpustack/gguf-parser](https://github.com/gpustack/gguf-parser-go/tree/main/cmd/gguf-parser) - review/check the GGUF file and estimate the memory usage
- [Styled Lines](https://marketplace.unity.com/packages/tools/generative-ai/styled-lines-llama-cpp-model-292902) (proprietary licensed, async wrapper of inference part for game development in Unity3d with pre-built Mobile and Web platform wrappers and a model example)
- [unslothai/unsloth](https://github.com/unslothai/unsloth) 🦥 exports/saves fine-tuned and trained models to GGUF (Apache-2.0)
</details>

View File

@@ -26,7 +26,6 @@
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <unordered_set>
#include <vector>
@@ -60,6 +59,14 @@
#pragma warning(disable: 4244 4267) // possible loss of data
#endif
common_time_meas::common_time_meas(int64_t & t_acc, bool disable) : t_start_us(disable ? -1 : ggml_time_us()), t_acc(t_acc) {}
common_time_meas::~common_time_meas() {
if (t_start_us >= 0) {
t_acc += ggml_time_us() - t_start_us;
}
}
//
// CPU utils
//
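
The `common_time_meas` constructor and destructor above form a scoped (RAII) timer: the start time is captured on construction and the elapsed time is added to an accumulator on destruction, unless the timer was disabled. A minimal self-contained sketch of the same pattern follows. It assumes `std::chrono::steady_clock` as a stand-in for `ggml_time_us()`, and the names `now_us`, `scoped_time_meas` and `busy_work` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for ggml_time_us(): monotonic time in microseconds.
// Assumes the clock's epoch yields non-negative values, as -1 marks "disabled".
static int64_t now_us() {
    using namespace std::chrono;
    return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
}

// Scoped timer: adds the lifetime of the object to t_acc on destruction.
struct scoped_time_meas {
    scoped_time_meas(int64_t & t_acc, bool disable = false)
        : t_start_us(disable ? -1 : now_us()), t_acc(t_acc) {}

    ~scoped_time_meas() {
        if (t_start_us >= 0) {
            t_acc += now_us() - t_start_us;
        }
    }

    const int64_t t_start_us; // -1 when measurement is disabled
    int64_t &     t_acc;      // accumulator owned by the caller
};

// Example workload: the timer covers the whole function body.
static int64_t busy_work(int64_t & t_acc, bool disable) {
    scoped_time_meas tm(t_acc, disable);
    int64_t sum = 0;
    for (int i = 0; i < 100000; ++i) {
        sum += i;
    }
    return sum;
}
```

Because the accumulator is held by reference, repeated calls keep adding to the same total, and a disabled timer leaves it untouched without any branching at the call site.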

View File

@@ -2,17 +2,15 @@
#pragma once
#include "ggml-opt.h"
#include "llama-cpp.h"
#include <set>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>
#include <map>
#include <sstream>
#include <cmath>
#include "ggml-opt.h"
#include "llama-cpp.h"
#ifdef _WIN32
#define DIRECTORY_SEPARATOR '\\'
@@ -30,6 +28,15 @@
#define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"
struct common_time_meas {
common_time_meas(int64_t & t_acc, bool disable = false);
~common_time_meas();
const int64_t t_start_us;
int64_t & t_acc;
};
struct common_adapter_lora_info {
std::string path;
float scale;

View File

@@ -3,9 +3,10 @@
#include "common.h"
#include "log.h"
#include <cmath>
#include <unordered_map>
#include <algorithm>
#include <cmath>
#include <cstring>
#include <unordered_map>
// the ring buffer works similarly to std::deque, but with a fixed capacity
// TODO: deduplicate with llama-impl.h
@@ -112,6 +113,13 @@ struct common_sampler {
llama_token_data_array cur_p;
void reset() {
prev.clear();
llama_sampler_reset(grmr);
llama_sampler_reset(chain);
}
void set_logits(struct llama_context * ctx, int idx) {
const auto * logits = llama_get_logits_ith(ctx, idx);
@@ -128,6 +136,12 @@ struct common_sampler {
cur_p = { cur.data(), cur.size(), -1, false };
}
common_time_meas tm() {
return common_time_meas(t_total_us, params.no_perf);
}
mutable int64_t t_total_us = 0;
};
std::string common_params_sampling::print() const {
@@ -298,6 +312,8 @@ void common_sampler_free(struct common_sampler * gsmpl) {
}
void common_sampler_accept(struct common_sampler * gsmpl, llama_token token, bool accept_grammar) {
const auto tm = gsmpl->tm();
if (accept_grammar) {
llama_sampler_accept(gsmpl->grmr, token);
}
@@ -308,9 +324,7 @@ void common_sampler_accept(struct common_sampler * gsmpl, llama_token token, boo
}
void common_sampler_reset(struct common_sampler * gsmpl) {
llama_sampler_reset(gsmpl->grmr);
llama_sampler_reset(gsmpl->chain);
gsmpl->reset();
}
struct common_sampler * common_sampler_clone(common_sampler * gsmpl) {
@@ -327,16 +341,54 @@ struct common_sampler * common_sampler_clone(common_sampler * gsmpl) {
void common_perf_print(const struct llama_context * ctx, const struct common_sampler * gsmpl) {
// TODO: measure grammar performance
const double t_sampling_ms = gsmpl ? 1e-3*gsmpl->t_total_us : 0;
llama_perf_sampler_data data_smpl;
llama_perf_context_data data_ctx;
memset(&data_smpl, 0, sizeof(data_smpl));
memset(&data_ctx, 0, sizeof(data_ctx));
if (gsmpl) {
llama_perf_sampler_print(gsmpl->chain);
auto & data = data_smpl;
data = llama_perf_sampler(gsmpl->chain);
// note: the sampling time includes the samplers time + extra time spent in common/sampling
LOG_INF("%s: sampling time = %10.2f ms\n", __func__, t_sampling_ms);
LOG_INF("%s: samplers time = %10.2f ms / %5d tokens\n", __func__, data.t_sample_ms, data.n_sample);
}
if (ctx) {
llama_perf_context_print(ctx);
auto & data = data_ctx;
data = llama_perf_context(ctx);
const double t_end_ms = 1e-3 * ggml_time_us();
const double t_total_ms = t_end_ms - data.t_start_ms;
const double t_unacc_ms = t_total_ms - (t_sampling_ms + data.t_p_eval_ms + data.t_eval_ms);
const double t_unacc_pc = 100.0 * t_unacc_ms / t_total_ms;
LOG_INF("%s: load time = %10.2f ms\n", __func__, data.t_load_ms);
LOG_INF("%s: prompt eval time = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)\n",
__func__, data.t_p_eval_ms, data.n_p_eval, data.t_p_eval_ms / data.n_p_eval, 1e3 / data.t_p_eval_ms * data.n_p_eval);
LOG_INF("%s: eval time = %10.2f ms / %5d runs (%8.2f ms per token, %8.2f tokens per second)\n",
__func__, data.t_eval_ms, data.n_eval, data.t_eval_ms / data.n_eval, 1e3 / data.t_eval_ms * data.n_eval);
LOG_INF("%s: total time = %10.2f ms / %5d tokens\n", __func__, (t_end_ms - data.t_start_ms), (data.n_p_eval + data.n_eval));
LOG_INF("%s: unaccounted time = %10.2f ms / %5.1f %% (total - sampling - prompt eval - eval) / (total)\n", __func__, t_unacc_ms, t_unacc_pc);
LOG_INF("%s: graphs reused = %10d\n", __func__, data.n_reused);
llama_memory_breakdown_print(ctx);
}
}
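
The "unaccounted time" reported by `common_perf_print` above is the total wall time minus the sampling, prompt eval and eval times, also expressed as a percentage of the total. The arithmetic can be sketched in isolation as follows. The struct and function names are hypothetical, and the guard against a zero total is an addition of this sketch, not something the diff shows.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the timings used in the report, in milliseconds.
struct perf_times_ms {
    double total;
    double sampling;
    double prompt_eval;
    double eval;
};

struct unaccounted_time {
    double ms;      // total - (sampling + prompt eval + eval)
    double percent; // share of the total, in percent
};

static unaccounted_time compute_unaccounted(const perf_times_ms & t) {
    const double ms = t.total - (t.sampling + t.prompt_eval + t.eval);
    // Guard added in this sketch only: avoid dividing by a zero total.
    const double pc = t.total > 0.0 ? 100.0 * ms / t.total : 0.0;
    return { ms, pc };
}
```

For example, with a total of 1000 ms, 50 ms of sampling, 200 ms of prompt eval and 700 ms of eval, 50 ms (5 %) remain unaccounted.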
llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx, bool grammar_first) {
llama_synchronize(ctx);
// start measuring sampling time after the llama_context synchronization in order to not measure any ongoing async operations
const auto tm = gsmpl->tm();
gsmpl->set_logits(ctx, idx);
auto & grmr = gsmpl->grmr;
@@ -428,6 +480,8 @@ uint32_t common_sampler_get_seed(const struct common_sampler * gsmpl) {
// helpers
llama_token_data_array * common_sampler_get_candidates(struct common_sampler * gsmpl, bool do_sort) {
const auto tm = gsmpl->tm();
auto * res = &gsmpl->cur_p;
if (do_sort && !res->sorted) {

View File

@@ -277,10 +277,15 @@ def parse_args() -> argparse.Namespace:
return parser.parse_args()
def load_hparams_from_hf(hf_model_id: str) -> dict[str, Any]:
def load_hparams_from_hf(hf_model_id: str) -> tuple[dict[str, Any], Path | None]:
from huggingface_hub import try_to_load_from_cache
# normally, adapter does not come with base model config, we need to load it from AutoConfig
config = AutoConfig.from_pretrained(hf_model_id)
return config.to_dict()
cache_dir = try_to_load_from_cache(hf_model_id, "config.json")
cache_dir = Path(cache_dir).parent if isinstance(cache_dir, str) else None
return config.to_dict(), cache_dir
if __name__ == '__main__':
@@ -325,13 +330,13 @@ if __name__ == '__main__':
# load base model
if base_model_id is not None:
logger.info(f"Loading base model from Hugging Face: {base_model_id}")
hparams = load_hparams_from_hf(base_model_id)
hparams, dir_base_model = load_hparams_from_hf(base_model_id)
elif dir_base_model is None:
if "base_model_name_or_path" in lparams:
model_id = lparams["base_model_name_or_path"]
logger.info(f"Loading base model from Hugging Face: {model_id}")
try:
hparams = load_hparams_from_hf(model_id)
hparams, dir_base_model = load_hparams_from_hf(model_id)
except OSError as e:
logger.error(f"Failed to load base model config: {e}")
logger.error("Please try downloading the base model and add its path to --base")
@@ -480,6 +485,7 @@ if __name__ == '__main__':
dir_lora_model=dir_lora,
lora_alpha=alpha,
hparams=hparams,
remote_hf_model_id=base_model_id,
)
logger.info("Exporting model...")

View File

@@ -17,12 +17,12 @@ Legend:
| ABS | ❌ | ✅ | ✅ | 🟡 | 🟡 | ❌ | ✅ | 🟡 | ❌ |
| ACC | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| ADD | ❌ | ✅ | ✅ | ✅ | 🟡 | 🟡 | ✅ | ✅ | ❌ |
| ADD1 | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | | ❌ |
| ADD1 | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | | ❌ |
| ADD_ID | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
| ARANGE | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | | ❌ |
| ARANGE | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | | ❌ |
| ARGMAX | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| ARGSORT | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🟡 | ❌ |
| CEIL | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | | ❌ |
| CEIL | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | ❌ |
| CLAMP | ❌ | ✅ | ✅ | ✅ | 🟡 | 🟡 | 🟡 | 🟡 | ❌ |
| CONCAT | ❌ | ✅ | ✅ | 🟡 | ✅ | 🟡 | ✅ | ✅ | ❌ |
| CONT | ❌ | 🟡 | ✅ | ✅ | ✅ | 🟡 | 🟡 | 🟡 | ❌ |
@@ -43,9 +43,9 @@ Legend:
| ELU | ❌ | ✅ | ✅ | 🟡 | 🟡 | ❌ | ✅ | ❌ | ❌ |
| EXP | ❌ | ✅ | ✅ | 🟡 | 🟡 | ❌ | ✅ | 🟡 | ❌ |
| EXPM1 | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | ❌ | ❌ | ❌ |
| FILL | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | | ❌ |
| FILL | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | | ❌ |
| FLASH_ATTN_EXT | ❌ | 🟡 | ✅ | 🟡 | 🟡 | ❌ | ❌ | 🟡 | ❌ |
| FLOOR | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | | ❌ |
| FLOOR | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | ❌ |
| GATED_LINEAR_ATTN | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| GEGLU | ❌ | ✅ | ✅ | ✅ | 🟡 | ✅ | ✅ | 🟡 | ❌ |
| GEGLU_ERF | ❌ | ✅ | ✅ | ✅ | 🟡 | ✅ | ✅ | 🟡 | ❌ |
@@ -87,7 +87,7 @@ Legend:
| ROLL | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| ROPE | ❌ | 🟡 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| ROPE_BACK | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
| ROUND | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | | ❌ |
| ROUND | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | ❌ |
| RWKV_WKV6 | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| RWKV_WKV7 | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| SCALE | ❌ | 🟡 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
@@ -99,7 +99,7 @@ Legend:
| SILU_BACK | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
| SIN | ❌ | ✅ | ✅ | ✅ | 🟡 | ❌ | 🟡 | 🟡 | ❌ |
| SOFTCAP | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SOFTPLUS | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | ❌ | | ❌ |
| SOFTPLUS | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | ❌ | 🟡 | ❌ |
| SOFT_MAX | ❌ | 🟡 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| SOFT_MAX_BACK | ❌ | ❌ | 🟡 | 🟡 | ❌ | ❌ | 🟡 | ✅ | ❌ |
| SOLVE_TRI | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
@@ -107,7 +107,7 @@ Legend:
| SQRT | ❌ | ✅ | ✅ | ✅ | 🟡 | ❌ | 🟡 | 🟡 | ❌ |
| SSM_CONV | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
| SSM_SCAN | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | 🟡 | ❌ |
| STEP | ❌ | ✅ | ✅ | 🟡 | 🟡 | ❌ | ✅ | | ❌ |
| STEP | ❌ | ✅ | ✅ | 🟡 | 🟡 | ❌ | ✅ | 🟡 | ❌ |
| SUB | ❌ | ✅ | ✅ | ✅ | 🟡 | 🟡 | ✅ | ✅ | ❌ |
| SUM | ❌ | ✅ | ✅ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | ❌ |
| SUM_ROWS | ❌ | ✅ | ✅ | 🟡 | ✅ | ✅ | 🟡 | ✅ | ❌ |
@@ -116,6 +116,6 @@ Legend:
| TANH | ❌ | ✅ | ✅ | 🟡 | 🟡 | ✅ | ✅ | 🟡 | ❌ |
| TIMESTEP_EMBEDDING | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| TRI | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| TRUNC | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | | ❌ |
| TRUNC | ❌ | ❌ | ✅ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | ❌ |
| UPSCALE | ❌ | 🟡 | ✅ | ✅ | 🟡 | ✅ | 🟡 | ✅ | ❌ |
| XIELU | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

View File

@@ -5,8 +5,8 @@
"Vulkan0","SGN","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","NEG","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","NEG","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","STEP","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","STEP","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","STEP","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","STEP","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","TANH","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","TANH","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","ELU","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
@@ -29,18 +29,18 @@
"Vulkan0","EXP","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","EXPM1","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","EXPM1","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","SOFTPLUS","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","SOFTPLUS","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","SOFTPLUS","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","SOFTPLUS","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","GELU_ERF","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","GELU_ERF","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","FLOOR","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f16,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f16,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","FLOOR","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f16,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f16,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","ABS","type=f16,ne_a=[128,2,2,2],v=1","support","0","no","Vulkan"
"Vulkan0","ABS","type=f16,ne_a=[5,7,11,13],v=1","support","0","no","Vulkan"
"Vulkan0","SGN","type=f16,ne_a=[128,2,2,2],v=1","support","0","no","Vulkan"
@@ -89,8 +89,8 @@
"Vulkan0","SGN","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","NEG","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","NEG","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","STEP","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","STEP","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","STEP","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","STEP","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","TANH","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","TANH","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","ELU","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
@@ -113,18 +113,18 @@
"Vulkan0","EXP","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","EXPM1","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","EXPM1","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","SOFTPLUS","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","SOFTPLUS","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","SOFTPLUS","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","SOFTPLUS","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","GELU_ERF","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","GELU_ERF","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","FLOOR","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f32,ne_a=[128,2,2,2],v=0","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f32,ne_a=[5,7,11,13],v=0","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","FLOOR","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f32,ne_a=[128,2,2,2],v=0","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f32,ne_a=[5,7,11,13],v=0","support","1","yes","Vulkan"
"Vulkan0","ABS","type=f32,ne_a=[128,2,2,2],v=1","support","0","no","Vulkan"
"Vulkan0","ABS","type=f32,ne_a=[5,7,11,13],v=1","support","0","no","Vulkan"
"Vulkan0","SGN","type=f32,ne_a=[128,2,2,2],v=1","support","0","no","Vulkan"
@@ -5654,7 +5654,7 @@
"Vulkan0","SUB","type=f32,ne=[64,262144,1,1],nr=[1,1,1,1],nf=1","support","1","yes","Vulkan"
"Vulkan0","MUL","type=f32,ne=[64,262144,1,1],nr=[1,1,1,1],nf=1","support","1","yes","Vulkan"
"Vulkan0","DIV","type=f32,ne=[64,262144,1,1],nr=[1,1,1,1],nf=1","support","1","yes","Vulkan"
"Vulkan0","ADD1","type=f32,ne=[10,5,4,3]","support","0","no","Vulkan"
"Vulkan0","ADD1","type=f32,ne=[10,5,4,3]","support","1","yes","Vulkan"
"Vulkan0","SCALE","type=f32,ne=[10,10,10,10],scale=2.000000,bias=0.000000,inplace=0","support","1","yes","Vulkan"
"Vulkan0","SCALE","type=f32,ne=[10,10,10,10],scale=2.000000,bias=1.000000,inplace=0","support","1","yes","Vulkan"
"Vulkan0","SCALE","type=f32,ne=[10,10,10,10],scale=2.000000,bias=1.000000,inplace=1","support","1","yes","Vulkan"
@@ -8632,10 +8632,10 @@
"Vulkan0","COS","type=f16,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","CLAMP","type=f16,ne=[10,5,4,3],min=-0.500000,max=0.500000","support","0","no","Vulkan"
"Vulkan0","LEAKY_RELU","type=f16,ne_a=[10,5,4,3],negative_slope=0.100000","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f16,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f16,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f16,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f16,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f16,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f16,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f16,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f16,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","SQR","type=f16,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","SQRT","type=f16,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","LOG","type=f16,ne=[7,1,5,3]","support","1","yes","Vulkan"
@@ -8643,10 +8643,10 @@
"Vulkan0","COS","type=f16,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","CLAMP","type=f16,ne=[7,1,5,3],min=-0.500000,max=0.500000","support","0","no","Vulkan"
"Vulkan0","LEAKY_RELU","type=f16,ne_a=[7,1,5,3],negative_slope=0.100000","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f16,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f16,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f16,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f16,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f16,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f16,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f16,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f16,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","SQR","type=f32,ne=[10,5,4,3]","support","1","yes","Vulkan"
"Vulkan0","SQRT","type=f32,ne=[10,3,3,2]","support","1","yes","Vulkan"
"Vulkan0","LOG","type=f32,ne=[10,5,4,3]","support","1","yes","Vulkan"
@@ -8654,10 +8654,10 @@
"Vulkan0","COS","type=f32,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","CLAMP","type=f32,ne=[10,5,4,3],min=-0.500000,max=0.500000","support","1","yes","Vulkan"
"Vulkan0","LEAKY_RELU","type=f32,ne_a=[10,5,4,3],negative_slope=0.100000","support","1","yes","Vulkan"
"Vulkan0","FLOOR","type=f32,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f32,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f32,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f32,ne=[10,2,2,2]","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f32,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f32,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f32,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f32,ne=[10,2,2,2]","support","1","yes","Vulkan"
"Vulkan0","SQR","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","SQRT","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","LOG","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
@@ -8665,10 +8665,10 @@
"Vulkan0","COS","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","CLAMP","type=f32,ne=[7,1,5,3],min=-0.500000,max=0.500000","support","1","yes","Vulkan"
"Vulkan0","LEAKY_RELU","type=f32,ne_a=[7,1,5,3],negative_slope=0.100000","support","1","yes","Vulkan"
"Vulkan0","FLOOR","type=f32,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","CEIL","type=f32,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","ROUND","type=f32,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","TRUNC","type=f32,ne=[7,1,5,3]","support","0","no","Vulkan"
"Vulkan0","FLOOR","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","CEIL","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","ROUND","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","TRUNC","type=f32,ne=[7,1,5,3]","support","1","yes","Vulkan"
"Vulkan0","DIAG_MASK_INF","type=f32,ne=[10,10,1,1],n_past=5","support","1","yes","Vulkan"
"Vulkan0","DIAG_MASK_INF","type=f32,ne=[10,10,3,1],n_past=5","support","1","yes","Vulkan"
"Vulkan0","DIAG_MASK_INF","type=f32,ne=[10,10,3,2],n_past=5","support","1","yes","Vulkan"
@@ -9478,7 +9478,7 @@
"Vulkan0","PAD_REFLECT_1D","type=f32,ne_a=[512,34,2,1],pad_0=10,pad_1=9","support","0","no","Vulkan"
"Vulkan0","PAD_REFLECT_1D","type=f32,ne_a=[3000,384,4,1],pad_0=10,pad_1=9","support","0","no","Vulkan"
"Vulkan0","ROLL","shift0=3,shift1=-2,shift3=1,shift4=-1","support","1","yes","Vulkan"
"Vulkan0","ARANGE","type=f32,start=0.000000,stop=10.000000,step=1.000000","support","0","no","Vulkan"
"Vulkan0","ARANGE","type=f32,start=0.000000,stop=10.000000,step=1.000000","support","1","yes","Vulkan"
"Vulkan0","TIMESTEP_EMBEDDING","type=f32,ne_a=[2,1,1,1],dim=320,max_period=10000","support","1","yes","Vulkan"
"Vulkan0","LEAKY_RELU","type=f32,ne_a=[10,5,4,3],negative_slope=0.100000","support","1","yes","Vulkan"
"Vulkan0","CUMSUM","type=f32,ne=[10,5,4,3]","support","0","no","Vulkan"
@@ -9487,9 +9487,9 @@
"Vulkan0","TRI","type=f32,ne=[10,10,4,3],tri_type=2","support","0","no","Vulkan"
"Vulkan0","TRI","type=f32,ne=[10,10,4,3],tri_type=1","support","0","no","Vulkan"
"Vulkan0","TRI","type=f32,ne=[10,10,4,3],tri_type=0","support","0","no","Vulkan"
"Vulkan0","FILL","type=f32,ne=[10,10,4,3],c=0.000000","support","0","no","Vulkan"
"Vulkan0","FILL","type=f32,ne=[303,207,11,3],c=2.000000","support","0","no","Vulkan"
"Vulkan0","FILL","type=f32,ne=[800,600,4,4],c=-152.000000","support","0","no","Vulkan"
"Vulkan0","FILL","type=f32,ne=[10,10,4,3],c=0.000000","support","1","yes","Vulkan"
"Vulkan0","FILL","type=f32,ne=[303,207,11,3],c=2.000000","support","1","yes","Vulkan"
"Vulkan0","FILL","type=f32,ne=[800,600,4,4],c=-152.000000","support","1","yes","Vulkan"
"Vulkan0","SOLVE_TRI","type=f32,ne_lhs=[10,10,4,3],ne_rhs=[3,10,4,3]","support","0","no","Vulkan"
"Vulkan0","SOLVE_TRI","type=f32,ne_lhs=[11,11,1,1],ne_rhs=[5,11,1,1]","support","0","no","Vulkan"
"Vulkan0","SOLVE_TRI","type=f32,ne_lhs=[17,17,2,4],ne_rhs=[9,17,2,4]","support","0","no","Vulkan"

View File

@@ -4,10 +4,10 @@
#include "llama.h"
#include "ggml.h"
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>
#include <numeric>
/**
* This is the arbitrary data which will be passed to each callback.
@@ -37,23 +37,23 @@ static inline float ggml_compute_bf16_to_fp32(ggml_bf16_t h) {
return u.f;
}
static float ggml_get_float_value(uint8_t * data, ggml_type type, const size_t * nb, size_t i0, size_t i1, size_t i2, size_t i3) {
static float ggml_get_float_value(const uint8_t * data, ggml_type type, const size_t * nb, size_t i0, size_t i1, size_t i2, size_t i3) {
size_t i = i3 * nb[3] + i2 * nb[2] + i1 * nb[1] + i0 * nb[0];
float v;
if (type == GGML_TYPE_F16) {
v = ggml_fp16_to_fp32(*(ggml_fp16_t *) &data[i]);
v = ggml_fp16_to_fp32(*(const ggml_fp16_t *) &data[i]);
} else if (type == GGML_TYPE_F32) {
v = *(float *) &data[i];
v = *(const float *) &data[i];
} else if (type == GGML_TYPE_I64) {
v = (float) *(int64_t *) &data[i];
v = (float) *(const int64_t *) &data[i];
} else if (type == GGML_TYPE_I32) {
v = (float) *(int32_t *) &data[i];
v = (float) *(const int32_t *) &data[i];
} else if (type == GGML_TYPE_I16) {
v = (float) *(int16_t *) &data[i];
v = (float) *(const int16_t *) &data[i];
} else if (type == GGML_TYPE_I8) {
v = (float) *(int8_t *) &data[i];
v = (float) *(const int8_t *) &data[i];
} else if (type == GGML_TYPE_BF16) {
v = ggml_compute_bf16_to_fp32(*(ggml_bf16_t *) &data[i]);
v = ggml_compute_bf16_to_fp32(*(const ggml_bf16_t *) &data[i]);
} else {
GGML_ABORT("fatal error");
}
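
The hunk above reads a typed element at a byte offset computed from the per-dimension strides `nb[0..3]`, now through a `const` pointer. A minimal sketch of the same addressing for the `f32` case follows. The names `read_f32_strided` and `demo_read` are hypothetical and not part of ggml, and `memcpy` is swapped in for the pointer cast used in the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of ggml): read one float from a read-only
// byte buffer, given per-dimension strides in bytes (nb[0..3]).
static float read_f32_strided(const uint8_t * data, const size_t * nb,
                              size_t i0, size_t i1, size_t i2, size_t i3) {
    // Byte offset of element (i0, i1, i2, i3), same formula as in the hunk above.
    const size_t offset = i3 * nb[3] + i2 * nb[2] + i1 * nb[1] + i0 * nb[0];
    float v;
    // memcpy instead of a pointer cast: an assumption of this sketch, chosen
    // to avoid alignment and aliasing concerns.
    std::memcpy(&v, data + offset, sizeof(v));
    return v;
}

// Contiguous f32 tensor with 3 columns and 2 rows; fetch element (i0=1, i1=1).
static float demo_read() {
    const float values[6] = {0.0f, 1.0f, 2.0f, 10.0f, 11.0f, 12.0f};
    const size_t nb[4] = {sizeof(float), 3 * sizeof(float),
                          6 * sizeof(float), 6 * sizeof(float)};
    return read_f32_strided(reinterpret_cast<const uint8_t *>(values), nb, 1, 1, 0, 0);
}
```

With these strides the element at `(i0=1, i1=1)` sits 16 bytes into the buffer, which is the fifth float.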

View File

@@ -2246,8 +2246,7 @@ static void evaluate_and_capture_cann_graph(ggml_backend_cann_context * cann_ctx
bool & use_cann_graph,
bool & cann_graph_update_required) {
#ifdef USE_ACL_GRAPH
ggml_cann_graph * matched_graph = cann_ctx->graph_lru_cache.cache_list.front();
if (use_cann_graph && cann_graph_update_required) {
if (use_cann_graph && cann_graph_update_required) { // Begin CANN graph capture
ACL_CHECK(aclmdlRICaptureBegin(cann_ctx->stream(), ACL_MODEL_RI_CAPTURE_MODE_GLOBAL));
}
#endif // USE_ACL_GRAPH
@@ -2271,12 +2270,14 @@ static void evaluate_and_capture_cann_graph(ggml_backend_cann_context * cann_ctx
}
#ifdef USE_ACL_GRAPH
if (use_cann_graph && cann_graph_update_required) { // End CANN graph capture
ACL_CHECK(aclmdlRICaptureEnd(cann_ctx->stream(), &matched_graph->graph));
}
if (use_cann_graph) {
// Execute graph
ggml_cann_graph * matched_graph = cann_ctx->graph_lru_cache.cache_list.front();
if (cann_graph_update_required) { // End CANN graph capture
ACL_CHECK(aclmdlRICaptureEnd(cann_ctx->stream(), &matched_graph->graph));
}
// Execute CANN graph
ACL_CHECK(aclmdlRIExecuteAsync(matched_graph->graph, cann_ctx->stream()));
}
#endif // USE_ACL_GRAPH

View File

@@ -39,7 +39,7 @@
#include "kernels.h"
#define NELEMS(x) sizeof(x) / sizeof(*x)
#define NELEMS(x) (sizeof(x) / sizeof(*x))
template<size_t(*Fn)(size_t,size_t,size_t)>
static inline size_t kernel_offs_fn3(size_t a, size_t b, size_t c) {
@@ -635,6 +635,7 @@ static ggml_kleidiai_kernels gemm_gemv_kernels[] = {
},
#endif
#endif
{ /* Sentinel */ }
};
static ggml_kleidiai_kernels gemm_gemv_kernels_q8[] = {
@@ -803,6 +804,7 @@ static ggml_kleidiai_kernels gemm_gemv_kernels_q8[] = {
/* .op_type = */ GGML_TYPE_F32,
},
#endif
{ /* Sentinel */ }
};
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels(cpu_feature cpu_features, const ggml_tensor * tensor) {
@@ -810,7 +812,7 @@ ggml_kleidiai_kernels * ggml_kleidiai_select_kernels(cpu_feature cpu_features, c
if (tensor->op == GGML_OP_MUL_MAT && tensor->src[0] != nullptr && tensor->src[1] != nullptr) {
#if defined(__ARM_FEATURE_SME) || defined(__ARM_FEATURE_DOTPROD) || defined(__ARM_FEATURE_MATMUL_INT8)
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels); ++i) {
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels) - 1; ++i) {
if ((cpu_features & gemm_gemv_kernels[i].required_cpu) == gemm_gemv_kernels[i].required_cpu &&
gemm_gemv_kernels[i].lhs_type == tensor->src[1]->type &&
gemm_gemv_kernels[i].rhs_type == tensor->src[0]->type &&
@@ -820,7 +822,7 @@ ggml_kleidiai_kernels * ggml_kleidiai_select_kernels(cpu_feature cpu_features, c
}
}
if (!kernel) {
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels_q8); ++i) {
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels_q8) - 1; ++i) {
if ((cpu_features & gemm_gemv_kernels_q8[i].required_cpu) == gemm_gemv_kernels_q8[i].required_cpu &&
gemm_gemv_kernels_q8[i].lhs_type == tensor->src[1]->type &&
gemm_gemv_kernels_q8[i].rhs_type == tensor->src[0]->type &&
@@ -830,6 +832,10 @@ ggml_kleidiai_kernels * ggml_kleidiai_select_kernels(cpu_feature cpu_features, c
}
}
}
#else
GGML_UNUSED(gemm_gemv_kernels);
GGML_UNUSED(gemm_gemv_kernels_q8);
GGML_UNUSED(cpu_features);
#endif
}
@@ -840,12 +846,14 @@ ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q4_0(cpu_feature features)
ggml_kleidiai_kernels * kernels = nullptr;
#if defined(__ARM_FEATURE_SME) || defined(__ARM_FEATURE_DOTPROD) || defined(__ARM_FEATURE_MATMUL_INT8)
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels); ++i) {
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels) - 1; ++i) {
if ((features & gemm_gemv_kernels[i].required_cpu) == gemm_gemv_kernels[i].required_cpu) {
kernels = &gemm_gemv_kernels[i];
break;
}
}
#else
GGML_UNUSED(features);
#endif
return kernels;
@@ -855,12 +863,14 @@ ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q8_0(cpu_feature features)
ggml_kleidiai_kernels * kernels = nullptr;
#if defined(__ARM_FEATURE_SME) || defined(__ARM_FEATURE_DOTPROD) || defined(__ARM_FEATURE_MATMUL_INT8)
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels_q8); ++i) {
for (size_t i = 0; i < NELEMS(gemm_gemv_kernels_q8) - 1; ++i) {
if ((features & gemm_gemv_kernels_q8[i].required_cpu) == gemm_gemv_kernels_q8[i].required_cpu) {
kernels = &gemm_gemv_kernels_q8[i];
break;
}
}
#else
GGML_UNUSED(features);
#endif
return kernels;
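
The changes above parenthesize `NELEMS`, append a sentinel entry to each kernel table, and stop every loop one element early. The sketch below reproduces that pattern on a toy table. `kernel_entry`, `table`, `select_kernel`, and `NELEMS_UNSAFE` are hypothetical names used only for illustration; the real tables hold `ggml_kleidiai_kernels` entries.

```cpp
#include <cassert>
#include <cstddef>

// Parenthesized so the macro acts as a single value inside larger expressions.
#define NELEMS(x) (sizeof(x) / sizeof(*x))
// Unparenthesized variant, kept only to show the pitfall.
#define NELEMS_UNSAFE(x) sizeof(x) / sizeof(*x)

struct kernel_entry {      // hypothetical stand-in for ggml_kleidiai_kernels
    unsigned required_cpu; // feature bits the kernel needs
    int      id;
};

static kernel_entry table[] = {
    { 0x3, 1 },
    { 0x1, 2 },
    { /* Sentinel */ },    // zero-initialized in this sketch
};

// Return the id of the first entry whose required bits are all present, or -1.
static int select_kernel(unsigned features) {
    for (size_t i = 0; i < NELEMS(table) - 1; ++i) { // "- 1" skips the sentinel
        if ((features & table[i].required_cpu) == table[i].required_cpu) {
            return table[i].id;
        }
    }
    return -1;
}

// 6 / 3 elements
static size_t six_over_count_safe()   { return 6 / NELEMS(table); }
// Expands to (6 / sizeof(table)) / sizeof(*table), which is not 6 / 3.
static size_t six_over_count_unsafe() { return 6 / NELEMS_UNSAFE(table); }
```

In this sketch the sentinel has `required_cpu == 0`, so it would match any feature set if the loop reached it. That is why the bound is `NELEMS(table) - 1`.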

View File

@@ -9696,13 +9696,12 @@ static void ggml_compute_forward_solve_tri_f32(const struct ggml_compute_params
for (int64_t i00 = 0; i00 < n; ++i00) {
float sum = 0.0f;
for (int64_t t = 0; t < i00; ++t) {
sum += A_batch[i00 * n + t] * X_batch[i01 * n + t];
sum += A_batch[i00 * n + t] * X_batch[t * k + i01];
}
const float diag = A_batch[i00 * n + i00];
GGML_ASSERT(diag != 0.0f && "Zero diagonal in triangular matrix");
X_batch[i01 * n + i00] = (B_batch[i00 * k + i01] - sum) / diag;
X_batch[i00 * k + i01] = (B_batch[i00 * k + i01] - sum) / diag;
}
}
}

View File

@@ -160,18 +160,18 @@ inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
#define GGML_F32xt svfloat32_t
#define GGML_F32xt_ZERO svdup_n_f32(0.0f)
#define GGML_F32xt_SET1(x) svdup_n_f32(x)
#define GGML_F32xt_LOAD_IMPL(pg, a, ...) svld1_f32(pg, a)
#define GGML_F32xt_LOAD(...) GGML_F32xt_LOAD_IMPL(DEFAULT_PG, __VA_ARGS__)
#define GGML_F32xt_STORE_IMPL(pg,a,b) svst1_f32(pg, a, b)
#define GGML_F32xt_STORE(...) GGML_F32xt_STORE_IMPL(DEFAULT_PG, __VA_ARGS__)
#define GGML_F32xt_LOAD_IMPL(pg, a) svld1_f32(pg, a)
#define GGML_F32xt_LOAD(a) GGML_F32xt_LOAD_IMPL(DEFAULT_PG, a)
#define GGML_F32xt_STORE_IMPL(pg, a, b) svst1_f32(pg, a, b)
#define GGML_F32xt_STORE(a, b) GGML_F32xt_STORE_IMPL(DEFAULT_PG, a, b)
#define GGML_F32xt_FMA_IMPL(pg, a, b, c) svmad_f32_m(pg, b, c, a)
#define GGML_F32xt_FMA(...) GGML_F32xt_FMA_IMPL(DEFAULT_PG, __VA_ARGS__)
#define GGML_F32xt_FMA(a, b, c) GGML_F32xt_FMA_IMPL(DEFAULT_PG, a, b, c)
#define GGML_F32xt_ADD_IMPL(pg, a, b) svadd_f32_m(pg, a, b)
#define GGML_F32xt_ADD(...) GGML_F32xt_ADD_IMPL(DEFAULT_PG, __VA_ARGS__)
#define GGML_F32xt_ADD(a, b) GGML_F32xt_ADD_IMPL(DEFAULT_PG, a, b)
#define GGML_F32xt_MUL_IMPL(pg, a, b) svmul_f32_m(pg, a, b)
#define GGML_F32xt_MUL(...) GGML_F32xt_MUL_IMPL(DEFAULT_PG, __VA_ARGS__)
#define GGML_F32xt_MUL(a, b) GGML_F32xt_MUL_IMPL(DEFAULT_PG, a, b)
#define GGML_F32xt_REDUCE_ONE_IMPL(pg, a) svaddv(pg, a)
#define GGML_F32xt_REDUCE_ONE(...) GGML_F32xt_REDUCE_ONE_IMPL(DEFAULT_PG, __VA_ARGS__)
#define GGML_F32xt_REDUCE_ONE(a) GGML_F32xt_REDUCE_ONE_IMPL(DEFAULT_PG, a)
#define GGML_F32xt_REDUCE_IMPL(pg, res, sum1, sum2, sum3, sum4, sum5, sum6, sum7, sum8) \
{ \
sum1 = svadd_f32_m(DEFAULT_PG, sum1, sum2); \
@@ -183,7 +183,8 @@ inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
sum1 = svadd_f32_m(DEFAULT_PG, sum1, sum5); \
(res) = (ggml_float) GGML_F32xt_REDUCE_ONE(sum1); \
}
#define GGML_F32xt_REDUCE(...) GGML_F32xt_REDUCE_IMPL(DEFAULT_PG, __VA_ARGS__)
#define GGML_F32xt_REDUCE(res, sum1, sum2, sum3, sum4, sum5, sum6, sum7, sum8) \
GGML_F32xt_REDUCE_IMPL(DEFAULT_PG, res, sum1, sum2, sum3, sum4, sum5, sum6, sum7, sum8)
#define GGML_F32_VEC GGML_F32xt
#define GGML_F32_VEC_ZERO GGML_F32xt_ZERO
@@ -206,11 +207,11 @@ inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
#define GGML_F32Cxt_STORE(dst_ptr, src_vec) svst1_f16(DEFAULT_PG16, (__fp16 *)(dst_ptr), (src_vec))
#define GGML_F32Cxt_FMA_IMPL(pg, a, b, c) svmad_f16_x(pg, b, c, a)
#define GGML_F32Cxt_FMA(...) GGML_F32Cxt_FMA_IMPL(DEFAULT_PG16, __VA_ARGS__)
#define GGML_F32Cxt_FMA(a, b, c) GGML_F32Cxt_FMA_IMPL(DEFAULT_PG16, a, b, c)
#define GGML_F32Cxt_ADD_IMPL(pg, a, b) svadd_f16_x(pg, a, b)
#define GGML_F32Cxt_ADD(...) GGML_F32Cxt_ADD_IMPL(DEFAULT_PG16, __VA_ARGS__)
#define GGML_F32Cxt_ADD(a, b) GGML_F32Cxt_ADD_IMPL(DEFAULT_PG16, a, b)
#define GGML_F32Cxt_MUL_IMPL(pg, a, b) svmul_f16_x(pg, a, b)
#define GGML_F32Cxt_MUL(...) GGML_F32Cxt_MUL_IMPL(DEFAULT_PG16, __VA_ARGS__)
#define GGML_F32Cxt_MUL(a, b) GGML_F32Cxt_MUL_IMPL(DEFAULT_PG16, a, b)
#define GGML_F32Cxt_REDUCE GGML_F16xt_REDUCE_MIXED
#define GGML_F16x_VEC GGML_F32Cxt
@@ -224,7 +225,7 @@ inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
#define GGML_F16x_VEC_REDUCE GGML_F32Cxt_REDUCE
#define GGML_F16xt_REDUCE_ONE_IMPL(pg, a) svaddv_f16(pg, a)
#define GGML_F16xt_REDUCE_ONE(...) GGML_F16xt_REDUCE_ONE_IMPL(DEFAULT_PG16, __VA_ARGS__)
#define GGML_F16xt_REDUCE_ONE(a) GGML_F16xt_REDUCE_ONE_IMPL(DEFAULT_PG16, a)
#define GGML_F16xt_REDUCE_MIXED_IMPL(pg16, res, sum1, sum2, sum3, sum4) \
{ \
@@ -234,7 +235,8 @@ inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
__fp16 sum_f16 = svaddv_f16(pg16, sum1); \
(res) = (ggml_float) sum_f16; \
}
#define GGML_F16xt_REDUCE_MIXED(...) GGML_F16xt_REDUCE_MIXED_IMPL(DEFAULT_PG16, __VA_ARGS__)
#define GGML_F16xt_REDUCE_MIXED(res, sum1, sum2, sum3, sum4) \
GGML_F16xt_REDUCE_MIXED_IMPL(DEFAULT_PG16, res, sum1, sum2, sum3, sum4)
// F16 NEON

View File

@@ -698,60 +698,61 @@ inline static void ggml_vec_scale_f32(const int n, float * y, const float v) {
}
inline static void ggml_vec_scale_f16(const int n, ggml_fp16_t * y, const float v) {
#if defined(GGML_SIMD)
#if defined(__ARM_FEATURE_SVE)
const int sve_register_length = svcntb() * 8;
const int ggml_f16_epr = sve_register_length / 16;
const int ggml_f16_step = 2 * ggml_f16_epr;
#if defined(GGML_SIMD) && defined(__ARM_FEATURE_SVE)
const int sve_register_length = svcntb() * 8;
const int ggml_f16_epr = sve_register_length / 16;
const int ggml_f16_step = 2 * ggml_f16_epr;
GGML_F16x_VEC vx = GGML_F16x_VEC_SET1(v);
const int np = (n & ~(ggml_f16_step - 1));
svfloat16_t ay1, ay2;
GGML_F16x_VEC vx = GGML_F16x_VEC_SET1(v);
const int np = (n & ~(ggml_f16_step - 1));
svfloat16_t ay1, ay2;
for (int i = 0; i < np; i += ggml_f16_step) {
ay1 = GGML_F16x_VEC_LOAD(y + i + 0*ggml_f16_epr, 0);
ay1 = GGML_F16x_VEC_MUL(ay1, vx);
GGML_F16x_VEC_STORE(y + i + 0*ggml_f16_epr, ay1, 0);
for (int i = 0; i < np; i += ggml_f16_step) {
ay1 = GGML_F16x_VEC_LOAD(y + i + 0*ggml_f16_epr, 0);
ay1 = GGML_F16x_VEC_MUL(ay1, vx);
GGML_F16x_VEC_STORE(y + i + 0*ggml_f16_epr, ay1, 0);
ay2 = GGML_F16x_VEC_LOAD(y + i + 1*ggml_f16_epr, 1);
ay2 = GGML_F16x_VEC_MUL(ay2, vx);
GGML_F16x_VEC_STORE(y + i + 1*ggml_f16_epr, ay2, 1);
ay2 = GGML_F16x_VEC_LOAD(y + i + 1*ggml_f16_epr, 1);
ay2 = GGML_F16x_VEC_MUL(ay2, vx);
GGML_F16x_VEC_STORE(y + i + 1*ggml_f16_epr, ay2, 1);
}
// leftovers
// maximum number of leftover elements will be less than ggml_f16_epr. Apply predicated svmul on available elements only
if (np < n) {
svbool_t pg = svwhilelt_b16(np, n);
svfloat16_t hy = svld1_f16(pg, (__fp16 *)(y + np));
svfloat16_t out = svmul_f16_m(pg, hy, vx);
svst1_f16(pg, (__fp16 *)(y + np), out);
}
#elif defined(__riscv_v_intrinsic) && defined(__riscv_zvfh)
for (int i = 0, vl; i < n; i += vl) {
vl = __riscv_vsetvl_e16m2(n - i);
vfloat16m2_t vy = __riscv_vle16_v_f16m2((_Float16 *)&y[i], vl);
vfloat32m4_t vy32 = __riscv_vfwcvt_f_f_v_f32m4(vy, vl);
vy32 = __riscv_vfmul_vf_f32m4(vy32, v, vl);
vy = __riscv_vfncvt_f_f_w_f16m2(vy32, vl);
__riscv_vse16_v_f16m2((_Float16 *)&y[i], vy, vl);
}
#elif defined(GGML_SIMD)
const int np = (n & ~(GGML_F16_STEP - 1));
GGML_F16_VEC vx = GGML_F16_VEC_SET1(v);
GGML_F16_VEC ay[GGML_F16_ARR];
for (int i = 0; i < np; i += GGML_F16_STEP) {
for (int j = 0; j < GGML_F16_ARR; j++) {
ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
ay[j] = GGML_F16_VEC_MUL(ay[j], vx);
GGML_F16_VEC_STORE(y + i + j*GGML_F16_EPR, ay, j);
}
// leftovers
// maximum number of leftover elements will be less than ggml_f16_epr. Apply predicated svmul on available elements only
if (np < n) {
svbool_t pg = svwhilelt_b16(np, n);
svfloat16_t hy = svld1_f16(pg, (__fp16 *)(y + np));
svfloat16_t out = svmul_f16_m(pg, hy, vx);
svst1_f16(pg, (__fp16 *)(y + np), out);
}
#elif defined(__riscv_v_intrinsic)
// todo: RVV impl
// scalar
for (int i = 0; i < n; ++i) {
y[i] = GGML_CPU_FP32_TO_FP16(GGML_CPU_FP16_TO_FP32(y[i])*v);
}
#else
const int np = (n & ~(GGML_F16_STEP - 1));
}
GGML_F16_VEC vx = GGML_F16_VEC_SET1(v);
GGML_F16_VEC ay[GGML_F16_ARR];
for (int i = 0; i < np; i += GGML_F16_STEP) {
for (int j = 0; j < GGML_F16_ARR; j++) {
ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
ay[j] = GGML_F16_VEC_MUL(ay[j], vx);
GGML_F16_VEC_STORE(y + i + j*GGML_F16_EPR, ay, j);
}
}
// leftovers
for (int i = np; i < n; ++i) {
y[i] = GGML_CPU_FP32_TO_FP16(GGML_CPU_FP16_TO_FP32(y[i])*v);
}
#endif
// leftovers
for (int i = np; i < n; ++i) {
y[i] = GGML_CPU_FP32_TO_FP16(GGML_CPU_FP16_TO_FP32(y[i])*v);
}
#else
// scalar
for (int i = 0; i < n; ++i) {

View File

@@ -224,6 +224,10 @@ static const char * cu_get_error_str(CUresult err) {
#define AMD_MFMA_AVAILABLE
#endif // defined(GGML_USE_HIP) && defined(CDNA) && !defined(GGML_HIP_NO_MMQ_MFMA)
#if defined(GGML_USE_HIP) && defined(RDNA4)
#define AMD_WMMA_AVAILABLE
#endif // defined(GGML_USE_HIP) && defined(RDNA4)
// The Volta instructions are in principle available on Turing or newer but they are effectively unusable:
#if !defined(GGML_USE_HIP) && __CUDA_ARCH__ == GGML_CUDA_CC_VOLTA
#define VOLTA_MMA_AVAILABLE
@@ -283,6 +287,10 @@ static bool amd_mfma_available(const int cc) {
#endif //!defined(GGML_HIP_NO_MMQ_MFMA)
}
static bool amd_wmma_available(const int cc) {
return GGML_CUDA_CC_IS_RDNA4(cc);
}
static bool volta_mma_available(const int cc) {
return GGML_CUDA_CC_IS_NVIDIA(cc) && ggml_cuda_highest_compiled_arch(cc) == GGML_CUDA_CC_VOLTA;
}

View File

@@ -39,6 +39,15 @@ template<typename dst_t, typename src_t>
return __float2bfloat16(float(x));
} else if constexpr(std::is_same_v<src_t, nv_bfloat16>) {
return __bfloat162float(x);
} else if constexpr(std::is_same_v<src_t, float2> && std::is_same_v<dst_t, half2>) {
return __float22half2_rn(x);
} else if constexpr(std::is_same_v<src_t, float2> && std::is_same_v<dst_t, nv_bfloat162>) {
// bypass compile error on cuda 12.0.1
#ifdef GGML_USE_HIP
return __float22bfloat162_rn(x);
#else
return {x.x, x.y};
#endif // GGML_USE_HIP
} else if constexpr(std::is_same_v<dst_t, int32_t>) {
return int32_t(x);
} else {

View File

@@ -3748,10 +3748,110 @@ static const char * ggml_backend_cuda_device_get_description(ggml_backend_dev_t
return ctx->description.c_str();
}
#if defined(__linux__)
// Helper function to get available memory from /proc/meminfo for UMA systems
static bool ggml_backend_cuda_get_available_uma_memory(long * available_memory_kb, long * free_swap_kb) {
FILE * meminfo_file = nullptr;
// 2KB buffer for reading /proc/meminfo since it does not report size info, should be enough
const size_t BUFFER_SIZE = 2048;
auto file_buffer = std::make_unique<char[]>(BUFFER_SIZE);
size_t bytes_read = 0;
long huge_tlb_total_pages = -1;
long huge_tlb_free_pages = -1;
long huge_tlb_page_size = -1;
if (available_memory_kb == nullptr || free_swap_kb == nullptr) {
return false;
}
meminfo_file = fopen("/proc/meminfo", "r");
if (meminfo_file == nullptr) {
GGML_LOG_ERROR("%s: failed to open /proc/meminfo\n", __func__);
return false;
}
// Read file into buffer
bytes_read = fread(file_buffer.get(), 1, BUFFER_SIZE - 1, meminfo_file);
fclose(meminfo_file);
if (bytes_read == 0) {
GGML_LOG_ERROR("%s: failed to read from /proc/meminfo\n", __func__);
return false;
}
file_buffer[bytes_read] = '\0';
*available_memory_kb = -1;
*free_swap_kb = -1;
// Parse the file buffer line by line
char * line = file_buffer.get();
char * line_next;
while (line < file_buffer.get() + bytes_read) {
// Find the end of the current line
line_next = strchr(line, '\n');
if (line_next != nullptr) {
*line_next = '\0';
line_next++;
} else {
line_next = file_buffer.get() + bytes_read;
}
long value;
if (sscanf(line, "MemAvailable: %ld kB", &value) == 1) {
*available_memory_kb = value;
} else if (sscanf(line, "SwapFree: %ld kB", &value) == 1) {
*free_swap_kb = value;
} else if (sscanf(line, "HugePages_Total: %ld", &value) == 1) {
huge_tlb_total_pages = value;
} else if (sscanf(line, "HugePages_Free: %ld", &value) == 1) {
huge_tlb_free_pages = value;
} else if (sscanf(line, "Hugepagesize: %ld kB", &value) == 1) {
huge_tlb_page_size = value;
}
line = line_next;
}
if (huge_tlb_total_pages != 0 && huge_tlb_total_pages != -1) {
*available_memory_kb = huge_tlb_free_pages * huge_tlb_page_size;
// Hugetlbfs pages are not swappable.
*free_swap_kb = 0;
}
GGML_LOG_DEBUG("%s: final available_memory_kb: %ld\n", __func__, *available_memory_kb);
return true;
}
#endif // defined(__linux__)
static void ggml_backend_cuda_device_get_memory(ggml_backend_dev_t dev, size_t * free, size_t * total) {
ggml_backend_cuda_device_context * ctx = (ggml_backend_cuda_device_context *)dev->context;
ggml_cuda_set_device(ctx->device);
CUDA_CHECK(cudaMemGetInfo(free, total));
// ref: https://github.com/ggml-org/llama.cpp/pull/17368
#if defined(__linux__)
// Check if this is a UMA (Unified Memory Architecture) system
cudaDeviceProp prop;
CUDA_CHECK(cudaGetDeviceProperties(&prop, ctx->device));
// Check if UMA is explicitly enabled via environment variable
bool uma_env = getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr;
bool is_uma = prop.unifiedAddressing > 0 || uma_env;
if (is_uma) {
// For UMA systems (like DGX Spark), use system memory info
long available_memory_kb = 0;
long free_swap_kb = 0;
if (ggml_backend_cuda_get_available_uma_memory(&available_memory_kb, &free_swap_kb) && available_memory_kb > 0) {
*free = (size_t)available_memory_kb * 1024;
} else {
GGML_LOG_ERROR("%s: /proc/meminfo reading failed, using cudaMemGetInfo\n", __func__);
}
}
#endif // defined(__linux__)
}
static enum ggml_backend_dev_type ggml_backend_cuda_device_get_type(ggml_backend_dev_t dev) {
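
The helper added above scans `/proc/meminfo` line by line with `sscanf` and, when huge pages are configured, reports free huge page memory instead of `MemAvailable`. The sketch below applies the same parsing to an in-memory string so it can be exercised without the file. `meminfo_summary` and `parse_meminfo` are hypothetical names, and the extra guard on the huge page fields is an assumption of this sketch, not something the original code does.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

struct meminfo_summary {   // hypothetical result type
    long available_kb = -1;
    long swap_free_kb = -1;
};

// Parse /proc/meminfo-style text passed in as a string.
static meminfo_summary parse_meminfo(const std::string & text) {
    meminfo_summary out;
    long total_pages = -1, free_pages = -1, page_size_kb = -1;

    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        long value;
        if (std::sscanf(line.c_str(), "MemAvailable: %ld kB", &value) == 1) {
            out.available_kb = value;
        } else if (std::sscanf(line.c_str(), "SwapFree: %ld kB", &value) == 1) {
            out.swap_free_kb = value;
        } else if (std::sscanf(line.c_str(), "HugePages_Total: %ld", &value) == 1) {
            total_pages = value;
        } else if (std::sscanf(line.c_str(), "HugePages_Free: %ld", &value) == 1) {
            free_pages = value;
        } else if (std::sscanf(line.c_str(), "Hugepagesize: %ld kB", &value) == 1) {
            page_size_kb = value;
        }
    }

    // With huge pages configured, report free huge page memory instead.
    // The checks on free_pages and page_size_kb are an assumption of this sketch.
    if (total_pages > 0 && free_pages >= 0 && page_size_kb > 0) {
        out.available_kb = free_pages * page_size_kb;
        out.swap_free_kb = 0; // hugetlbfs pages are not swappable
    }
    return out;
}
```

Parsing from a string keeps the logic testable. A caller on Linux would read the file contents first and pass them in.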

View File

@@ -74,6 +74,33 @@ namespace ggml_cuda_mma {
static constexpr int J = J_;
#if defined(GGML_USE_HIP)
#if defined(RDNA4)
static constexpr int ne = I * J / 32;
T x[ne] = {0};
static constexpr __device__ bool supported() {
if (I == 16 && J == 16) return true;
return false;
}
static __device__ __forceinline__ int get_i(const int l) {
if constexpr (I == 16 && J == 16) {
return 8 * (threadIdx.x / 16) + l;
} else {
NO_DEVICE_CODE;
return -1;
}
}
static __device__ __forceinline__ int get_j(const int l) {
if constexpr (I == 16 && J == 16) {
return threadIdx.x % 16;
} else {
NO_DEVICE_CODE;
return -1;
}
}
#else
static constexpr int ne = I * J / 64;
T x[ne] = {0};
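
The RDNA4 branch above gives each of 32 lanes `ne = I * J / 32` elements of a 16x16 tile, with `get_i` and `get_j` mapping a lane and a local index `l` to a tile coordinate. The host-side sketch below copies those two formulas, passing the lane explicitly instead of reading `threadIdx.x`, and counts how often each tile element is addressed. The function and constant names are hypothetical; only the index arithmetic comes from the hunk.

```cpp
#include <array>
#include <cassert>

constexpr int TILE_I = 16;
constexpr int TILE_J = 16;
constexpr int LANES  = 32;                        // divisor used in the hunk
constexpr int NE     = TILE_I * TILE_J / LANES;   // 8 elements per lane

// Host-side copies of the index formulas, with the lane passed explicitly.
static int tile_get_i(int lane, int l)     { return 8 * (lane / 16) + l; }
static int tile_get_j(int lane, int /*l*/) { return lane % 16; }

// Count how many (lane, l) pairs map to each tile element.
static std::array<int, TILE_I * TILE_J> tile_coverage() {
    std::array<int, TILE_I * TILE_J> hits{};
    for (int lane = 0; lane < LANES; ++lane) {
        for (int l = 0; l < NE; ++l) {
            hits[tile_get_i(lane, l) * TILE_J + tile_get_j(lane, l)] += 1;
        }
    }
    return hits;
}

// True when every element of the 16x16 tile is addressed exactly once.
static bool tile_covered_exactly_once() {
    for (int count : tile_coverage()) {
        if (count != 1) {
            return false;
        }
    }
    return true;
}
```

Lanes 0 to 15 hold rows 0 to 7 of their column and lanes 16 to 31 hold rows 8 to 15, so the mapping is a bijection onto the tile.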
@@ -119,6 +146,7 @@ namespace ggml_cuda_mma {
return -1;
}
}
#endif // defined(RDNA4)
#elif __CUDA_ARCH__ == GGML_CUDA_CC_VOLTA
static constexpr int ne = I * J / 32;
T x[ne] = {0};
@@ -236,6 +264,32 @@ namespace ggml_cuda_mma {
return -1;
}
}
#elif defined(AMD_WMMA_AVAILABLE)
static constexpr int ne = I * J / 32;
half2 x[ne] = {{0.0f, 0.0f}};
static constexpr __device__ bool supported() {
if (I == 16 && J == 8) return true;
return false;
}
static __device__ __forceinline__ int get_i(const int l) {
if constexpr (I == 16 && J == 8) {
return threadIdx.x % 16;
} else {
NO_DEVICE_CODE;
return -1;
}
}
static __device__ __forceinline__ int get_j(const int l) {
if constexpr (I == 16 && J == 8) {
return 4 * (threadIdx.x / 16) + l;
} else {
NO_DEVICE_CODE;
return -1;
}
}
#else
static constexpr int ne = I * J / WARP_SIZE;
half2 x[ne] = {{0.0f, 0.0f}};
@@ -285,6 +339,34 @@ namespace ggml_cuda_mma {
struct tile<I_, J_, nv_bfloat162> {
static constexpr int I = I_;
static constexpr int J = J_;
#if defined(AMD_WMMA_AVAILABLE)
static constexpr int ne = I * J / 32;
nv_bfloat162 x[ne] = {{0.0f, 0.0f}};
static constexpr __device__ bool supported() {
if (I == 16 && J == 8) return true;
return false;
}
static __device__ __forceinline__ int get_i(const int l) {
if constexpr (I == 16 && J == 8) {
return threadIdx.x % 16;
} else {
NO_DEVICE_CODE;
return -1;
}
}
static __device__ __forceinline__ int get_j(const int l) {
if constexpr (I == 16 && J == 8) {
return 4 * (threadIdx.x / 16) + l;
} else {
NO_DEVICE_CODE;
return -1;
}
}
#else
static constexpr int ne = I * J / WARP_SIZE;
nv_bfloat162 x[ne] = {{0.0f, 0.0f}};
@@ -320,6 +402,7 @@ namespace ggml_cuda_mma {
return -1;
}
}
#endif // defined(AMD_WMMA_AVAILABLE)
};
template <int I, int J>
@@ -353,6 +436,8 @@ namespace ggml_cuda_mma {
const int64_t * xs = (int64_t *) ((const int *) xs0 + (threadIdx.x % t.I) * stride + 2 * (threadIdx.x / t.I));
xi[0] = xs[0];
}
#elif defined(AMD_WMMA_AVAILABLE)
ggml_cuda_memcpy_1<sizeof(t.x)>(t.x, xs0 + t.get_i(0) * stride + t.get_j(0));
#else
#pragma unroll
for (int l = 0; l < t.ne; ++l) {
@@ -639,12 +724,34 @@ namespace ggml_cuda_mma {
: "+r"(Dxi[4]), "+r"(Dxi[5]), "+r"(Dxi[6]), "+r"(Dxi[7])
: "r"(Axi[2]), "r"(Axi[3]), "r"(Bxi[3]));
#endif // __CUDA_ARCH__ >= GGML_CUDA_CC_AMPERE
#elif defined(AMD_WMMA_AVAILABLE)
using halfx8_t = __attribute__((ext_vector_type(8))) _Float16;
using floatx8_t = __attribute__((ext_vector_type(8))) float;
floatx8_t& acc_frag = reinterpret_cast<floatx8_t&>(D.x[0]);
const halfx8_t& a_frag = reinterpret_cast<const halfx8_t&>(A.x[0]);
const halfx8_t& b_frag = reinterpret_cast<const halfx8_t&>(B.x[0]);
acc_frag = __builtin_amdgcn_wmma_f32_16x16x16_f16_w32_gfx12(a_frag, b_frag, acc_frag);
#else
GGML_UNUSED_VARS(D, A, B);
NO_DEVICE_CODE;
#endif // TURING_MMA_AVAILABLE
}
static __device__ __forceinline__ void mma(
tile<16, 16, float> & D, const tile<16, 8, nv_bfloat162> & A, const tile<16, 8, nv_bfloat162> & B) {
#if defined(AMD_WMMA_AVAILABLE)
using bf16x8_t = __attribute__((ext_vector_type(8))) __bf16;
using floatx8_t = __attribute__((ext_vector_type(8))) float;
floatx8_t& acc_frag = reinterpret_cast<floatx8_t&>(D.x[0]);
const bf16x8_t& a_frag = reinterpret_cast<const bf16x8_t&>(A.x[0]);
const bf16x8_t& b_frag = reinterpret_cast<const bf16x8_t&>(B.x[0]);
acc_frag = __builtin_amdgcn_wmma_f32_16x16x16_bf16_w32_gfx12(a_frag, b_frag, acc_frag);
#else
GGML_UNUSED_VARS(D, A, B);
NO_DEVICE_CODE;
#endif // AMPERE_MMA_AVAILABLE
}
static __device__ __forceinline__ void mma(
tile<16, 16, int> & D, const tile<16, 8, int> & A, const tile<16, 8, int> & B) {
#if defined(AMD_MFMA_AVAILABLE)
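The RDNA4 `tile<16, 16, T>` layout added above gives each of 32 lanes `ne = I * J / 32 = 8` elements, with `get_i` and `get_j` mapping a lane and a local index `l` to a tile coordinate. The following host-side sketch models that mapping and checks that every tile element is owned by exactly one `(lane, l)` pair. It assumes a single 32-wide wave, so `threadIdx.x` is replaced by a plain `lane` argument. The names `TILE_I`, `TILE_J`, `coverage` and `covers_exactly_once` are illustrative only.

```cpp
#include <array>
#include <cassert>

// Host-side model of the RDNA4 tile<16, 16, T> layout from the hunk above.
constexpr int TILE_I = 16;
constexpr int TILE_J = 16;
constexpr int LANES  = 32;                        // assumed wave size
constexpr int NE     = TILE_I * TILE_J / LANES;   // elements per lane

// Same formulas as in the diff, with threadIdx.x replaced by `lane`.
static int get_i(int lane, int l)     { return 8 * (lane / 16) + l; }
static int get_j(int lane, int /*l*/) { return lane % 16; }

// Counts how many (lane, l) pairs map to each tile element.
static std::array<int, TILE_I * TILE_J> coverage() {
    std::array<int, TILE_I * TILE_J> hits{};
    for (int lane = 0; lane < LANES; ++lane) {
        for (int l = 0; l < NE; ++l) {
            hits[get_i(lane, l) * TILE_J + get_j(lane, l)]++;
        }
    }
    return hits;
}

// True if every element of the 16x16 tile is owned by exactly one (lane, l) pair.
static bool covers_exactly_once() {
    for (int h : coverage()) {
        if (h != 1) {
            return false;
        }
    }
    return true;
}
```

Lanes 0 to 15 hold rows 0 to 7 of their column, and lanes 16 to 31 hold rows 8 to 15.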

@@ -151,7 +151,7 @@ bool ggml_cuda_should_use_mmf(enum ggml_type type, int cc, int warp_size, const
return false;
}
} else {
if (src1_ncols > 16) {
if (src1_ncols > 16 || GGML_CUDA_CC_IS_RDNA4(cc)) {
return false;
}
}
@@ -160,9 +160,9 @@ bool ggml_cuda_should_use_mmf(enum ggml_type type, int cc, int warp_size, const
case GGML_TYPE_F32:
return ampere_mma_available(cc);
case GGML_TYPE_F16:
return volta_mma_available(cc) || turing_mma_available(cc);
return volta_mma_available(cc) || turing_mma_available(cc) || amd_wmma_available(cc);
case GGML_TYPE_BF16:
return ampere_mma_available(cc);
return ampere_mma_available(cc) || amd_wmma_available(cc);
default:
return false;
}

@@ -2,6 +2,7 @@
#include "mma.cuh"
#include "common.cuh"
#include "convert.cuh"
using namespace ggml_cuda_mma;
@@ -27,20 +28,35 @@ static __global__ void mul_mat_f(
const int stride_col_id, const int stride_row_id,
const int channel_ratio, const int stride_channel_x, const int stride_channel_y, const int stride_channel_dst,
const int sample_ratio, const int stride_sample_x, const int stride_sample_y, const int stride_sample_dst) {
#if !defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)
// TODO: handle this in a consistent and simpler way after AMD MFMA support has been added
#if (!defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)) || defined(AMD_WMMA_AVAILABLE)
#if defined(AMD_WMMA_AVAILABLE)
// Special case for tf32, just dummy mma layout as wmma doesn't support it.
constexpr int tile_B_I = std::is_same_v<T, float> ? 8 : 16;
constexpr int tile_C_J = std::is_same_v<T, float> ? 8 : 16;
typedef tile<16, 8, T> tile_A;
typedef tile<tile_B_I, 8, T> tile_B;
typedef tile<16, tile_C_J, float> tile_C;
constexpr bool a_supported = tile_A::supported();
constexpr bool b_supported = tile_B::supported();
constexpr bool c_supported = tile_C::supported();
constexpr bool supported = a_supported && b_supported && c_supported;
#else
constexpr bool I_16_supported = tile<16, 8, T>::supported() && tile<16, 8, float>::supported();
constexpr bool I_32_supported = tile<32, 8, T>::supported() && tile<32, 8, float>::supported();
if (!I_16_supported && !I_32_supported) {
NO_DEVICE_CODE;
return;
}
constexpr bool supported = I_16_supported || I_32_supported;
constexpr int I_preferred = I_16_supported ? 16 : 32; // For Turing MMA both work but 16 is ~1% faster.
typedef tile<I_preferred, 8, T> tile_A;
typedef tile<8, 8, T> tile_B;
typedef tile<I_preferred, 8, float> tile_C;
#endif // defined(AMD_WMMA_AVAILABLE)
if constexpr (!supported) {
NO_DEVICE_CODE;
return;
}
constexpr int warp_size = ggml_cuda_get_physical_warp_size();
constexpr int tile_k_padded = warp_size + 4;
@@ -161,11 +177,11 @@ static __global__ void mul_mat_f(
if constexpr (!has_ids) {
const float2 tmp = j < cols_per_block ? y2[j*stride_col_y + col] : make_float2(0.0f, 0.0f);
tile_xy[j0*tile_k_padded + threadIdx.x] = {tmp.x, tmp.y};
tile_xy[j0*tile_k_padded + threadIdx.x] = ggml_cuda_cast<T>(tmp);
} else {
const bool valid = j < cols_per_block && (col_base + j) < ncols_dst_total && slot_map[j] >= 0;
float2 tmp = valid ? *(const float2*) &y[slot_map[j]*stride_channel_y + 2*(j*stride_col_y + col)] : make_float2(0.0f, 0.0f);
tile_xy[j0*tile_k_padded + threadIdx.x] = {tmp.x, tmp.y};
tile_xy[j0*tile_k_padded + threadIdx.x] = ggml_cuda_cast<T>(tmp);
}
}
} else {
@@ -239,7 +255,7 @@ static __global__ void mul_mat_f(
channel_ratio, stride_channel_x, stride_channel_y, stride_channel_dst,
sample_ratio, stride_sample_x, stride_sample_y, stride_sample_dst);
NO_DEVICE_CODE;
#endif // !defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)
#endif // (!defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)) || defined(AMD_WMMA_AVAILABLE)
}
//This kernel is for larger batch sizes of mul_mat_id
@@ -253,20 +269,35 @@ static __global__ void mul_mat_f_ids(
const int channel_ratio, const int stride_channel_x, const int stride_channel_y, const int stride_channel_dst,
const int sample_ratio, const int stride_sample_x, const int stride_sample_y, const int stride_sample_dst,
const uint3 sis1_fd, const uint3 nch_fd) {
#if !defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)
// TODO: handle this in a consistent and simpler way after AMD MFMA support has been added
#if (!defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)) || defined(AMD_WMMA_AVAILABLE)
#if defined(AMD_WMMA_AVAILABLE)
// Special case for tf32, just dummy mma layout as wmma doesn't support it.
constexpr int tile_B_I = std::is_same_v<T, float> ? 8 : 16;
constexpr int tile_C_J = std::is_same_v<T, float> ? 8 : 16;
typedef tile<16, 8, T> tile_A;
typedef tile<tile_B_I, 8, T> tile_B;
typedef tile<16, tile_C_J, float> tile_C;
constexpr bool a_supported = tile_A::supported();
constexpr bool b_supported = tile_B::supported();
constexpr bool c_supported = tile_C::supported();
constexpr bool supported = a_supported && b_supported && c_supported;
#else
constexpr bool I_16_supported = tile<16, 8, T>::supported() && tile<16, 8, float>::supported();
constexpr bool I_32_supported = tile<32, 8, T>::supported() && tile<32, 8, float>::supported();
constexpr bool supported = I_16_supported || I_32_supported;
if (!I_16_supported && !I_32_supported) {
NO_DEVICE_CODE;
return;
}
constexpr int I_preferred = I_16_supported ? 16 : 32; // For Turing MMA both work butr 16 is ~1% faster.
constexpr int I_preferred = I_16_supported ? 16 : 32; // For Turing MMA both work but 16 is ~1% faster.
typedef tile<I_preferred, 8, T> tile_A;
typedef tile<8, 8, T> tile_B;
typedef tile<I_preferred, 8, float> tile_C;
#endif // defined(AMD_WMMA_AVAILABLE)
if constexpr (!supported) {
NO_DEVICE_CODE;
return;
}
constexpr int warp_size = ggml_cuda_get_physical_warp_size();
constexpr int tile_k_padded = warp_size + 4;
@@ -408,7 +439,7 @@ static __global__ void mul_mat_f_ids(
#pragma unroll
for (int j0 = 0; j0 < tile_B::I; ++j0) {
const float2 tmp = vals_buf[curr_buf][j0];
tile_xy[j0*tile_k_padded + threadIdx.x] = {tmp.x, tmp.y};
tile_xy[j0*tile_k_padded + threadIdx.x] = ggml_cuda_cast<T>(tmp);
}
if (itB + 1 < ntB) {
@@ -492,7 +523,7 @@ static __global__ void mul_mat_f_ids(
channel_ratio, stride_channel_x, stride_channel_y, stride_channel_dst,
sample_ratio, stride_sample_x, stride_sample_y, stride_sample_dst, sis1_fd, nch_fd);
NO_DEVICE_CODE;
#endif // !defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)
#endif // (!defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)) || defined(AMD_WMMA_AVAILABLE)
}
template<typename T, int cols_per_block, int nwarps>
@@ -554,7 +585,8 @@ void mul_mat_f_cuda(
cudaStream_t stream, const mmf_ids_data * ids_data) {
typedef tile<16, 8, T> tile_A_16;
typedef tile<32, 8, T> tile_A_32;
typedef tile< 8, 8, T> tile_B;
typedef tile<16, 8, T> tile_B_16;
typedef tile< 8, 8, T> tile_B_8;
GGML_ASSERT(ncols_x % 2 == 0);
GGML_ASSERT(stride_row % 2 == 0);
@@ -581,7 +613,8 @@ void mul_mat_f_cuda(
constexpr int rows_per_block = MMF_ROWS_PER_BLOCK;
const int nbytes_shared_iter = nwarps_best * (volta_mma_available(cc) ? tile_A_32::I : tile_A_16::I) * (warp_size + 4) * 4;
const int nbytes_shared_combine = GGML_PAD(cols_per_block, tile_B::I) * (nwarps_best*rows_per_block + 4) * 4;
const int nbytes_cols_per_block_pad = amd_wmma_available(cc) ? tile_B_16::I : tile_B_8::I;
const int nbytes_shared_combine = GGML_PAD(cols_per_block, nbytes_cols_per_block_pad) * (nwarps_best*rows_per_block + 4) * 4;
const int nbytes_shared = std::max(nbytes_shared_iter, nbytes_shared_combine);
const int nbytes_slotmap = ids ? GGML_PAD(cols_per_block, 16) * sizeof(int) : 0;
const int nbytes_shared_total = nbytes_shared + nbytes_slotmap;
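The change above pads `cols_per_block` to the B-tile row count, which is 16 on the AMD WMMA path and 8 otherwise. The sketch below works through that arithmetic. It assumes `GGML_PAD(x, n)` rounds `x` up to the next multiple of a power-of-two `n`. The functions `pad_to` and `shared_combine_bytes` are hypothetical stand-ins, and the example parameter values are not taken from the source.

```cpp
#include <cassert>

// Round x up to the next multiple of n (n must be a power of two).
// Assumed to behave like GGML_PAD.
static constexpr int pad_to(int x, int n) {
    return (x + n - 1) & ~(n - 1);
}

// Same shape as the nbytes_shared_combine expression in the hunk:
// padded columns * (nwarps * rows_per_block + 4) floats of 4 bytes each.
static constexpr int shared_combine_bytes(int cols_per_block, int tile_b_rows,
                                          int nwarps, int rows_per_block) {
    return pad_to(cols_per_block, tile_b_rows) * (nwarps * rows_per_block + 4) * 4;
}
```

With 4 columns, 4 warps and 32 rows per block (illustrative values), padding to 8 gives 4224 bytes and padding to 16 gives 8448 bytes, so the WMMA path can need up to twice the combine buffer for small `cols_per_block`.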

@@ -106,33 +106,32 @@ static void glu_swiglu_fp32_per_thread(const struct htp_tensor * src0,
t1 = HAP_perf_get_qtimer_count();
int is_aligned = 1;
int opt_path = 0;
if (!htp_is_aligned((void *) src0->data, VLEN) || !htp_is_aligned((void *) dst->data, VLEN)) {
is_aligned = 0;
FARF(HIGH, "swiglu-f32: unaligned addresses in elementwise op, possibly slower execution\n");
}
if ((1 == is_aligned) && !(nb01 & (VLEN - 1))) {
opt_path = 1;
}
const uint8_t * restrict data_src0 = (const uint8_t *) src0->data;
const uint8_t * restrict data_src1 = (const uint8_t *) src1->data;
uint8_t * restrict data_dst = (uint8_t *) dst->data;
bool src1_valid = src1->ne[0];
const bool src1_valid = src1->ne[0];
const int nc = (src1_valid) ? ne00 : ne00 / 2;
if (!src1_valid) {
data_src1 = data_src0;
src1_row_size = src0_row_size;
const int32_t swapped = op_params[1];
data_src1 = data_src0;
src1_row_size = src0_row_size;
const size_t nc_in_bytes = nc * SIZEOF_FP32;
data_src0 += swapped ? nc_in_bytes : 0;
data_src1 += swapped ? 0 : nc_in_bytes;
}
uint8_t * restrict src0_spad_data = src0_spad->data + (ith * src0_row_size);
uint8_t * restrict src1_spad_data = src1_spad->data + (ith * src1_row_size);
uint8_t * restrict dst_spad_data = dst_spad->data + (ith * dst_row_size);
const int32_t swapped = op_params[1];
const int nc = (src1_valid) ? ne0 : ne0 / 2;
const bool opt_path = ((1 == is_aligned) && !(nb01 & (VLEN - 1)));
for (uint32_t ir = src0_start_row; ir < src0_end_row; ir++) {
const float * restrict src0 = (float *) (data_src0 + (ir * src0_row_size));
const float * restrict src1 = (float *) (data_src1 + (ir * src1_row_size));
@@ -142,12 +141,7 @@ static void glu_swiglu_fp32_per_thread(const struct htp_tensor * src0,
htp_l2fetch(src0 + src0_row_size, 1, src0_row_size, src0_row_size);
}
if (!src1_valid) {
src0 += swapped ? nc : 0;
src1 += swapped ? 0 : nc;
}
if (1 == opt_path) {
if (opt_path) {
hvx_fast_sigmoid_f32((const uint8_t *) src0, (uint8_t *) src0_spad_data, nc);
hvx_mul_mul_f32_opt((const uint8_t *) src0, (const uint8_t *) src0_spad_data, (const uint8_t *) src1,
(uint8_t *) dst, nc);
@@ -218,7 +212,7 @@ static void glu_swiglu_oai_fp32_per_thread(const struct htp_tensor * src0,
const float alpha = ((const float *) (op_params))[2];
const float limit = ((const float *) (op_params))[3];
const int nc = (src1_valid) ? ne0 : ne0 / 2;
const int nc = (src1_valid) ? ne00 : ne00 / 2;
for (uint32_t ir = src0_start_row; ir < src0_end_row; ir++) {
const float * restrict src0 = (float *) (data_src0 + (ir * src0_row_size));
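When `src1` is absent, the hunk above treats each `src0` row as two halves of `nc = ne00 / 2` floats and uses `swapped` to choose which half is activated and which is the gate. The offsets are now applied once to the base pointers instead of per row. A scalar sketch of that row computation follows, assuming the result is `x * sigmoid(x) * g` as the `hvx_fast_sigmoid_f32` and `hvx_mul_mul_f32_opt` calls suggest. `swiglu_fused_row` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar model of one fused SwiGLU row: `row` holds 2*nc floats.
// swapped == false: first half is activated, second half is the gate.
// swapped == true:  the roles of the two halves are exchanged.
static std::vector<float> swiglu_fused_row(const std::vector<float> & row, bool swapped) {
    const size_t nc = row.size() / 2;
    const float * x = row.data() + (swapped ? nc : 0);  // activated half
    const float * g = row.data() + (swapped ? 0 : nc);  // gate half
    std::vector<float> out(nc);
    for (size_t i = 0; i < nc; ++i) {
        const float sig = 1.0f / (1.0f + std::exp(-x[i]));
        out[i] = x[i] * sig * g[i];
    }
    return out;
}
```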

@@ -16,6 +16,19 @@
#include "hvx-utils.h"
#include "ops-utils.h"
static inline HVX_Vector hvx_vec_exp_fp32_guard(HVX_Vector in_vec) {
static const float kInf = INFINITY;
static const float kMaxExp = 88.02f; // threshold just below log(FLT_MAX) ~= 88.72
const HVX_Vector max_exp = hvx_vec_splat_fp32(kMaxExp);
const HVX_Vector inf = hvx_vec_splat_fp32(kInf);
const HVX_VectorPred pred0 = Q6_Q_vcmp_gt_VsfVsf(in_vec, max_exp);
HVX_Vector out = hvx_vec_exp_fp32(in_vec);
return Q6_V_vmux_QVV(pred0, inf, out);
}
void hvx_exp_f32(const uint8_t * restrict src, uint8_t * restrict dst, const int num_elems, bool negate) {
int left_over = num_elems & (VLEN_FP32 - 1);
int num_elems_whole = num_elems - left_over;
@@ -42,9 +55,9 @@ void hvx_exp_f32(const uint8_t * restrict src, uint8_t * restrict dst, const int
for (int i = 0; i < num_elems_whole; i += VLEN_FP32) {
if (true == negate) {
HVX_Vector neg_vec_in = hvx_vec_neg_fp32(*p_vec_in1++);
*p_vec_out++ = hvx_vec_exp_fp32(neg_vec_in);
*p_vec_out++ = hvx_vec_exp_fp32_guard(neg_vec_in);
} else {
*p_vec_out++ = hvx_vec_exp_fp32(*p_vec_in1++);
*p_vec_out++ = hvx_vec_exp_fp32_guard(*p_vec_in1++);
}
}
} else {
@@ -54,9 +67,9 @@ void hvx_exp_f32(const uint8_t * restrict src, uint8_t * restrict dst, const int
if (true == negate) {
HVX_Vector neg_vec_in = hvx_vec_neg_fp32(in);
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = hvx_vec_exp_fp32(neg_vec_in);
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = hvx_vec_exp_fp32_guard(neg_vec_in);
} else {
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = hvx_vec_exp_fp32(in);
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = hvx_vec_exp_fp32_guard(in);
}
}
}
@@ -70,9 +83,9 @@ void hvx_exp_f32(const uint8_t * restrict src, uint8_t * restrict dst, const int
if (true == negate) {
HVX_Vector neg_vec_in = hvx_vec_neg_fp32(in);
vec_out = hvx_vec_exp_fp32(neg_vec_in);
vec_out = hvx_vec_exp_fp32_guard(neg_vec_in);
} else {
vec_out = hvx_vec_exp_fp32(in);
vec_out = hvx_vec_exp_fp32_guard(in);
}
hvx_vec_store_u((void *) dstf, left_over * SIZEOF_FP32, vec_out);
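`hvx_vec_exp_fp32_guard` computes the exponential and then uses a per-lane predicate to return infinity for any lane whose input exceeds `kMaxExp`. A scalar sketch of the same selection is shown below. `exp_guard` is a hypothetical name, and `std::exp` stands in for the HVX approximation.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of the guard: inputs above the threshold yield +inf,
// everything else goes through the regular exponential.
static float exp_guard(float x) {
    const float kMaxExp = 88.02f;  // same threshold as in the diff
    return x > kMaxExp ? INFINITY : std::exp(x);
}
```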

@@ -38,13 +38,13 @@ void hvx_inverse_f32(const uint8_t * restrict src, uint8_t * restrict dst, const
#pragma unroll(4)
for (int i = 0; i < num_elems_whole; i += VLEN_FP32) {
*p_vec_out++ = hvx_vec_inverse_fp32(*p_vec_in++);
*p_vec_out++ = hvx_vec_inverse_fp32_guard(*p_vec_in++);
}
} else {
#pragma unroll(4)
for (int i = 0; i < num_elems_whole; i += VLEN_FP32) {
HVX_Vector in = *(HVX_UVector *) (src + i * SIZEOF_FP32);
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = hvx_vec_inverse_fp32(in);
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = hvx_vec_inverse_fp32_guard(in);
}
}
@@ -53,7 +53,7 @@ void hvx_inverse_f32(const uint8_t * restrict src, uint8_t * restrict dst, const
float * dstf = (float *) dst + num_elems_whole;
HVX_Vector in = *(HVX_UVector *) srcf;
HVX_Vector out = hvx_vec_inverse_fp32(in);
HVX_Vector out = hvx_vec_inverse_fp32_guard(in);
hvx_vec_store_u((void *) dstf, left_over * SIZEOF_FP32, out);
}

@@ -401,7 +401,9 @@ void hvx_add_scalar_f32(const uint8_t * restrict src, const float val, uint8_t *
FARF(HIGH, "hvx_add_scalar_f32: unaligned loop in hvx op, possibly slower execution\n");
}
HVX_Vector val_vec = hvx_vec_splat_fp32(val);
static const float kInf = INFINITY;
const HVX_Vector inf = hvx_vec_splat_fp32(kInf);
HVX_Vector val_vec = hvx_vec_splat_fp32(val);
if (0 == unaligned_loop) {
HVX_Vector * restrict vec_in1 = (HVX_Vector *) src;
@@ -409,17 +411,24 @@ void hvx_add_scalar_f32(const uint8_t * restrict src, const float val, uint8_t *
#pragma unroll(4)
for (int i = 0; i < num_elems_whole; i += VLEN_FP32) {
HVX_Vector v = Q6_Vqf32_vadd_VsfVsf(*vec_in1++, val_vec);
*vec_out++ = Q6_Vsf_equals_Vqf32(v);
HVX_Vector in = *vec_in1++;
const HVX_VectorPred pred_inf = Q6_Q_vcmp_eq_VwVw(inf, in);
HVX_Vector v = Q6_Vqf32_vadd_VsfVsf(in, val_vec);
v = Q6_Vsf_equals_Vqf32(v);
v = Q6_V_vmux_QVV(pred_inf, inf, v);
*vec_out++ = v;
}
} else {
#pragma unroll(4)
for (int i = 0; i < num_elems_whole; i += VLEN_FP32) {
HVX_Vector in = *(HVX_UVector *) (src + i * SIZEOF_FP32);
HVX_Vector out = Q6_Vqf32_vadd_VsfVsf(in, val_vec);
const HVX_VectorPred pred_inf = Q6_Q_vcmp_eq_VwVw(inf, in);
HVX_Vector out = Q6_Vqf32_vadd_VsfVsf(in, val_vec);
out = Q6_Vsf_equals_Vqf32(out);
out = Q6_V_vmux_QVV(pred_inf, inf, out);
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = Q6_Vsf_equals_Vqf32(out);
*(HVX_UVector *) (dst + i * SIZEOF_FP32) = out;
}
}
@@ -429,8 +438,12 @@ void hvx_add_scalar_f32(const uint8_t * restrict src, const float val, uint8_t *
HVX_Vector in = *(HVX_UVector *) srcf;
HVX_Vector out = Q6_Vqf32_vadd_VsfVsf(in, val_vec);
hvx_vec_store_u((void *) dstf, left_over * SIZEOF_FP32, Q6_Vsf_equals_Vqf32(out));
const HVX_VectorPred pred_inf = Q6_Q_vcmp_eq_VwVw(inf, in);
HVX_Vector out = Q6_Vqf32_vadd_VsfVsf(in, val_vec);
out = Q6_Vsf_equals_Vqf32(out);
out = Q6_V_vmux_QVV(pred_inf, inf, out);
hvx_vec_store_u((void *) dstf, left_over * SIZEOF_FP32, out);
}
}

@@ -12,6 +12,15 @@
#define VLEN_FP32 (VLEN / SIZEOF_FP32)
#define VLEN_FP16 (VLEN / SIZEOF_FP16)
typedef union {
HVX_Vector v;
uint8_t b[VLEN];
uint16_t h[VLEN_FP16];
uint32_t w[VLEN_FP32];
__fp16 fp16[VLEN_FP16];
float fp32[VLEN_FP32];
} __attribute__((aligned(VLEN), packed)) HVX_VectorAlias;
static inline HVX_Vector hvx_vec_splat_fp32(float i) {
union {
float f;
@@ -243,19 +252,16 @@ static __attribute__((always_inline)) int32_t is_in_one_chunk(void * addr, uint3
}
static void hvx_vec_dump_fp16_n(char * pref, HVX_Vector v, uint32_t n) {
union {
HVX_Vector v;
__fp16 d[64];
} u = { .v = v };
HVX_VectorAlias u = { .v = v };
const uint32_t n0 = n / 16;
const uint32_t n1 = n % 16;
int i = 0;
for (; i < n0; i++) {
htp_dump_fp16_line(pref, u.d + (16 * i), 16);
htp_dump_fp16_line(pref, u.fp16 + (16 * i), 16);
}
if (n1) {
htp_dump_fp16_line(pref, u.d + (16 * i), n1);
htp_dump_fp16_line(pref, u.fp16 + (16 * i), n1);
}
}
@@ -411,8 +417,8 @@ static inline HVX_Vector hvx_vec_fp32_reduce_sum_n(HVX_Vector in, unsigned int n
HVX_Vector sum = in, sum_t;
while (width < total) {
sum_t = Q6_V_vror_VR(sum, width); // rotate right
sum = Q6_Vsf_equals_Vqf32(Q6_Vqf32_vadd_VsfVsf(sum, sum_t)); // elementwise sum
sum_t = Q6_V_vror_VR(sum, width); // rotate right
sum = Q6_Vsf_equals_Vqf32(Q6_Vqf32_vadd_VsfVsf(sum, sum_t)); // elementwise sum
width = width << 1;
}
return sum;
@@ -491,7 +497,7 @@ static inline HVX_Vector hvx_vec_abs_fp16(HVX_Vector v) {
static inline HVX_Vector hvx_vec_neg_fp16(HVX_Vector v) {
// neg by flipping the fp16 sign bit
HVX_Vector mask = Q6_Vh_vsplat_R(0x8000);
return Q6_V_vor_VV(v, mask);
return Q6_V_vxor_VV(v, mask);
}
static inline HVX_Vector hvx_vec_abs_fp32(HVX_Vector v) {
@@ -506,7 +512,7 @@ static inline HVX_Vector hvx_vec_neg_fp32(HVX_Vector v) {
#else
// neg by flipping the fp32 sign bit
HVX_Vector mask = Q6_V_vsplat_R(0x80000000);
return Q6_V_vor_VV(v, mask);
return Q6_V_vxor_VV(v, mask);
#endif // __HTP_ARCH__ > 75
}
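The switch from `Q6_V_vor_VV` to `Q6_V_vxor_VV` in the two negation helpers matters because OR-ing the sign bit only forces the value negative, while XOR flips the sign. A scalar sketch of the difference on IEEE-754 single precision follows. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static uint32_t to_bits(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof(u));
    return u;
}

static float from_bits(uint32_t u) {
    float x;
    std::memcpy(&x, &u, sizeof(x));
    return x;
}

// Old behavior: sets the sign bit, so the result is always -|x|.
static float neg_or(float x)  { return from_bits(to_bits(x) | 0x80000000u); }

// New behavior: flips the sign bit, which is a true negation.
static float neg_xor(float x) { return from_bits(to_bits(x) ^ 0x80000000u); }
```

For a positive input both variants agree. For a negative input the OR variant returns the input unchanged.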
@@ -720,6 +726,24 @@ static inline HVX_Vector hvx_vec_inverse_fp32(HVX_Vector v_sf) {
return Q6_Vsf_equals_Vqf32(r_qf);
}
static inline HVX_Vector hvx_vec_inverse_fp32_guard(HVX_Vector v_sf) {
static const float kInf = INFINITY;
static const uint32_t kNanMask = 0x7fffffff;
static const uint32_t kNanMin = 0x7f800000;
const HVX_Vector inf = hvx_vec_splat_fp32(kInf);
const HVX_VectorPred pred_inf = Q6_Q_vcmp_gt_VsfVsf(inf, v_sf);
HVX_Vector out = hvx_vec_inverse_fp32(v_sf);
const HVX_Vector nan_mask = Q6_V_vsplat_R(kNanMask);
const HVX_Vector nan_min = Q6_V_vsplat_R(kNanMin);
HVX_Vector masked_out = Q6_V_vand_VV(out, nan_mask);
const HVX_VectorPred pred = Q6_Q_vcmp_gtand_QVuwVuw(pred_inf, nan_min, masked_out);
return Q6_V_vmux_QVV(pred, out, Q6_V_vzero());
}
#define FAST_SIGMOID_LOG2F (0x3fb8aa3b) // 1.442695022
#define FAST_SIGMOID_C1 (0x3d009076) // 0.03138777
#define FAST_SIGMOID_C2 (0x3e8d74bd) // 0.276281267
@@ -934,6 +958,16 @@ static inline HVX_Vector hvx_vec_rsqrt_fp32(HVX_Vector in_vec) {
return Q6_Vsf_equals_Vqf32(temp);
}
static inline HVX_Vector hvx_vec_fast_sigmoid_fp32_guard(HVX_Vector v) {
static const float kMaxExp = -88.02f; // at or below this input the sigmoid is clamped to 0
const HVX_Vector max_exp = Q6_V_vsplat_R(*((uint32_t *) &kMaxExp));
const HVX_VectorPred pred_inf = Q6_Q_vcmp_gt_VsfVsf(v, max_exp);
HVX_Vector out = hvx_vec_fast_sigmoid_fp32(v);
return Q6_V_vmux_QVV(pred_inf, out, Q6_V_vzero());
}
static inline void hvx_fast_sigmoid_f32(const uint8_t * restrict src, uint8_t * restrict dst, const int num_elems) {
int step_of_1 = num_elems >> 5;
int remaining = num_elems - step_of_1 * VLEN_FP32;
@@ -945,7 +979,7 @@ static inline void hvx_fast_sigmoid_f32(const uint8_t * restrict src, uint8_t *
#pragma unroll(4)
for (int i = 0; i < step_of_1; i++) {
v_dst[i] = hvx_vec_fast_sigmoid_fp32(v_src[i]);
v_dst[i] = hvx_vec_fast_sigmoid_fp32_guard(v_src[i]);
}
}

@@ -11,6 +11,7 @@
#include <cassert>
#include <algorithm>
#include <limits>
#include <cmath>
static ggml_metal_buffer_id ggml_metal_get_buffer_id(const ggml_tensor * t) {
if (!t) {

@@ -6895,9 +6895,23 @@ static void ggml_cl_mul_mat(ggml_backend_t backend, const ggml_tensor * src0, co
cl_context context = backend_ctx->context;
if(src0t == GGML_TYPE_F16 && src1t == GGML_TYPE_F32){
if (ne01 >= 64 && ne1 >= 32 && ne00 >= 16 && (ne12 % ne02) == 0){
ggml_cl_mul_mat_kq_kqv_adreno(backend, src0, src1, dst);
return;
if (ne01 >= 64 && ne1 >= 32 && ne00 >= 16 && (ne12 % ne02) == 0) {
// For KQ
if (ggml_is_permuted(src0) && ggml_is_permuted(src1) &&
nb00 <= nb02 &&
nb02 <= nb01 &&
nb01 <= nb03 &&
nb10 <= nb12 &&
nb12 <= nb11 &&
nb11 <= nb13) {
ggml_cl_mul_mat_kq_kqv_adreno(backend, src0, src1, dst);
return;
}
// For KQV
if (!ggml_is_contiguous(src0) && ggml_is_contiguous(src1)) {
ggml_cl_mul_mat_kq_kqv_adreno(backend, src0, src1, dst);
return;
}
}
}

@@ -513,6 +513,7 @@ struct vk_device_struct {
vk_queue compute_queue;
vk_queue transfer_queue;
bool single_queue;
bool support_async;
uint32_t subgroup_size;
uint32_t shader_core_count;
bool uma;
@@ -669,6 +670,20 @@ struct vk_device_struct {
vk_pipeline pipeline_hardsigmoid[2];
vk_pipeline pipeline_hardswish[2];
vk_pipeline pipeline_abs[2];
vk_pipeline pipeline_softplus[2];
vk_pipeline pipeline_step[2];
vk_pipeline pipeline_round[2];
vk_pipeline pipeline_ceil[2];
vk_pipeline pipeline_floor[2];
vk_pipeline pipeline_trunc[2];
vk_pipeline pipeline_add1_f16_f16;
vk_pipeline pipeline_add1_f16_f32;
vk_pipeline pipeline_add1_f32_f32;
vk_pipeline pipeline_arange_f32;
vk_pipeline pipeline_fill_f32;
vk_pipeline pipeline_geglu[2];
vk_pipeline pipeline_reglu[2];
@@ -3841,6 +3856,12 @@ static void ggml_vk_load_shaders(vk_device& device) {
CREATE_UNARY(hardsigmoid)
CREATE_UNARY(hardswish)
CREATE_UNARY(abs)
CREATE_UNARY(softplus)
CREATE_UNARY(step)
CREATE_UNARY(round)
CREATE_UNARY(ceil)
CREATE_UNARY(floor)
CREATE_UNARY(trunc)
#undef CREATE_UNARY
#define CREATE_UNARY_RTE(name) \
@@ -3854,6 +3875,14 @@ static void ggml_vk_load_shaders(vk_device& device) {
CREATE_UNARY_RTE(exp)
#undef CREATE_UNARY_RTE
ggml_vk_create_pipeline(device, device->pipeline_add1_f16_f16, "add1_f16_f16", add1_f16_f16_len, add1_f16_f16_data, "main", 3, sizeof(vk_op_binary_push_constants), {512, 1, 1}, {}, 1);
ggml_vk_create_pipeline(device, device->pipeline_add1_f16_f32, "add1_f16_f32", add1_f16_f32_len, add1_f16_f32_data, "main", 3, sizeof(vk_op_binary_push_constants), {512, 1, 1}, {}, 1);
ggml_vk_create_pipeline(device, device->pipeline_add1_f32_f32, "add1_f32_f32", add1_f32_f32_len, add1_f32_f32_data, "main", 3, sizeof(vk_op_binary_push_constants), {512, 1, 1}, {}, 1);
ggml_vk_create_pipeline(device, device->pipeline_arange_f32, "arange_f32", arange_f32_len, arange_f32_data, "main", 1, sizeof(vk_op_unary_push_constants), {512, 1, 1}, {}, 1);
ggml_vk_create_pipeline(device, device->pipeline_fill_f32, "fill_f32", fill_f32_len, fill_f32_data, "main", 1, sizeof(vk_op_unary_push_constants), {512, 1, 1}, {}, 1);
#define CREATE_GLU(name) \
if (device->float_controls_rte_fp16) { \
ggml_vk_create_pipeline(device, device->pipeline_ ## name [0], #name "_f32_rte", name ## _f32_rte_len, name ## _f32_rte_data, "main", 3, sizeof(vk_op_glu_push_constants), {512, 1, 1}, {}, 1, true); \
@@ -4245,6 +4274,16 @@ static vk_device ggml_vk_get_device(size_t idx) {
device->vendor_id = device->properties.vendorID;
device->driver_id = driver_props.driverID;
// Implementing the async backend interfaces seems broken on older Intel HW,
// see https://github.com/ggml-org/llama.cpp/issues/17302.
device->support_async = (device->vendor_id != VK_VENDOR_ID_INTEL ||
std::string(device->properties.deviceName.data()).find("(DG1)") == std::string::npos) &&
getenv("GGML_VK_DISABLE_ASYNC") == nullptr;
if (!device->support_async) {
GGML_LOG_DEBUG("ggml_vulkan: WARNING: Async execution disabled on certain Intel devices.\n");
}
const char* GGML_VK_FORCE_MAX_ALLOCATION_SIZE = getenv("GGML_VK_FORCE_MAX_ALLOCATION_SIZE");
if (GGML_VK_FORCE_MAX_ALLOCATION_SIZE != nullptr) {
@@ -5579,7 +5618,7 @@ static vk_subbuffer ggml_vk_tensor_subbuffer(
const ggml_backend_vk_context * ctx, const ggml_tensor * tensor, bool allow_misalign = false) {
vk_buffer buffer = nullptr;
size_t offset = 0;
uint64_t offset = 0;
if (ctx->device->uma) {
ggml_vk_host_get(ctx->device, tensor->data, buffer, offset);
}
@@ -5590,9 +5629,9 @@ static vk_subbuffer ggml_vk_tensor_subbuffer(
}
GGML_ASSERT(buffer != nullptr);
size_t size = ggml_nbytes(tensor);
uint64_t size = ggml_nbytes(tensor);
size_t misalign_bytes = offset & (ctx->device->properties.limits.minStorageBufferOffsetAlignment - 1);
uint64_t misalign_bytes = offset & (ctx->device->properties.limits.minStorageBufferOffsetAlignment - 1);
// The shader must support misaligned offsets when indexing into the buffer
GGML_ASSERT(allow_misalign || misalign_bytes == 0);
offset &= ~misalign_bytes;
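The subbuffer code above splits a byte offset into an aligned base and a misalignment remainder using `minStorageBufferOffsetAlignment`, which the Vulkan specification requires to be a power of two. A minimal sketch of that split with 64-bit arithmetic follows. `OffsetSplit` and `split_offset` are hypothetical names, and how the remainder is later consumed by the shader is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

struct OffsetSplit {
    uint64_t aligned;   // offset rounded down to the alignment
    uint64_t misalign;  // bytes between the aligned base and the original offset
};

// `alignment` must be a power of two.
static OffsetSplit split_offset(uint64_t offset, uint64_t alignment) {
    const uint64_t misalign = offset & (alignment - 1);
    // Every bit set in `misalign` is also set in `offset`, so clearing those
    // bits is the same as subtracting the remainder.
    return { offset & ~misalign, misalign };
}
```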
@@ -6856,13 +6895,13 @@ static void ggml_vk_mul_mat_vec_q_f16(ggml_backend_vk_context * ctx, vk_context&
vk_subbuffer d_X, d_Y;
if (qx_needs_dequant) {
d_X = { ctx->prealloc_x, 0, ctx->prealloc_x->size };
d_X = { ctx->prealloc_x, 0, x_sz };
} else {
d_X = d_Qx;
GGML_ASSERT(qx_sz == x_sz);
}
if (qy_needs_dequant || quantize_y) {
d_Y = { ctx->prealloc_y, 0, ctx->prealloc_y->size };
d_Y = { ctx->prealloc_y, 0, y_sz };
} else {
d_Y = d_Qy;
}
@@ -7563,12 +7602,12 @@ static void ggml_vk_mul_mat_vec_id_q_f16(ggml_backend_vk_context * ctx, vk_conte
vk_subbuffer d_X, d_Y;
if (qx_needs_dequant) {
d_X = { ctx->prealloc_x, 0, ctx->prealloc_x->size };
d_X = { ctx->prealloc_x, 0, x_sz };
} else {
d_X = d_Qx;
}
if (qy_needs_dequant) {
d_Y = { ctx->prealloc_y, 0, ctx->prealloc_y->size };
d_Y = { ctx->prealloc_y, 0, y_sz };
} else {
d_Y = d_Qy;
}
@@ -8279,6 +8318,18 @@ static vk_pipeline ggml_vk_op_get_pipeline(ggml_backend_vk_context * ctx, const
return ctx->device->pipeline_hardswish[dst->type == GGML_TYPE_F16];
case GGML_UNARY_OP_ABS:
return ctx->device->pipeline_abs[dst->type == GGML_TYPE_F16];
case GGML_UNARY_OP_SOFTPLUS:
return ctx->device->pipeline_softplus[dst->type == GGML_TYPE_F16];
case GGML_UNARY_OP_STEP:
return ctx->device->pipeline_step[dst->type == GGML_TYPE_F16];
case GGML_UNARY_OP_ROUND:
return ctx->device->pipeline_round[dst->type == GGML_TYPE_F16];
case GGML_UNARY_OP_CEIL:
return ctx->device->pipeline_ceil[dst->type == GGML_TYPE_F16];
case GGML_UNARY_OP_FLOOR:
return ctx->device->pipeline_floor[dst->type == GGML_TYPE_F16];
case GGML_UNARY_OP_TRUNC:
return ctx->device->pipeline_trunc[dst->type == GGML_TYPE_F16];
default:
break;
}
@@ -8473,7 +8524,7 @@ static vk_pipeline ggml_vk_op_get_pipeline(ggml_backend_vk_context * ctx, const
case GGML_OP_CONV_TRANSPOSE_2D:
if (src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32 &&
ggml_is_contiguous(src0) && ggml_is_contiguous(src1) && ggml_is_contiguous(dst)) {
std::array<uint32_t, 3> elements;
std::array<uint32_t, 3> elements{};
if (op == GGML_OP_CONV_2D) elements = ggml_vk_get_conv_elements(dst);
else if (op == GGML_OP_CONV_TRANSPOSE_2D) elements = ggml_vk_get_conv_transpose_2d_elements(dst);
vk_conv_shapes shape;
@@ -8551,6 +8602,27 @@ static vk_pipeline ggml_vk_op_get_pipeline(ggml_backend_vk_context * ctx, const
}
}
return nullptr;
case GGML_OP_ADD1:
if (src0->type == GGML_TYPE_F16 && src1->type == GGML_TYPE_F16 && dst->type == GGML_TYPE_F16) {
return ctx->device->pipeline_add1_f16_f16;
}
if (src0->type == GGML_TYPE_F16 && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F16) {
return ctx->device->pipeline_add1_f16_f32;
}
if (src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32) {
return ctx->device->pipeline_add1_f32_f32;
}
return nullptr;
case GGML_OP_ARANGE:
if (dst->type == GGML_TYPE_F32) {
return ctx->device->pipeline_arange_f32;
}
return nullptr;
case GGML_OP_FILL:
if (dst->type == GGML_TYPE_F32) {
return ctx->device->pipeline_fill_f32;
}
return nullptr;
default:
return nullptr;
}
@@ -8840,6 +8912,9 @@ static void ggml_vk_op_f32(ggml_backend_vk_context * ctx, vk_context& subctx, co
case GGML_OP_SUB:
case GGML_OP_DIV:
case GGML_OP_MUL:
case GGML_OP_ADD1:
case GGML_OP_ARANGE:
case GGML_OP_FILL:
case GGML_OP_SCALE:
case GGML_OP_SQR:
case GGML_OP_SQRT:
@@ -9457,6 +9532,63 @@ static void ggml_vk_sqrt(ggml_backend_vk_context * ctx, vk_context& subctx, cons
ggml_vk_op_f32(ctx, subctx, src0, nullptr, nullptr, nullptr, dst, GGML_OP_SQRT, vk_op_unary_push_constants_init(src0, dst));
}
static void ggml_vk_add1(ggml_backend_vk_context * ctx, vk_context& subctx, const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) {
const uint32_t src0_type_size = ggml_type_size(src0->type);
const uint32_t src1_type_size = ggml_type_size(src1->type);
const uint32_t dst_type_size = ggml_type_size(dst->type);
ggml_vk_op_f32<vk_op_binary_push_constants>(ctx, subctx, src0, src1, nullptr, nullptr, dst, GGML_OP_ADD1, {
(uint32_t)ggml_nelements(src0),
(uint32_t)src0->ne[0], (uint32_t)src0->ne[1], (uint32_t)src0->ne[2],(uint32_t)src0->ne[3], (uint32_t)src0->nb[0] / src0_type_size, (uint32_t)src0->nb[1] / src0_type_size, (uint32_t)src0->nb[2] / src0_type_size, (uint32_t)src0->nb[3] / src0_type_size,
(uint32_t)src1->ne[0], (uint32_t)src1->ne[1], (uint32_t)src1->ne[2],(uint32_t)src1->ne[3], (uint32_t)src1->nb[0] / src1_type_size, (uint32_t)src1->nb[1] / src1_type_size, (uint32_t)src1->nb[2] / src1_type_size, (uint32_t)src1->nb[3] / src1_type_size,
(uint32_t) dst->ne[0], (uint32_t) dst->ne[1], (uint32_t) dst->ne[2],(uint32_t) dst->ne[3], (uint32_t) dst->nb[0] / dst_type_size, (uint32_t) dst->nb[1] / dst_type_size, (uint32_t) dst->nb[2] / dst_type_size, (uint32_t) dst->nb[3] / dst_type_size,
0,
0.0f, 0.0f, 0,
});
}
static void ggml_vk_arange(ggml_backend_vk_context * ctx, vk_context& subctx, ggml_tensor * dst) {
VK_LOG_DEBUG("ggml_vk_arange(dst=" << dst << ", ne=" << ggml_nelements(dst) << ")");
vk_op_push_constants pc = {
(uint32_t)ggml_nelements(dst),
1,
ggml_get_op_params_f32(dst, 0),
ggml_get_op_params_f32(dst, 2),
};
vk_pipeline pipeline = ggml_vk_op_get_pipeline(ctx, nullptr, nullptr, nullptr, dst, GGML_OP_ARANGE);
GGML_ASSERT(pipeline != nullptr);
ggml_pipeline_request_descriptor_sets(ctx, pipeline, 1);
vk_subbuffer dst_buf = ggml_vk_tensor_subbuffer(ctx, dst, false);
std::array<uint32_t, 3> elements = { (uint32_t)ggml_nelements(dst), 1, 1 };
ggml_vk_dispatch_pipeline(ctx, subctx, pipeline, { dst_buf }, pc, elements);
}
static void ggml_vk_fill(ggml_backend_vk_context * ctx, vk_context& subctx, ggml_tensor * dst) {
VK_LOG_DEBUG("ggml_vk_fill(dst=" << dst << ", ne=" << ggml_nelements(dst) << ")");
vk_op_push_constants pc = {
(uint32_t)ggml_nelements(dst),
1,
ggml_get_op_params_f32(dst, 0),
0.0f,
};
vk_pipeline pipeline = ggml_vk_op_get_pipeline(ctx, nullptr, nullptr, nullptr, dst, GGML_OP_FILL);
GGML_ASSERT(pipeline != nullptr);
ggml_pipeline_request_descriptor_sets(ctx, pipeline, 1);
vk_subbuffer dst_buf = ggml_vk_tensor_subbuffer(ctx, dst, false);
std::array<uint32_t, 3> elements = { (uint32_t)ggml_nelements(dst), 1, 1 };
ggml_vk_dispatch_pipeline(ctx, subctx, pipeline, { dst_buf }, pc, elements);
}
static void ggml_vk_sin(ggml_backend_vk_context * ctx, vk_context& subctx, const ggml_tensor * src0, ggml_tensor * dst) {
ggml_vk_op_f32(ctx, subctx, src0, nullptr, nullptr, nullptr, dst, GGML_OP_SIN, vk_op_unary_push_constants_init(src0, dst));
}
@@ -11249,13 +11381,13 @@ static void ggml_vk_preallocate_buffers(ggml_backend_vk_context * ctx, vk_contex
}
}
static bool ggml_vk_compute_forward(ggml_backend_vk_context* ctx, ggml_cgraph * cgraph, ggml_tensor* tensor, int tensor_idx, bool almost_ready);
static void ggml_vk_compute_forward(ggml_backend_vk_context* ctx, ggml_cgraph * cgraph, ggml_tensor* tensor, int tensor_idx, bool almost_ready);
// Returns true if node has enqueued work into the queue, false otherwise
// If submit is true, all operations queued so far are submitted to Vulkan to overlap cmdlist creation and GPU execution.
static bool ggml_vk_build_graph(ggml_backend_vk_context * ctx, ggml_cgraph * cgraph, int node_idx, ggml_tensor *node_begin, int node_idx_begin, bool last_node, bool almost_ready, bool submit){
ggml_tensor * node = cgraph->nodes[node_idx];
if (ggml_is_empty(node) || !node->buffer) {
if (ggml_is_empty(node) || ggml_op_is_empty(node->op) || !node->buffer) {
return false;
}
@@ -11267,123 +11399,19 @@ static bool ggml_vk_build_graph(ggml_backend_vk_context * ctx, ggml_cgraph * cgr
ggml_tensor * src2 = node->src[2];
ggml_tensor * src3 = node->src[3];
switch (node->op) {
// Return on empty ops to avoid generating a compute_ctx and setting exit_tensor
case GGML_OP_RESHAPE:
case GGML_OP_VIEW:
case GGML_OP_PERMUTE:
case GGML_OP_TRANSPOSE:
case GGML_OP_NONE:
return false;
case GGML_OP_UNARY:
switch (ggml_get_unary_op(node)) {
case GGML_UNARY_OP_EXP:
case GGML_UNARY_OP_SILU:
case GGML_UNARY_OP_GELU:
case GGML_UNARY_OP_GELU_ERF:
case GGML_UNARY_OP_GELU_QUICK:
case GGML_UNARY_OP_RELU:
case GGML_UNARY_OP_NEG:
case GGML_UNARY_OP_TANH:
case GGML_UNARY_OP_SIGMOID:
case GGML_UNARY_OP_HARDSIGMOID:
case GGML_UNARY_OP_HARDSWISH:
case GGML_UNARY_OP_ABS:
break;
default:
return false;
}
break;
case GGML_OP_GLU:
switch (ggml_get_glu_op(node)) {
case GGML_GLU_OP_GEGLU:
case GGML_GLU_OP_REGLU:
case GGML_GLU_OP_SWIGLU:
case GGML_GLU_OP_SWIGLU_OAI:
case GGML_GLU_OP_GEGLU_ERF:
case GGML_GLU_OP_GEGLU_QUICK:
break;
default:
return false;
}
break;
case GGML_OP_ADD:
{
int next_node_idx = node_idx + 1 + ctx->num_additional_fused_ops;
if (next_node_idx < cgraph->n_nodes &&
cgraph->nodes[next_node_idx]->op == GGML_OP_RMS_NORM &&
cgraph->nodes[next_node_idx]->src[0] == cgraph->nodes[next_node_idx - 1] &&
ggml_nrows(cgraph->nodes[next_node_idx]) == 1 &&
ctx->device->add_rms_fusion) {
uint32_t size = ggml_vk_rms_partials_size(ctx, cgraph->nodes[node_idx]);
ctx->do_add_rms_partials_offset_calculation = true;
if (ctx->prealloc_size_add_rms_partials_offset + size <= ctx->prealloc_size_add_rms_partials) {
ctx->do_add_rms_partials = true;
}
if (node->op == GGML_OP_ADD) {
int next_node_idx = node_idx + 1 + ctx->num_additional_fused_ops;
if (next_node_idx < cgraph->n_nodes &&
cgraph->nodes[next_node_idx]->op == GGML_OP_RMS_NORM &&
cgraph->nodes[next_node_idx]->src[0] == cgraph->nodes[next_node_idx - 1] &&
ggml_nrows(cgraph->nodes[next_node_idx]) == 1 &&
ctx->device->add_rms_fusion) {
uint32_t size = ggml_vk_rms_partials_size(ctx, cgraph->nodes[node_idx]);
ctx->do_add_rms_partials_offset_calculation = true;
if (ctx->prealloc_size_add_rms_partials_offset + size <= ctx->prealloc_size_add_rms_partials) {
ctx->do_add_rms_partials = true;
}
} break;
case GGML_OP_REPEAT:
case GGML_OP_REPEAT_BACK:
case GGML_OP_GET_ROWS:
case GGML_OP_ADD_ID:
case GGML_OP_ACC:
case GGML_OP_SUB:
case GGML_OP_MUL:
case GGML_OP_DIV:
case GGML_OP_CONCAT:
case GGML_OP_UPSCALE:
case GGML_OP_SCALE:
case GGML_OP_SQR:
case GGML_OP_SQRT:
case GGML_OP_SIN:
case GGML_OP_COS:
case GGML_OP_LOG:
case GGML_OP_CLAMP:
case GGML_OP_PAD:
case GGML_OP_ROLL:
case GGML_OP_CPY:
case GGML_OP_SET_ROWS:
case GGML_OP_CONT:
case GGML_OP_DUP:
case GGML_OP_SILU_BACK:
case GGML_OP_NORM:
case GGML_OP_GROUP_NORM:
case GGML_OP_RMS_NORM:
case GGML_OP_RMS_NORM_BACK:
case GGML_OP_L2_NORM:
case GGML_OP_DIAG_MASK_INF:
case GGML_OP_SOFT_MAX:
case GGML_OP_SOFT_MAX_BACK:
case GGML_OP_ROPE:
case GGML_OP_ROPE_BACK:
case GGML_OP_MUL_MAT:
case GGML_OP_MUL_MAT_ID:
case GGML_OP_ARGSORT:
case GGML_OP_SUM:
case GGML_OP_SUM_ROWS:
case GGML_OP_MEAN:
case GGML_OP_ARGMAX:
case GGML_OP_COUNT_EQUAL:
case GGML_OP_IM2COL:
case GGML_OP_IM2COL_3D:
case GGML_OP_TIMESTEP_EMBEDDING:
case GGML_OP_CONV_TRANSPOSE_1D:
case GGML_OP_POOL_2D:
case GGML_OP_CONV_2D:
case GGML_OP_CONV_TRANSPOSE_2D:
case GGML_OP_CONV_2D_DW:
case GGML_OP_RWKV_WKV6:
case GGML_OP_RWKV_WKV7:
case GGML_OP_SSM_SCAN:
case GGML_OP_SSM_CONV:
case GGML_OP_LEAKY_RELU:
case GGML_OP_FLASH_ATTN_EXT:
case GGML_OP_OPT_STEP_ADAMW:
case GGML_OP_OPT_STEP_SGD:
break;
default:
std::cerr << "ggml_vulkan: Error: Missing op: " << ggml_op_name(node->op) << std::endl;
GGML_ABORT("fatal error");
}
}
vk_context compute_ctx;
@@ -11542,6 +11570,18 @@ static bool ggml_vk_build_graph(ggml_backend_vk_context * ctx, ggml_cgraph * cgr
case GGML_OP_UPSCALE:
ggml_vk_upscale(ctx, compute_ctx, src0, node);
break;
case GGML_OP_ADD1:
ggml_vk_add1(ctx, compute_ctx, src0, src1, node);
break;
case GGML_OP_ARANGE:
ggml_vk_arange(ctx, compute_ctx, node);
break;
case GGML_OP_FILL:
ggml_vk_fill(ctx, compute_ctx, node);
break;
case GGML_OP_SCALE:
ggml_vk_scale(ctx, compute_ctx, src0, node);
@@ -11626,6 +11666,12 @@ static bool ggml_vk_build_graph(ggml_backend_vk_context * ctx, ggml_cgraph * cgr
case GGML_UNARY_OP_HARDSIGMOID:
case GGML_UNARY_OP_HARDSWISH:
case GGML_UNARY_OP_ABS:
case GGML_UNARY_OP_SOFTPLUS:
case GGML_UNARY_OP_STEP:
case GGML_UNARY_OP_ROUND:
case GGML_UNARY_OP_CEIL:
case GGML_UNARY_OP_FLOOR:
case GGML_UNARY_OP_TRUNC:
ggml_vk_unary(ctx, compute_ctx, src0, node);
break;
default:
@@ -11802,136 +11848,14 @@ static bool ggml_vk_build_graph(ggml_backend_vk_context * ctx, ggml_cgraph * cgr
ctx->compute_ctx.reset();
bool ok = ggml_vk_compute_forward(ctx, cgraph, node_begin, node_idx_begin, almost_ready);
if (!ok) {
if (node->op == GGML_OP_UNARY) {
std::cerr << __func__ << ": error: op not supported UNARY " << node->name << " (" << ggml_unary_op_name(static_cast<ggml_unary_op>(node->op_params[0])) << ")" << std::endl;
} else if (node->op == GGML_OP_GLU) {
std::cerr << __func__ << ": error: op not supported GLU " << node->name << " (" << ggml_glu_op_name(static_cast<ggml_glu_op>(node->op_params[0])) << ")" << std::endl;
} else {
std::cerr << __func__ << ": error: op not supported " << node->name << " (" << ggml_op_name(node->op) << ")" << std::endl;
}
}
ggml_vk_compute_forward(ctx, cgraph, node_begin, node_idx_begin, almost_ready);
}
return true;
}
static bool ggml_vk_compute_forward(ggml_backend_vk_context * ctx, ggml_cgraph * cgraph, ggml_tensor * tensor, int tensor_idx, bool almost_ready = false) {
static void ggml_vk_compute_forward(ggml_backend_vk_context * ctx, ggml_cgraph * cgraph, ggml_tensor * tensor, int tensor_idx, bool almost_ready = false) {
GGML_UNUSED(cgraph);
ggml_backend_buffer * buf = nullptr;
switch (tensor->op) {
case GGML_OP_ADD:
case GGML_OP_ACC:
case GGML_OP_GET_ROWS:
case GGML_OP_SUB:
case GGML_OP_MUL:
case GGML_OP_DIV:
case GGML_OP_ADD_ID:
case GGML_OP_CONCAT:
case GGML_OP_UPSCALE:
case GGML_OP_SCALE:
case GGML_OP_SQR:
case GGML_OP_SQRT:
case GGML_OP_SIN:
case GGML_OP_COS:
case GGML_OP_LOG:
case GGML_OP_CLAMP:
case GGML_OP_PAD:
case GGML_OP_ROLL:
case GGML_OP_CPY:
case GGML_OP_SET_ROWS:
case GGML_OP_CONT:
case GGML_OP_DUP:
case GGML_OP_SILU_BACK:
case GGML_OP_NORM:
case GGML_OP_GROUP_NORM:
case GGML_OP_RMS_NORM:
case GGML_OP_RMS_NORM_BACK:
case GGML_OP_L2_NORM:
case GGML_OP_DIAG_MASK_INF:
case GGML_OP_SOFT_MAX:
case GGML_OP_SOFT_MAX_BACK:
case GGML_OP_ROPE:
case GGML_OP_ROPE_BACK:
case GGML_OP_RESHAPE:
case GGML_OP_VIEW:
case GGML_OP_PERMUTE:
case GGML_OP_TRANSPOSE:
case GGML_OP_NONE:
case GGML_OP_ARGSORT:
case GGML_OP_SUM:
case GGML_OP_SUM_ROWS:
case GGML_OP_MEAN:
case GGML_OP_ARGMAX:
case GGML_OP_COUNT_EQUAL:
case GGML_OP_IM2COL:
case GGML_OP_IM2COL_3D:
case GGML_OP_TIMESTEP_EMBEDDING:
case GGML_OP_CONV_TRANSPOSE_1D:
case GGML_OP_POOL_2D:
case GGML_OP_CONV_2D:
case GGML_OP_CONV_TRANSPOSE_2D:
case GGML_OP_CONV_2D_DW:
case GGML_OP_RWKV_WKV6:
case GGML_OP_RWKV_WKV7:
case GGML_OP_SSM_SCAN:
case GGML_OP_SSM_CONV:
case GGML_OP_LEAKY_RELU:
case GGML_OP_REPEAT:
case GGML_OP_REPEAT_BACK:
case GGML_OP_OPT_STEP_ADAMW:
case GGML_OP_OPT_STEP_SGD:
buf = tensor->buffer;
break;
case GGML_OP_UNARY:
switch (ggml_get_unary_op(tensor)) {
case GGML_UNARY_OP_EXP:
case GGML_UNARY_OP_SILU:
case GGML_UNARY_OP_GELU:
case GGML_UNARY_OP_GELU_ERF:
case GGML_UNARY_OP_GELU_QUICK:
case GGML_UNARY_OP_RELU:
case GGML_UNARY_OP_NEG:
case GGML_UNARY_OP_TANH:
case GGML_UNARY_OP_SIGMOID:
case GGML_UNARY_OP_HARDSIGMOID:
case GGML_UNARY_OP_HARDSWISH:
case GGML_UNARY_OP_ABS:
buf = tensor->buffer;
break;
default:
return false;
}
break;
case GGML_OP_GLU:
switch (ggml_get_glu_op(tensor)) {
case GGML_GLU_OP_GEGLU:
case GGML_GLU_OP_REGLU:
case GGML_GLU_OP_SWIGLU:
case GGML_GLU_OP_SWIGLU_OAI:
case GGML_GLU_OP_GEGLU_ERF:
case GGML_GLU_OP_GEGLU_QUICK:
buf = tensor->buffer;
break;
default:
return false;
}
break;
case GGML_OP_MUL_MAT:
case GGML_OP_MUL_MAT_ID:
case GGML_OP_FLASH_ATTN_EXT:
buf = tensor->buffer;
break;
default:
return false;
}
if (buf == nullptr) {
return false;
}
GGML_UNUSED(tensor);
VK_LOG_DEBUG("ggml_vk_compute_forward(" << tensor << ", name=" << tensor->name << ", op=" << ggml_op_name(tensor->op) << ", type=" << tensor->type << ", ne0=" << tensor->ne[0] << ", ne1=" << tensor->ne[1] << ", ne2=" << tensor->ne[2] << ", ne3=" << tensor->ne[3] << ", nb0=" << tensor->nb[0] << ", nb1=" << tensor->nb[1] << ", nb2=" << tensor->nb[2] << ", nb3=" << tensor->nb[3] << ", view_src=" << tensor->view_src << ", view_offs=" << tensor->view_offs << ")");
@@ -11975,8 +11899,6 @@ static bool ggml_vk_compute_forward(ggml_backend_vk_context * ctx, ggml_cgraph *
subctx->out_memcpys.clear();
subctx->memsets.clear();
}
return true;
}
// Clean up after graph processing is done
@@ -13030,6 +12952,10 @@ static ggml_status ggml_backend_vk_graph_compute(ggml_backend_t backend, ggml_cg
ctx->device->perf_logger->print_timings();
}
if (!ctx->device->support_async) {
ggml_vk_synchronize(ctx);
}
return GGML_STATUS_SUCCESS;
UNUSED(backend);
@@ -13323,6 +13249,10 @@ ggml_backend_t ggml_backend_vk_init(size_t dev_num) {
/* .context = */ ctx,
};
if (!ctx->device->support_async) {
vk_backend->iface.get_tensor_async = nullptr;
}
return vk_backend;
}
@@ -13501,6 +13431,12 @@ static bool ggml_backend_vk_device_supports_op(ggml_backend_dev_t dev, const ggm
case GGML_UNARY_OP_HARDSIGMOID:
case GGML_UNARY_OP_HARDSWISH:
case GGML_UNARY_OP_ABS:
case GGML_UNARY_OP_SOFTPLUS:
case GGML_UNARY_OP_STEP:
case GGML_UNARY_OP_ROUND:
case GGML_UNARY_OP_CEIL:
case GGML_UNARY_OP_FLOOR:
case GGML_UNARY_OP_TRUNC:
return ggml_is_contiguous(op->src[0]) &&
(op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
(op->type == GGML_TYPE_F32 || op->type == GGML_TYPE_F16) &&
@@ -13818,6 +13754,9 @@ static bool ggml_backend_vk_device_supports_op(ggml_backend_dev_t dev, const ggm
case GGML_OP_UPSCALE:
case GGML_OP_ACC:
case GGML_OP_CONCAT:
case GGML_OP_ADD1:
case GGML_OP_ARANGE:
case GGML_OP_FILL:
case GGML_OP_SCALE:
case GGML_OP_PAD:
case GGML_OP_ROLL:
@@ -14300,6 +14239,16 @@ static void ggml_vk_check_results_0(ggml_backend_vk_context * ctx, ggml_cgraph *
} else if (tensor->op == GGML_OP_SCALE) {
const float * params = (const float *)tensor->op_params;
tensor_clone = ggml_scale_bias(ggml_ctx, src_clone[0], params[0], params[1]);
} else if (tensor->op == GGML_OP_ADD1) {
tensor_clone = ggml_add1(ggml_ctx, src_clone[0], src_clone[1]);
} else if (tensor->op == GGML_OP_ARANGE) {
const float start = ggml_get_op_params_f32(tensor, 0);
const float stop = ggml_get_op_params_f32(tensor, 1);
const float step = ggml_get_op_params_f32(tensor, 2);
tensor_clone = ggml_arange(ggml_ctx, start, stop, step);
} else if (tensor->op == GGML_OP_FILL) {
const float value = ggml_get_op_params_f32(tensor, 0);
tensor_clone = ggml_fill(ggml_ctx, tensor_clone, value);
} else if (tensor->op == GGML_OP_SQR) {
tensor_clone = ggml_sqr(ggml_ctx, src_clone[0]);
} else if (tensor->op == GGML_OP_SQRT) {
@@ -14413,6 +14362,24 @@ static void ggml_vk_check_results_0(ggml_backend_vk_context * ctx, ggml_cgraph *
case GGML_UNARY_OP_ABS:
tensor_clone = ggml_abs(ggml_ctx, src_clone[0]);
break;
case GGML_UNARY_OP_SOFTPLUS:
tensor_clone = ggml_softplus(ggml_ctx, src_clone[0]);
break;
case GGML_UNARY_OP_STEP:
tensor_clone = ggml_step(ggml_ctx, src_clone[0]);
break;
case GGML_UNARY_OP_ROUND:
tensor_clone = ggml_round(ggml_ctx, src_clone[0]);
break;
case GGML_UNARY_OP_CEIL:
tensor_clone = ggml_ceil(ggml_ctx, src_clone[0]);
break;
case GGML_UNARY_OP_FLOOR:
tensor_clone = ggml_floor(ggml_ctx, src_clone[0]);
break;
case GGML_UNARY_OP_TRUNC:
tensor_clone = ggml_trunc(ggml_ctx, src_clone[0]);
break;
default:
std::cerr << "Missing vk_check_results OP: " << ggml_op_name(tensor->op) << std::endl;
GGML_ABORT("fatal error");


@@ -0,0 +1,28 @@
#version 450
#extension GL_EXT_shader_16bit_storage : require
#include "types.glsl"
#include "generic_binary_head.glsl"
const uint num_threads = 256;
layout(local_size_x = num_threads, local_size_y = 1, local_size_z = 1) in;
void main() {
uint idx = get_idx();
const uint num_iter = 2;
[[unroll]] for (uint i = 0; i < num_iter; ++i) {
if (idx >= p.ne) {
continue;
}
uint i00, i01, i02, i03;
get_indices(idx, i00, i01, i02, i03);
data_d[get_doffset() + dst_idx(i00, i01, i02, i03)] = D_TYPE(FLOAT_TYPE(data_a[get_aoffset() + src0_idx(i00, i01, i02, i03)]) + FLOAT_TYPE(data_b[get_boffset()]));
idx += num_threads;
}
}
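
For reference, the shader above adds the single value at the start of the second operand to every element of the first. A minimal CPU sketch of that behaviour, assuming contiguous float data (the shader itself handles strides through `src0_idx` and `dst_idx`); `add1_ref` is a hypothetical name used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for add1: dst[i] = a[i] + b, where b is a scalar.
// Assumption: a is contiguous, so a flat loop is enough.
static std::vector<float> add1_ref(const std::vector<float> & a, float b) {
    std::vector<float> dst(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        dst[i] = a[i] + b;
    }
    return dst;
}
```

Calling `add1_ref({1.0f, 2.0f, 3.0f}, 0.5f)` adds 0.5 to each of the three elements.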


@@ -0,0 +1,20 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
// p.param1 = start, p.param2 = step
float value = p.param1 + p.param2 * float(i);
data_d[i] = D_TYPE(value);
}
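
The per-element formula in this shader (`start + step * i`) can be mirrored on the CPU. A minimal sketch, assuming the element count is `ceil((stop - start) / step)`; that count comes from the host side rather than from this shader, and `arange_ref` is a hypothetical helper name, not part of ggml:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the arange shader: element i is start + step * i.
// Assumption: the number of elements is ceil((stop - start) / step).
static std::vector<float> arange_ref(float start, float stop, float step) {
    if (step == 0.0f) {
        return {}; // no sensible range for a zero step
    }
    const double count = std::ceil((static_cast<double>(stop) - start) / step);
    if (!(count > 0.0)) {
        return {}; // empty range (also covers NaN)
    }
    std::vector<float> out(static_cast<std::size_t>(count));
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = start + step * static_cast<float>(i);
    }
    return out;
}
```

Because only `start` and `step` are needed per element, the shader receives just those two values as `p.param1` and `p.param2`.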


@@ -0,0 +1,22 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
#extension GL_EXT_control_flow_attributes : enable
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) readonly buffer X {A_TYPE data_a[];};
layout (binding = 1) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.z * 262144 + gl_GlobalInvocationID.y * 512 + gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
const float x = float(data_a[i]);
data_d[i] = D_TYPE(ceil(x));
}
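
The index expression used by this and the following unary shaders flattens a 3D invocation ID into one linear element index; 262144 is 512 * 512. A small sketch of that mapping and its inverse, assuming the dispatch keeps the x and y components below 512 (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the shader's index expression: z * 262144 + y * 512 + x.
static inline uint32_t flat_index(uint32_t x, uint32_t y, uint32_t z) {
    return z * 262144u + y * 512u + x;
}

// Hypothetical inverse mapping. Assumption: x < 512 and y < 512.
struct invocation_id {
    uint32_t x;
    uint32_t y;
    uint32_t z;
};

static inline invocation_id split_index(uint32_t i) {
    invocation_id id;
    id.x = i % 512u;
    id.y = (i / 512u) % 512u;
    id.z = i / 262144u;
    return id;
}
```

Spreading the work over three dimensions keeps each dispatch dimension small even for tensors with millions of elements, which is presumably why the 1024 x 1024 test cases were added alongside these shaders.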


@@ -0,0 +1,19 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
// p.param1 = fill value
data_d[i] = D_TYPE(p.param1);
}


@@ -0,0 +1,22 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
#extension GL_EXT_control_flow_attributes : enable
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) readonly buffer X {A_TYPE data_a[];};
layout (binding = 1) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.z * 262144 + gl_GlobalInvocationID.y * 512 + gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
const float x = float(data_a[i]);
data_d[i] = D_TYPE(floor(x));
}


@@ -0,0 +1,29 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
#extension GL_EXT_control_flow_attributes : enable
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) readonly buffer X {A_TYPE data_a[];};
layout (binding = 1) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.z * 262144 + gl_GlobalInvocationID.y * 512 + gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
const float x = float(data_a[i]);
float result;
// Round halfway cases away from zero as roundf does.
if (x >= 0.0) {
result = floor(x + 0.5);
} else {
result = ceil(x - 0.5);
}
data_d[i] = D_TYPE(result);
}
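
The explicit branches are needed because GLSL's built-in `round()` leaves the direction of halfway cases up to the implementation, whereas `roundf` always rounds them away from zero. A minimal C++ sketch of the same branch structure, compared against `std::round` on a few values (`round_half_away` is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>

// Same branch structure as the shader: halfway cases round away from zero.
static float round_half_away(float x) {
    return x >= 0.0f ? std::floor(x + 0.5f) : std::ceil(x - 0.5f);
}
```

This is only a sketch of the technique; it is not claimed to match `std::round` bit for bit on every input, since `x + 0.5f` is itself subject to floating-point rounding.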


@@ -0,0 +1,23 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
#extension GL_EXT_control_flow_attributes : enable
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) readonly buffer X {A_TYPE data_a[];};
layout (binding = 1) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.z * 262144 + gl_GlobalInvocationID.y * 512 + gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
const float x = float(data_a[i]);
const float result = (x > 20.0f) ? x : log(1.0f + exp(x));
data_d[i] = D_TYPE(result);
}
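
The cutoff at 20 avoids evaluating `exp(x)` for large inputs, where it would overflow single precision (around x = 88.7), while costing nothing in accuracy: for x > 20 the difference between `log(1 + exp(x))` and `x` is about `exp(-20)`, far below float resolution at that magnitude. A minimal CPU sketch of the same expression (`softplus_ref` is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>

// Same expression as the shader: return x directly above the cutoff,
// otherwise compute log(1 + exp(x)).
static float softplus_ref(float x) {
    return x > 20.0f ? x : std::log(1.0f + std::exp(x));
}
```

With the cutoff in place, `softplus_ref(100.0f)` returns 100 instead of infinity.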


@@ -0,0 +1,22 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
#extension GL_EXT_control_flow_attributes : enable
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) readonly buffer X {A_TYPE data_a[];};
layout (binding = 1) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.z * 262144 + gl_GlobalInvocationID.y * 512 + gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
const float x = float(data_a[i]);
data_d[i] = D_TYPE(x >= 0.0f ? 1.0f : 0.0f);
}


@@ -0,0 +1,22 @@
#version 450
#include "generic_head.glsl"
#include "types.glsl"
#extension GL_EXT_control_flow_attributes : enable
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
layout (binding = 0) readonly buffer X {A_TYPE data_a[];};
layout (binding = 1) writeonly buffer D {D_TYPE data_d[];};
void main() {
const uint i = gl_GlobalInvocationID.z * 262144 + gl_GlobalInvocationID.y * 512 + gl_GlobalInvocationID.x;
if (i >= p.KX) {
return;
}
const float x = float(data_a[i]);
data_d[i] = D_TYPE(trunc(x));
}


@@ -846,6 +846,25 @@ void process_shaders() {
string_to_spv("abs_f16", "abs.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}});
string_to_spv("abs_f32", "abs.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}});
string_to_spv("softplus_f16", "softplus.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}});
string_to_spv("softplus_f32", "softplus.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}});
string_to_spv("add1_f16_f16", "add1.comp", {{"A_TYPE", "float16_t"}, {"B_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}, {"FLOAT_TYPE", "float"}});
string_to_spv("add1_f16_f32", "add1.comp", {{"A_TYPE", "float16_t"}, {"B_TYPE", "float"}, {"D_TYPE", "float16_t"}, {"FLOAT_TYPE", "float"}});
string_to_spv("add1_f32_f32", "add1.comp", {{"A_TYPE", "float"}, {"B_TYPE", "float"}, {"D_TYPE", "float"}, {"FLOAT_TYPE", "float"}});
string_to_spv("arange_f32", "arange.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}, {"FLOAT_TYPE", "float"}});
string_to_spv("fill_f32", "fill.comp", {{"D_TYPE", "float"}, {"FLOAT_TYPE", "float"}});
string_to_spv("step_f16", "step.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}});
string_to_spv("step_f32", "step.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}});
string_to_spv("round_f16", "round.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}});
string_to_spv("round_f32", "round.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}});
string_to_spv("ceil_f16", "ceil.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}});
string_to_spv("ceil_f32", "ceil.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}});
string_to_spv("floor_f16", "floor.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}});
string_to_spv("floor_f32", "floor.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}});
string_to_spv("trunc_f16", "trunc.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}});
string_to_spv("trunc_f32", "trunc.comp", {{"A_TYPE", "float"}, {"D_TYPE", "float"}});
for (auto rte : {false, true}) {
std::string suffix = rte ? "_rte" : "";
string_to_spv("geglu_f16" + suffix, "geglu.comp", {{"A_TYPE", "float16_t"}, {"D_TYPE", "float16_t"}, {"RTE16", rte ? "1" : "0"}});


@@ -1 +1 @@
7b6abb2b92fcef35cb01c6ce6ada9bd85306522d
781baf2a14d9e0aaee542b2e1bb918bfc4132199


@@ -6,8 +6,10 @@
#include <cmath>
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#define MAX_REPETITION_THRESHOLD 2000
//
// helpers
//
@@ -345,8 +347,10 @@ const char * llama_grammar_parser::parse_sequence(
size_t last_sym_start = rule.size();
const char * pos = src;
auto handle_repetitions = [&](int min_times, int max_times) {
// use UINT64_MAX as the empty value because we aligned to the proper uint64_t type so -1 can't be used
// (though it's technically the same as -1 now)
auto handle_repetitions = [&](uint64_t min_times, uint64_t max_times) {
bool no_max = max_times == UINT64_MAX;
if (last_sym_start == rule.size()) {
throw std::runtime_error(std::string("expecting preceding item to */+/?/{ at ") + pos);
}
@@ -373,20 +377,20 @@ const char * llama_grammar_parser::parse_sequence(
rule.resize(last_sym_start);
} else {
// Repeat the previous elements (min_times - 1) times
for (int i = 1; i < min_times; i++) {
for (uint64_t i = 1; i < min_times; i++) {
rule.insert(rule.end(), prev_rule.begin(), prev_rule.end());
}
}
uint32_t last_rec_rule_id = 0;
auto n_opt = max_times < 0 ? 1 : max_times - min_times;
auto n_opt = no_max ? 1 : max_times - min_times;
llama_grammar_rule rec_rule(prev_rule);
for (int i = 0; i < n_opt; i++) {
for (uint64_t i = 0; i < n_opt; i++) {
rec_rule.resize(prev_rule.size());
uint32_t rec_rule_id = generate_symbol_id( rule_name);
if (i > 0 || max_times < 0) {
rec_rule.push_back({LLAMA_GRETYPE_RULE_REF, max_times < 0 ? rec_rule_id : last_rec_rule_id});
if (i > 0 || no_max) {
rec_rule.push_back({LLAMA_GRETYPE_RULE_REF, no_max ? rec_rule_id : last_rec_rule_id});
}
rec_rule.push_back({LLAMA_GRETYPE_ALT, 0});
rec_rule.push_back({LLAMA_GRETYPE_END, 0});
@@ -478,10 +482,10 @@ const char * llama_grammar_parser::parse_sequence(
throw std::runtime_error(std::string("expecting an int at ") + pos);
}
const char * int_end = parse_int(pos);
int min_times = std::stoul(std::string(pos, int_end - pos));
uint64_t min_times = std::stoul(std::string(pos, int_end - pos));
pos = parse_space(int_end, is_nested);
int max_times = -1;
uint64_t max_times = UINT64_MAX; // default: no max limit
if (*pos == '}') {
max_times = min_times;
@@ -502,6 +506,10 @@ const char * llama_grammar_parser::parse_sequence(
} else {
throw std::runtime_error(std::string("expecting ',' at ") + pos);
}
bool has_max = max_times != UINT64_MAX;
if (min_times > MAX_REPETITION_THRESHOLD || (has_max && max_times > MAX_REPETITION_THRESHOLD)) {
throw std::runtime_error(std::string("number of repetitions exceeds sane defaults, please reduce the number of repetitions"));
}
handle_repetitions(min_times, max_times);
} else {
break;
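
The bounds check introduced in this hunk can be exercised in isolation. A minimal sketch, assuming the same threshold (2000) and the same `UINT64_MAX` sentinel for "no upper bound" as in the diff; the function and constant names here are hypothetical and not part of the parser:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Values taken from the diff; names are illustrative only.
static const uint64_t kMaxRepetitionThreshold = 2000;
static const uint64_t kNoMax = UINT64_MAX; // sentinel: "{m,}" has no upper bound

// Throws if either bound exceeds the threshold. The sentinel is excluded
// from the max check so that open-ended repetitions stay valid.
static void check_repetitions(uint64_t min_times, uint64_t max_times) {
    const bool has_max = max_times != kNoMax;
    if (min_times > kMaxRepetitionThreshold ||
        (has_max && max_times > kMaxRepetitionThreshold)) {
        throw std::runtime_error("number of repetitions exceeds sane defaults");
    }
}

// Small helper for the usage example: true if the bounds are rejected.
static bool rejects(uint64_t min_times, uint64_t max_times) {
    try {
        check_repetitions(min_times, max_times);
        return false;
    } catch (const std::runtime_error &) {
        return true;
    }
}
```

Under these assumptions `{0,2000}` and `{3,}` pass, while `{0,2001}` and `{2001,}` are rejected. Without the `has_max` guard the sentinel itself would always trip the check.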


@@ -20,10 +20,10 @@ static llama_logger_state g_logger_state;
time_meas::time_meas(int64_t & t_acc, bool disable) : t_start_us(disable ? -1 : ggml_time_us()), t_acc(t_acc) {}
time_meas::~time_meas() {
if (t_start_us >= 0) {
t_acc += ggml_time_us() - t_start_us;
}
if (t_start_us >= 0) {
t_acc += ggml_time_us() - t_start_us;
}
}
void llama_log_set(ggml_log_callback log_callback, void * user_data) {
ggml_log_set(log_callback, user_data);


@@ -1593,7 +1593,8 @@ void llama_model::load_hparams(llama_model_loader & ml) {
} break;
case LLM_ARCH_DEEPSEEK2:
{
bool is_lite = (hparams.n_layer == 27);
// lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
bool is_lite = (hparams.n_layer == 27 || hparams.n_layer == 26);
ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);
ml.get_key(LLM_KV_LEADING_DENSE_BLOCK_COUNT, hparams.n_layer_dense_lead);
if (!is_lite) {
@@ -4581,7 +4582,8 @@ bool llama_model::load_tensors(llama_model_loader & ml) {
} break;
case LLM_ARCH_DEEPSEEK2:
{
const bool is_lite = (hparams.n_layer == 27);
// lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
const bool is_lite = (hparams.n_layer == 27 || hparams.n_layer == 26);
const bool is_mla = (hparams.n_embd_head_k_mla != 0 && hparams.n_embd_head_v_mla != 0);


@@ -472,9 +472,6 @@ static void llama_sampler_chain_reset(struct llama_sampler * smpl) {
for (auto * smpl : chain->samplers) {
llama_sampler_reset(smpl);
}
chain->t_sample_us = 0;
chain->n_sample = 0;
}
static struct llama_sampler * llama_sampler_chain_clone(const struct llama_sampler * smpl) {
@@ -2670,8 +2667,7 @@ struct llama_perf_sampler_data llama_perf_sampler(const struct llama_sampler * c
void llama_perf_sampler_print(const struct llama_sampler * chain) {
const auto data = llama_perf_sampler(chain);
LLAMA_LOG_INFO("%s: sampling time = %10.2f ms / %5d runs (%8.2f ms per token, %8.2f tokens per second)\n",
__func__, data.t_sample_ms, data.n_sample, data.t_sample_ms / data.n_sample, 1e3 / data.t_sample_ms * data.n_sample);
LLAMA_LOG_INFO("%s: samplers time = %10.2f ms / %5d runs\n", __func__, data.t_sample_ms, data.n_sample);
}
void llama_perf_sampler_reset(struct llama_sampler * chain) {
@@ -2681,5 +2677,6 @@ void llama_perf_sampler_reset(struct llama_sampler * chain) {
auto * ctx = (struct llama_sampler_chain *) chain->ctx;
ctx->t_sample_us = ctx->n_sample = 0;
ctx->t_sample_us = 0;
ctx->n_sample = 0;
}


@@ -4,7 +4,8 @@
llm_build_deepseek2::llm_build_deepseek2(const llama_model & model, const llm_graph_params & params) :
llm_graph_context(params) {
bool is_lite = (hparams.n_layer == 27);
// lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
bool is_lite = (hparams.n_layer == 27 || hparams.n_layer == 26);
const bool is_mla = (hparams.n_embd_head_k_mla != 0 && hparams.n_embd_head_v_mla != 0);


@@ -7026,6 +7026,7 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_eval() {
test_cases.emplace_back(new test_bin_bcast(ggml_add, GGML_TYPE_F32, {16, 5, 4, 3}, {1, 1, 1, 1}, 16));
test_cases.emplace_back(new test_add1());
test_cases.emplace_back(new test_add1(GGML_TYPE_F32, {1024, 1024, 1, 1}));
test_cases.emplace_back(new test_scale());
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10}, 2.0f, 1.0f));
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10}, 2.0f, 1.0f, true)); // inplace test
@@ -7365,9 +7366,13 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_eval() {
test_cases.emplace_back(new test_clamp (type, {7, 1, 5, 3}));
test_cases.emplace_back(new test_leaky_relu(type, {7, 1, 5, 3}));
test_cases.emplace_back(new test_floor (type, {7, 1, 5, 3}));
test_cases.emplace_back(new test_floor (type, { 1024, 1024, 1, 1 }));
test_cases.emplace_back(new test_ceil (type, {7, 1, 5, 3}));
test_cases.emplace_back(new test_ceil (type, { 1024, 1024, 1, 1 }));
test_cases.emplace_back(new test_round (type, {7, 1, 5, 3}));
test_cases.emplace_back(new test_round (type, { 1024, 1024, 1, 1 }));
test_cases.emplace_back(new test_trunc (type, {7, 1, 5, 3}));
test_cases.emplace_back(new test_trunc (type, { 1024, 1024, 1, 1 }));
}
test_cases.emplace_back(new test_diag_mask_inf(GGML_TYPE_F32, {10, 10, 1, 1}, 5));
@@ -7569,6 +7574,7 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_eval() {
test_cases.emplace_back(new test_pad_reflect_1d(GGML_TYPE_F32, {3000, 384, 4, 1}));
test_cases.emplace_back(new test_roll());
test_cases.emplace_back(new test_arange());
test_cases.emplace_back(new test_arange(GGML_TYPE_F32, 0.0f, 1048576.0f, 1.0f));
test_cases.emplace_back(new test_timestep_embedding());
test_cases.emplace_back(new test_leaky_relu());
@@ -7596,6 +7602,7 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_eval() {
test_cases.emplace_back(new test_fill(0.0f));
test_cases.emplace_back(new test_fill(2.0f, GGML_TYPE_F32, { 303, 207, 11, 3 }));
test_cases.emplace_back(new test_fill(-152.0f, GGML_TYPE_F32, { 800, 600, 4, 4 }));
test_cases.emplace_back(new test_fill(3.5f, GGML_TYPE_F32, { 2048, 512, 2, 2 }));
test_cases.emplace_back(new test_solve_tri());
test_cases.emplace_back(new test_solve_tri(GGML_TYPE_F32, { 11, 11, 1, 1 }, { 5, 11, 1, 1 }));
@@ -7812,6 +7819,7 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_perf() {
for (int bs : {1, 4, 8, 32, 64, 128, 256, 512}) {
for (ggml_type type_a : {GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_Q4_0, GGML_TYPE_Q8_0, GGML_TYPE_Q4_K, GGML_TYPE_Q6_K, GGML_TYPE_IQ2_XS}) {
for (ggml_type type_b : {GGML_TYPE_F32}) {
test_cases.emplace_back(new test_mul_mat_id(type_a, type_b, 128, 8, false, 768, bs, 2048));
test_cases.emplace_back(new test_mul_mat_id_fusion(type_a, type_b, 128, 8, false, 768, bs, 2048, 1));
}
}
@@ -7820,6 +7828,7 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_perf() {
for (int bs : {1, 4, 8, 32, 64, 128, 256, 512}) {
for (ggml_type type_a : {GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_Q4_0, GGML_TYPE_Q8_0, GGML_TYPE_Q4_K, GGML_TYPE_Q6_K, GGML_TYPE_IQ2_XS}) {
for (ggml_type type_b : {GGML_TYPE_F32}) {
test_cases.emplace_back(new test_mul_mat_id(type_a, type_b, 32, 4, false, 1792, bs, 2048));
test_cases.emplace_back(new test_mul_mat_id_fusion(type_a, type_b, 32, 4, false, 1792, bs, 2048, 1));
}
}
@@ -7830,6 +7839,7 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_perf() {
for (int bs : {1, 4, 8, 512}) {
for (ggml_type type_a : {GGML_TYPE_MXFP4}) {
for (ggml_type type_b : {GGML_TYPE_F32}) {
test_cases.emplace_back(new test_mul_mat_id(type_a, type_b, 32, 4, false, 2880, bs, 2880));
test_cases.emplace_back(new test_mul_mat_id_fusion(type_a, type_b, 32, 4, false, 2880, bs, 2880, 1));
}
}


@@ -147,11 +147,15 @@ int main(int argc, char ** argv) {
return 1;
}
auto * mem = llama_get_memory(ctx);
llama_memory_t mem = llama_get_memory(ctx);
const llama_vocab * vocab = llama_model_get_vocab(model);
// note: the time for chat template initialization is not negligible:
auto chat_templates = common_chat_templates_init(model, params.chat_template);
// start measuring performance timings from here
llama_perf_context_reset(ctx);
LOG_INF("%s: llama threadpool init, n_threads = %d\n", __func__, (int) params.cpuparams.n_threads);
auto * cpu_dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);

Binary file not shown.


@@ -0,0 +1,273 @@
<script lang="ts">
import { FileText, Image, Music, FileIcon, Eye } from '@lucide/svelte';
import { FileTypeCategory, MimeTypeApplication } from '$lib/enums/files';
import { convertPDFToImage } from '$lib/utils/pdf-processing';
import { Button } from '$lib/components/ui/button';
import { getFileTypeCategory } from '$lib/utils/file-type';
interface Props {
// Either an uploaded file or a stored attachment
uploadedFile?: ChatUploadedFile;
attachment?: DatabaseMessageExtra;
// For uploaded files
preview?: string;
name?: string;
type?: string;
textContent?: string;
}
let { uploadedFile, attachment, preview, name, type, textContent }: Props = $props();
let displayName = $derived(uploadedFile?.name || attachment?.name || name || 'Unknown File');
let displayPreview = $derived(
uploadedFile?.preview || (attachment?.type === 'imageFile' ? attachment.base64Url : preview)
);
let displayType = $derived(
uploadedFile?.type ||
(attachment?.type === 'imageFile'
? 'image'
: attachment?.type === 'textFile'
? 'text'
: attachment?.type === 'audioFile'
? attachment.mimeType || 'audio'
: attachment?.type === 'pdfFile'
? MimeTypeApplication.PDF
: type || 'unknown')
);
let displayTextContent = $derived(
uploadedFile?.textContent ||
(attachment?.type === 'textFile'
? attachment.content
: attachment?.type === 'pdfFile'
? attachment.content
: textContent)
);
let isAudio = $derived(
getFileTypeCategory(displayType) === FileTypeCategory.AUDIO || displayType === 'audio'
);
let isImage = $derived(
getFileTypeCategory(displayType) === FileTypeCategory.IMAGE || displayType === 'image'
);
let isPdf = $derived(displayType === MimeTypeApplication.PDF);
let isText = $derived(
getFileTypeCategory(displayType) === FileTypeCategory.TEXT || displayType === 'text'
);
let IconComponent = $derived(() => {
if (isImage) return Image;
if (isText || isPdf) return FileText;
if (isAudio) return Music;
return FileIcon;
});
let pdfViewMode = $state<'text' | 'pages'>('pages');
let pdfImages = $state<string[]>([]);
let pdfImagesLoading = $state(false);
let pdfImagesError = $state<string | null>(null);
async function loadPdfImages() {
if (!isPdf || pdfImages.length > 0 || pdfImagesLoading) return;
pdfImagesLoading = true;
pdfImagesError = null;
try {
let file: File | null = null;
if (uploadedFile?.file) {
file = uploadedFile.file;
} else if (attachment?.type === 'pdfFile') {
// Check if we have pre-processed images
if (attachment.images && Array.isArray(attachment.images)) {
pdfImages = attachment.images;
return;
}
// Convert base64 back to File for processing
if (attachment.base64Data) {
const base64Data = attachment.base64Data;
const byteCharacters = atob(base64Data);
const byteNumbers = new Array(byteCharacters.length);
for (let i = 0; i < byteCharacters.length; i++) {
byteNumbers[i] = byteCharacters.charCodeAt(i);
}
const byteArray = new Uint8Array(byteNumbers);
file = new File([byteArray], displayName, { type: MimeTypeApplication.PDF });
}
}
if (file) {
pdfImages = await convertPDFToImage(file);
} else {
throw new Error('No PDF file available for conversion');
}
} catch (error) {
pdfImagesError = error instanceof Error ? error.message : 'Failed to load PDF images';
} finally {
pdfImagesLoading = false;
}
}
export function reset() {
pdfImages = [];
pdfImagesLoading = false;
pdfImagesError = null;
pdfViewMode = 'pages';
}
$effect(() => {
if (isPdf && pdfViewMode === 'pages') {
loadPdfImages();
}
});
</script>
<div class="space-y-4">
<div class="flex items-center justify-end gap-6">
{#if isPdf}
<div class="flex items-center gap-2">
<Button
variant={pdfViewMode === 'text' ? 'default' : 'outline'}
size="sm"
onclick={() => (pdfViewMode = 'text')}
disabled={pdfImagesLoading}
>
<FileText class="mr-1 h-4 w-4" />
Text
</Button>
<Button
variant={pdfViewMode === 'pages' ? 'default' : 'outline'}
size="sm"
onclick={() => {
pdfViewMode = 'pages';
loadPdfImages();
}}
disabled={pdfImagesLoading}
>
{#if pdfImagesLoading}
<div
class="mr-1 h-4 w-4 animate-spin rounded-full border-2 border-current border-t-transparent"
></div>
{:else}
<Eye class="mr-1 h-4 w-4" />
{/if}
Pages
</Button>
</div>
{/if}
</div>
<div class="flex-1 overflow-auto">
{#if isImage && displayPreview}
<div class="flex items-center justify-center">
<img
src={displayPreview}
alt={displayName}
class="max-h-full rounded-lg object-contain shadow-lg"
/>
</div>
{:else if isPdf && pdfViewMode === 'pages'}
{#if pdfImagesLoading}
<div class="flex items-center justify-center p-8">
<div class="text-center">
<div
class="mx-auto mb-4 h-8 w-8 animate-spin rounded-full border-4 border-primary border-t-transparent"
></div>
<p class="text-muted-foreground">Converting PDF to images...</p>
</div>
</div>
{:else if pdfImagesError}
<div class="flex items-center justify-center p-8">
<div class="text-center">
<FileText class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
<p class="mb-4 text-muted-foreground">Failed to load PDF images</p>
<p class="text-sm text-muted-foreground">{pdfImagesError}</p>
<Button class="mt-4" onclick={() => (pdfViewMode = 'text')}>View as Text</Button>
</div>
</div>
{:else if pdfImages.length > 0}
<div class="max-h-[70vh] space-y-4 overflow-auto">
{#each pdfImages as image, index (image)}
<div class="text-center">
<p class="mb-2 text-sm text-muted-foreground">Page {index + 1}</p>
<img
src={image}
alt="PDF Page {index + 1}"
class="mx-auto max-w-full rounded-lg shadow-lg"
/>
</div>
{/each}
</div>
{:else}
<div class="flex items-center justify-center p-8">
<div class="text-center">
<FileText class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
<p class="mb-4 text-muted-foreground">No PDF pages available</p>
</div>
</div>
{/if}
{:else if (isText || (isPdf && pdfViewMode === 'text')) && displayTextContent}
<div
class="max-h-[60vh] overflow-auto rounded-lg bg-muted p-4 font-mono text-sm break-words whitespace-pre-wrap"
>
{displayTextContent}
</div>
{:else if isAudio}
<div class="flex items-center justify-center p-8">
<div class="w-full max-w-md text-center">
<Music class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
{#if attachment?.type === 'audioFile'}
<audio
controls
class="mb-4 w-full"
src="data:{attachment.mimeType};base64,{attachment.base64Data}"
>
Your browser does not support the audio element.
</audio>
{:else if uploadedFile?.preview}
<audio controls class="mb-4 w-full" src={uploadedFile.preview}>
Your browser does not support the audio element.
</audio>
{:else}
<p class="mb-4 text-muted-foreground">Audio preview not available</p>
{/if}
<p class="text-sm text-muted-foreground">
{displayName}
</p>
</div>
</div>
{:else}
<div class="flex items-center justify-center p-8">
<div class="text-center">
{#if IconComponent}
<IconComponent class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
{/if}
<p class="mb-4 text-muted-foreground">Preview not available for this file type</p>
</div>
</div>
{/if}
</div>
</div>
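
Review note: the base64-to-bytes step inside `loadPdfImages` above can be sketched in isolation. This is a minimal sketch rather than the component's actual code: the helper name `base64ToBytes` is hypothetical, and it assumes a runtime where `atob` is a global (browsers, recent Node.js). It fills a typed array directly instead of going through an intermediate `Array`.

```typescript
// Hypothetical helper (not part of the diff): decode a base64 string into raw bytes.
function base64ToBytes(base64: string): Uint8Array {
  // atob yields a "binary string": one character per byte, char codes 0..255.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// In a browser, the resulting bytes could then be wrapped the way the component does,
// e.g. new File([bytes], name, { type: 'application/pdf' }).
```

Writing into a preallocated `Uint8Array` avoids the second copy made by `new Uint8Array(byteNumbers)`; whether that matters depends on attachment size.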

@@ -1,314 +0,0 @@
<script lang="ts">
import * as Dialog from '$lib/components/ui/dialog';
import { FileText, Image, Music, FileIcon, Eye } from '@lucide/svelte';
import { FileTypeCategory, MimeTypeApplication } from '$lib/enums/files';
import { convertPDFToImage } from '$lib/utils/pdf-processing';
import { Button } from '$lib/components/ui/button';
import { getFileTypeCategory } from '$lib/utils/file-type';
import { formatFileSize } from '$lib/utils/file-preview';
interface Props {
open: boolean;
// Either an uploaded file or a stored attachment
uploadedFile?: ChatUploadedFile;
attachment?: DatabaseMessageExtra;
// For uploaded files
preview?: string;
name?: string;
type?: string;
size?: number;
textContent?: string;
}
let {
open = $bindable(),
uploadedFile,
attachment,
preview,
name,
type,
size,
textContent
}: Props = $props();
let displayName = $derived(uploadedFile?.name || attachment?.name || name || 'Unknown File');
let displayPreview = $derived(
uploadedFile?.preview || (attachment?.type === 'imageFile' ? attachment.base64Url : preview)
);
let displayType = $derived(
uploadedFile?.type ||
(attachment?.type === 'imageFile'
? 'image'
: attachment?.type === 'textFile'
? 'text'
: attachment?.type === 'audioFile'
? attachment.mimeType || 'audio'
: attachment?.type === 'pdfFile'
? MimeTypeApplication.PDF
: type || 'unknown')
);
let displaySize = $derived(uploadedFile?.size || size);
let displayTextContent = $derived(
uploadedFile?.textContent ||
(attachment?.type === 'textFile'
? attachment.content
: attachment?.type === 'pdfFile'
? attachment.content
: textContent)
);
let isAudio = $derived(
getFileTypeCategory(displayType) === FileTypeCategory.AUDIO || displayType === 'audio'
);
let isImage = $derived(
getFileTypeCategory(displayType) === FileTypeCategory.IMAGE || displayType === 'image'
);
let isPdf = $derived(displayType === MimeTypeApplication.PDF);
let isText = $derived(
getFileTypeCategory(displayType) === FileTypeCategory.TEXT || displayType === 'text'
);
let IconComponent = $derived(() => {
if (isImage) return Image;
if (isText || isPdf) return FileText;
if (isAudio) return Music;
return FileIcon;
});
let pdfViewMode = $state<'text' | 'pages'>('pages');
let pdfImages = $state<string[]>([]);
let pdfImagesLoading = $state(false);
let pdfImagesError = $state<string | null>(null);
async function loadPdfImages() {
if (!isPdf || pdfImages.length > 0 || pdfImagesLoading) return;
pdfImagesLoading = true;
pdfImagesError = null;
try {
let file: File | null = null;
if (uploadedFile?.file) {
file = uploadedFile.file;
} else if (attachment?.type === 'pdfFile') {
// Check if we have pre-processed images
if (attachment.images && Array.isArray(attachment.images)) {
pdfImages = attachment.images;
return;
}
// Convert base64 back to File for processing
if (attachment.base64Data) {
const base64Data = attachment.base64Data;
const byteCharacters = atob(base64Data);
const byteNumbers = new Array(byteCharacters.length);
for (let i = 0; i < byteCharacters.length; i++) {
byteNumbers[i] = byteCharacters.charCodeAt(i);
}
const byteArray = new Uint8Array(byteNumbers);
file = new File([byteArray], displayName, { type: MimeTypeApplication.PDF });
}
}
if (file) {
pdfImages = await convertPDFToImage(file);
} else {
throw new Error('No PDF file available for conversion');
}
} catch (error) {
pdfImagesError = error instanceof Error ? error.message : 'Failed to load PDF images';
} finally {
pdfImagesLoading = false;
}
}
$effect(() => {
if (open) {
pdfImages = [];
pdfImagesLoading = false;
pdfImagesError = null;
pdfViewMode = 'pages';
}
});
$effect(() => {
if (open && isPdf && pdfViewMode === 'pages') {
loadPdfImages();
}
});
</script>
<Dialog.Root bind:open>
<Dialog.Content class="grid max-h-[90vh] max-w-5xl overflow-hidden !p-10 sm:w-auto sm:max-w-6xl">
<Dialog.Header class="flex-shrink-0">
<div class="flex items-center justify-between gap-6">
<div class="flex items-center gap-3">
{#if IconComponent}
<IconComponent class="h-5 w-5 text-muted-foreground" />
{/if}
<div>
<Dialog.Title class="text-left">{displayName}</Dialog.Title>
<div class="flex items-center gap-2 text-sm text-muted-foreground">
<span>{displayType}</span>
{#if displaySize}
<span></span>
<span>{formatFileSize(displaySize)}</span>
{/if}
</div>
</div>
</div>
{#if isPdf}
<div class="flex items-center gap-2">
<Button
variant={pdfViewMode === 'text' ? 'default' : 'outline'}
size="sm"
onclick={() => (pdfViewMode = 'text')}
disabled={pdfImagesLoading}
>
<FileText class="mr-1 h-4 w-4" />
Text
</Button>
<Button
variant={pdfViewMode === 'pages' ? 'default' : 'outline'}
size="sm"
onclick={() => {
pdfViewMode = 'pages';
loadPdfImages();
}}
disabled={pdfImagesLoading}
>
{#if pdfImagesLoading}
<div
class="mr-1 h-4 w-4 animate-spin rounded-full border-2 border-current border-t-transparent"
></div>
{:else}
<Eye class="mr-1 h-4 w-4" />
{/if}
Pages
</Button>
</div>
{/if}
</div>
</Dialog.Header>
<div class="flex-1 overflow-auto">
{#if isImage && displayPreview}
<div class="flex items-center justify-center">
<img
src={displayPreview}
alt={displayName}
class="max-h-full rounded-lg object-contain shadow-lg"
/>
</div>
{:else if isPdf && pdfViewMode === 'pages'}
{#if pdfImagesLoading}
<div class="flex items-center justify-center p-8">
<div class="text-center">
<div
class="mx-auto mb-4 h-8 w-8 animate-spin rounded-full border-4 border-primary border-t-transparent"
></div>
<p class="text-muted-foreground">Converting PDF to images...</p>
</div>
</div>
{:else if pdfImagesError}
<div class="flex items-center justify-center p-8">
<div class="text-center">
<FileText class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
<p class="mb-4 text-muted-foreground">Failed to load PDF images</p>
<p class="text-sm text-muted-foreground">{pdfImagesError}</p>
<Button class="mt-4" onclick={() => (pdfViewMode = 'text')}>View as Text</Button>
</div>
</div>
{:else if pdfImages.length > 0}
<div class="max-h-[70vh] space-y-4 overflow-auto">
{#each pdfImages as image, index (image)}
<div class="text-center">
<p class="mb-2 text-sm text-muted-foreground">Page {index + 1}</p>
<img
src={image}
alt="PDF Page {index + 1}"
class="mx-auto max-w-full rounded-lg shadow-lg"
/>
</div>
{/each}
</div>
{:else}
<div class="flex items-center justify-center p-8">
<div class="text-center">
<FileText class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
<p class="mb-4 text-muted-foreground">No PDF pages available</p>
</div>
</div>
{/if}
{:else if (isText || (isPdf && pdfViewMode === 'text')) && displayTextContent}
<div
class="max-h-[60vh] overflow-auto rounded-lg bg-muted p-4 font-mono text-sm break-words whitespace-pre-wrap"
>
{displayTextContent}
</div>
{:else if isAudio}
<div class="flex items-center justify-center p-8">
<div class="w-full max-w-md text-center">
<Music class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
{#if attachment?.type === 'audioFile'}
<audio
controls
class="mb-4 w-full"
src="data:{attachment.mimeType};base64,{attachment.base64Data}"
>
Your browser does not support the audio element.
</audio>
{:else if uploadedFile?.preview}
<audio controls class="mb-4 w-full" src={uploadedFile.preview}>
Your browser does not support the audio element.
</audio>
{:else}
<p class="mb-4 text-muted-foreground">Audio preview not available</p>
{/if}
<p class="text-sm text-muted-foreground">
{displayName}
</p>
</div>
</div>
{:else}
<div class="flex items-center justify-center p-8">
<div class="text-center">
{#if IconComponent}
<IconComponent class="mx-auto mb-4 h-16 w-16 text-muted-foreground" />
{/if}
<p class="mb-4 text-muted-foreground">Preview not available for this file type</p>
</div>
</div>
{/if}
</div>
</Dialog.Content>
</Dialog.Root>

@@ -1,11 +1,10 @@
<script lang="ts">
import { ChatAttachmentImagePreview, ChatAttachmentFilePreview } from '$lib/components/app';
import { ChatAttachmentThumbnailImage, ChatAttachmentThumbnailFile } from '$lib/components/app';
import { Button } from '$lib/components/ui/button';
import { ChevronLeft, ChevronRight } from '@lucide/svelte';
import { FileTypeCategory } from '$lib/enums/files';
import { getFileTypeCategory } from '$lib/utils/file-type';
import ChatAttachmentPreviewDialog from './ChatAttachmentPreviewDialog.svelte';
import ChatAttachmentsViewAllDialog from './ChatAttachmentsViewAllDialog.svelte';
import { DialogChatAttachmentPreview, DialogChatAttachmentsViewAll } from '$lib/components/app';
import type { ChatAttachmentDisplayItem, ChatAttachmentPreviewItem } from '$lib/types/chat';
interface Props {
@@ -200,7 +199,7 @@
>
{#each displayItems as item (item.id)}
{#if item.isImage && item.preview}
<ChatAttachmentImagePreview
<ChatAttachmentThumbnailImage
class="flex-shrink-0 cursor-pointer {limitToSingleRow ? 'first:ml-4 last:mr-4' : ''}"
id={item.id}
name={item.name}
@@ -213,7 +212,7 @@
onClick={(event) => openPreview(item, event)}
/>
{:else}
<ChatAttachmentFilePreview
<ChatAttachmentThumbnailFile
class="flex-shrink-0 cursor-pointer {limitToSingleRow ? 'first:ml-4 last:mr-4' : ''}"
id={item.id}
name={item.name}
@@ -256,7 +255,7 @@
{/if}
{#if previewItem}
<ChatAttachmentPreviewDialog
<DialogChatAttachmentPreview
bind:open={previewDialogOpen}
uploadedFile={previewItem.uploadedFile}
attachment={previewItem.attachment}
@@ -268,7 +267,7 @@
/>
{/if}
<ChatAttachmentsViewAllDialog
<DialogChatAttachmentsViewAll
bind:open={viewAllDialogOpen}
{uploadedFiles}
{attachments}

@@ -1,13 +1,14 @@
<script lang="ts">
import * as Dialog from '$lib/components/ui/dialog';
import { ChatAttachmentImagePreview, ChatAttachmentFilePreview } from '$lib/components/app';
import {
ChatAttachmentThumbnailImage,
ChatAttachmentThumbnailFile,
DialogChatAttachmentPreview
} from '$lib/components/app';
import { FileTypeCategory } from '$lib/enums/files';
import { getFileTypeCategory } from '$lib/utils/file-type';
import ChatAttachmentPreviewDialog from './ChatAttachmentPreviewDialog.svelte';
import type { ChatAttachmentDisplayItem, ChatAttachmentPreviewItem } from '$lib/types/chat';
interface Props {
open?: boolean;
uploadedFiles?: ChatUploadedFile[];
attachments?: DatabaseMessageExtra[];
readonly?: boolean;
@@ -18,7 +19,6 @@
}
let {
open = $bindable(false),
uploadedFiles = [],
attachments = [],
readonly = false,
@@ -127,70 +127,57 @@
}
</script>
<Dialog.Root bind:open>
<Dialog.Portal>
<Dialog.Overlay />
<Dialog.Content class="flex !max-h-[90vh] !max-w-6xl flex-col">
<Dialog.Header>
<Dialog.Title>All Attachments ({displayItems.length})</Dialog.Title>
<Dialog.Description class="text-sm text-muted-foreground">
View and manage all attached files
</Dialog.Description>
</Dialog.Header>
<div class="min-h-0 flex-1 space-y-6 overflow-y-auto px-1">
{#if fileItems.length > 0}
<div>
<h3 class="mb-3 text-sm font-medium text-foreground">Files ({fileItems.length})</h3>
<div class="flex flex-wrap items-start gap-3">
{#each fileItems as item (item.id)}
<ChatAttachmentFilePreview
class="cursor-pointer"
id={item.id}
name={item.name}
type={item.type}
size={item.size}
{readonly}
onRemove={onFileRemove}
textContent={item.textContent}
onClick={(event) => openPreview(item, event)}
/>
{/each}
</div>
</div>
{/if}
{#if imageItems.length > 0}
<div>
<h3 class="mb-3 text-sm font-medium text-foreground">Images ({imageItems.length})</h3>
<div class="flex flex-wrap items-start gap-3">
{#each imageItems as item (item.id)}
{#if item.preview}
<ChatAttachmentImagePreview
class="cursor-pointer"
id={item.id}
name={item.name}
preview={item.preview}
{readonly}
onRemove={onFileRemove}
height={imageHeight}
width={imageWidth}
{imageClass}
onClick={(event) => openPreview(item, event)}
/>
{/if}
{/each}
</div>
</div>
{/if}
<div class="space-y-4">
<div class="min-h-0 flex-1 space-y-6 overflow-y-auto px-1">
{#if fileItems.length > 0}
<div>
<h3 class="mb-3 text-sm font-medium text-foreground">Files ({fileItems.length})</h3>
<div class="flex flex-wrap items-start gap-3">
{#each fileItems as item (item.id)}
<ChatAttachmentThumbnailFile
class="cursor-pointer"
id={item.id}
name={item.name}
type={item.type}
size={item.size}
{readonly}
onRemove={onFileRemove}
textContent={item.textContent}
onClick={(event) => openPreview(item, event)}
/>
{/each}
</div>
</div>
</Dialog.Content>
</Dialog.Portal>
</Dialog.Root>
{/if}
{#if imageItems.length > 0}
<div>
<h3 class="mb-3 text-sm font-medium text-foreground">Images ({imageItems.length})</h3>
<div class="flex flex-wrap items-start gap-3">
{#each imageItems as item (item.id)}
{#if item.preview}
<ChatAttachmentThumbnailImage
class="cursor-pointer"
id={item.id}
name={item.name}
preview={item.preview}
{readonly}
onRemove={onFileRemove}
height={imageHeight}
width={imageWidth}
{imageClass}
onClick={(event) => openPreview(item, event)}
/>
{/if}
{/each}
</div>
</div>
{/if}
</div>
</div>
{#if previewItem}
<ChatAttachmentPreviewDialog
<DialogChatAttachmentPreview
bind:open={previewDialogOpen}
uploadedFile={previewItem.uploadedFile}
attachment={previewItem.attachment}

@@ -1,9 +1,11 @@
<script lang="ts">
import { Square, ArrowUp } from '@lucide/svelte';
import { Button } from '$lib/components/ui/button';
import ChatFormActionFileAttachments from './ChatFormActionFileAttachments.svelte';
import ChatFormActionRecord from './ChatFormActionRecord.svelte';
import ChatFormModelSelector from './ChatFormModelSelector.svelte';
import {
ChatFormActionFileAttachments,
ChatFormActionRecord,
ChatFormModelSelector
} from '$lib/components/app';
import { config } from '$lib/stores/settings.svelte';
import type { FileTypeCategory } from '$lib/enums/files';

@@ -1,7 +1,10 @@
<script lang="ts">
import { Edit, Copy, RefreshCw, Trash2, ArrowRight } from '@lucide/svelte';
import { ActionButton, ConfirmationDialog } from '$lib/components/app';
import ChatMessageBranchingControls from './ChatMessageBranchingControls.svelte';
import {
ActionButton,
ChatMessageBranchingControls,
DialogConfirmation
} from '$lib/components/app';
interface Props {
role: 'user' | 'assistant';
@@ -80,7 +83,7 @@
</div>
</div>
<ConfirmationDialog
<DialogConfirmation
bind:open={showDeleteDialog}
title="Delete Message"
description={deletionInfo && deletionInfo.totalCount > 1

@@ -5,13 +5,13 @@
ChatScreenHeader,
ChatScreenWarning,
ChatMessages,
ChatProcessingInfo,
EmptyFileAlertDialog,
ChatErrorDialog,
ChatScreenProcessingInfo,
DialogEmptyFileAlert,
DialogChatError,
ServerErrorSplash,
ServerInfo,
ServerLoadingSplash,
ConfirmationDialog
DialogConfirmation
} from '$lib/components/app';
import * as AlertDialog from '$lib/components/ui/alert-dialog';
import {
@@ -299,7 +299,7 @@
class="pointer-events-none sticky right-0 bottom-0 left-0 mt-auto"
in:slide={{ duration: 150, axis: 'y' }}
>
<ChatProcessingInfo />
<ChatScreenProcessingInfo />
{#if serverWarning()}
<ChatScreenWarning class="pointer-events-auto mx-auto max-w-[48rem] px-4" />
@@ -432,7 +432,7 @@
</AlertDialog.Portal>
</AlertDialog.Root>
<ConfirmationDialog
<DialogConfirmation
bind:open={showDeleteDialog}
title="Delete Conversation"
description="Are you sure you want to delete this conversation? This action cannot be undone and will permanently remove all messages in this conversation."
@@ -444,7 +444,7 @@
onCancel={() => (showDeleteDialog = false)}
/>
<EmptyFileAlertDialog
<DialogEmptyFileAlert
bind:open={showEmptyFileDialog}
emptyFiles={emptyFileNames}
onOpenChange={(open) => {
@@ -454,7 +454,7 @@
}}
/>
<ChatErrorDialog
<DialogChatError
message={activeErrorDialog?.message ?? ''}
onOpenChange={handleErrorDialogOpenChange}
open={Boolean(activeErrorDialog)}

@@ -1,6 +1,6 @@
<script lang="ts">
import { Settings } from '@lucide/svelte';
import { ChatSettingsDialog } from '$lib/components/app';
import { DialogChatSettings } from '$lib/components/app';
import { Button } from '$lib/components/ui/button';
let settingsOpen = $state(false);
@@ -20,4 +20,4 @@
</div>
</header>
<ChatSettingsDialog open={settingsOpen} onOpenChange={(open) => (settingsOpen = open)} />
<DialogChatSettings open={settingsOpen} onOpenChange={(open) => (settingsOpen = open)} />

@@ -12,20 +12,21 @@
ChevronRight,
Database
} from '@lucide/svelte';
import { ChatSettingsFooter, ChatSettingsFields } from '$lib/components/app';
import ImportExportTab from './ImportExportTab.svelte';
import * as Dialog from '$lib/components/ui/dialog';
import {
ChatSettingsFooter,
ChatSettingsImportExportTab,
ChatSettingsFields
} from '$lib/components/app';
import { ScrollArea } from '$lib/components/ui/scroll-area';
import { config, updateMultipleConfig } from '$lib/stores/settings.svelte';
import { setMode } from 'mode-watcher';
import type { Component } from 'svelte';
interface Props {
onOpenChange?: (open: boolean) => void;
open?: boolean;
onSave?: () => void;
}
let { onOpenChange, open = false }: Props = $props();
let { onSave }: Props = $props();
const settingSections: Array<{
fields: SettingsFieldConfig[];
@@ -269,7 +270,6 @@
settingSections.find((section) => section.title === activeSection) || settingSections[0]
);
let localConfig: SettingsConfigType = $state({ ...config() });
let originalTheme: string = $state('');
let canScrollLeft = $state(false);
let canScrollRight = $state(false);
@@ -285,18 +285,10 @@
localConfig[key] = value;
}
function handleClose() {
if (localConfig.theme !== originalTheme) {
setMode(originalTheme as 'light' | 'dark' | 'system');
}
onOpenChange?.(false);
}
function handleReset() {
localConfig = { ...config() };
setMode(localConfig.theme as 'light' | 'dark' | 'system');
originalTheme = localConfig.theme as string;
}
function handleSave() {
@@ -347,7 +339,7 @@
}
updateMultipleConfig(processedConfig);
onOpenChange?.(false);
onSave?.();
}
function scrollToCenter(element: HTMLElement) {
@@ -383,14 +375,11 @@
canScrollRight = scrollLeft < scrollWidth - clientWidth - 1; // -1 for rounding
}
$effect(() => {
if (open) {
localConfig = { ...config() };
originalTheme = config().theme as string;
export function reset() {
localConfig = { ...config() };
setTimeout(updateScrollButtons, 100);
}
});
setTimeout(updateScrollButtons, 100);
}
$effect(() => {
if (scrollContainer) {
@@ -399,120 +388,106 @@
});
</script>
<Dialog.Root {open} onOpenChange={handleClose}>
<Dialog.Content
class="z-999999 flex h-[100dvh] max-h-[100dvh] min-h-[100dvh] flex-col gap-0 rounded-none p-0
md:h-[64vh] md:max-h-[64vh] md:min-h-0 md:rounded-lg"
style="max-width: 48rem;"
>
<div class="flex flex-1 flex-col overflow-hidden md:flex-row">
<!-- Desktop Sidebar -->
<div class="hidden w-64 border-r border-border/30 p-6 md:block">
<nav class="space-y-1 py-2">
<Dialog.Title class="mb-6 flex items-center gap-2">Settings</Dialog.Title>
<div class="flex h-full flex-col overflow-hidden md:flex-row">
<!-- Desktop Sidebar -->
<div class="hidden w-64 border-r border-border/30 p-6 md:block">
<nav class="space-y-1 py-2">
{#each settingSections as section (section.title)}
<button
class="flex w-full cursor-pointer items-center gap-3 rounded-lg px-3 py-2 text-left text-sm transition-colors hover:bg-accent {activeSection ===
section.title
? 'bg-accent text-accent-foreground'
: 'text-muted-foreground'}"
onclick={() => (activeSection = section.title)}
>
<section.icon class="h-4 w-4" />
{#each settingSections as section (section.title)}
<button
class="flex w-full cursor-pointer items-center gap-3 rounded-lg px-3 py-2 text-left text-sm transition-colors hover:bg-accent {activeSection ===
section.title
? 'bg-accent text-accent-foreground'
: 'text-muted-foreground'}"
onclick={() => (activeSection = section.title)}
>
<section.icon class="h-4 w-4" />
<span class="ml-2">{section.title}</span>
</button>
{/each}
</nav>
</div>
<span class="ml-2">{section.title}</span>
</button>
{/each}
</nav>
</div>
<!-- Mobile Header with Horizontal Scrollable Menu -->
<div class="flex flex-col md:hidden">
<div class="border-b border-border/30 py-4">
<!-- Horizontal Scrollable Category Menu with Navigation -->
<div class="relative flex items-center" style="scroll-padding: 1rem;">
<button
class="absolute left-2 z-10 flex h-6 w-6 items-center justify-center rounded-full bg-muted shadow-md backdrop-blur-sm transition-opacity hover:bg-accent {canScrollLeft
? 'opacity-100'
: 'pointer-events-none opacity-0'}"
onclick={scrollLeft}
aria-label="Scroll left"
>
<ChevronLeft class="h-4 w-4" />
</button>
<!-- Mobile Header with Horizontal Scrollable Menu -->
<div class="flex flex-col md:hidden">
<div class="border-b border-border/30 py-4">
<Dialog.Title class="mb-6 flex items-center gap-2 px-4">Settings</Dialog.Title>
<!-- Horizontal Scrollable Category Menu with Navigation -->
<div class="relative flex items-center" style="scroll-padding: 1rem;">
<button
class="absolute left-2 z-10 flex h-6 w-6 items-center justify-center rounded-full bg-muted shadow-md backdrop-blur-sm transition-opacity hover:bg-accent {canScrollLeft
? 'opacity-100'
: 'pointer-events-none opacity-0'}"
onclick={scrollLeft}
aria-label="Scroll left"
>
<ChevronLeft class="h-4 w-4" />
</button>
<div
class="scrollbar-hide overflow-x-auto py-2"
bind:this={scrollContainer}
onscroll={updateScrollButtons}
>
<div class="flex min-w-max gap-2">
{#each settingSections as section (section.title)}
<button
class="flex cursor-pointer items-center gap-2 rounded-lg px-3 py-2 text-sm whitespace-nowrap transition-colors first:ml-4 last:mr-4 hover:bg-accent {activeSection ===
section.title
? 'bg-accent text-accent-foreground'
: 'text-muted-foreground'}"
onclick={(e: MouseEvent) => {
activeSection = section.title;
scrollToCenter(e.currentTarget as HTMLElement);
}}
>
<section.icon class="h-4 w-4 flex-shrink-0" />
<span>{section.title}</span>
</button>
{/each}
</div>
</div>
<button
class="absolute right-2 z-10 flex h-6 w-6 items-center justify-center rounded-full bg-muted shadow-md backdrop-blur-sm transition-opacity hover:bg-accent {canScrollRight
? 'opacity-100'
: 'pointer-events-none opacity-0'}"
onclick={scrollRight}
aria-label="Scroll right"
>
<ChevronRight class="h-4 w-4" />
</button>
<div
class="scrollbar-hide overflow-x-auto py-2"
bind:this={scrollContainer}
onscroll={updateScrollButtons}
>
<div class="flex min-w-max gap-2">
{#each settingSections as section (section.title)}
<button
class="flex cursor-pointer items-center gap-2 rounded-lg px-3 py-2 text-sm whitespace-nowrap transition-colors first:ml-4 last:mr-4 hover:bg-accent {activeSection ===
section.title
? 'bg-accent text-accent-foreground'
: 'text-muted-foreground'}"
onclick={(e: MouseEvent) => {
activeSection = section.title;
scrollToCenter(e.currentTarget as HTMLElement);
}}
>
<section.icon class="h-4 w-4 flex-shrink-0" />
<span>{section.title}</span>
</button>
{/each}
</div>
</div>
<button
class="absolute right-2 z-10 flex h-6 w-6 items-center justify-center rounded-full bg-muted shadow-md backdrop-blur-sm transition-opacity hover:bg-accent {canScrollRight
? 'opacity-100'
: 'pointer-events-none opacity-0'}"
onclick={scrollRight}
aria-label="Scroll right"
>
<ChevronRight class="h-4 w-4" />
</button>
</div>
<ScrollArea class="max-h-[calc(100dvh-13.5rem)] flex-1 md:max-h-[calc(100vh-13.5rem)]">
<div class="space-y-6 p-4 md:p-6">
<div class="grid">
<div class="mb-6 flex hidden items-center gap-2 border-b border-border/30 pb-6 md:flex">
<currentSection.icon class="h-5 w-5" />
<h3 class="text-lg font-semibold">{currentSection.title}</h3>
</div>
{#if currentSection.title === 'Import/Export'}
<ImportExportTab />
{:else}
<div class="space-y-6">
<ChatSettingsFields
fields={currentSection.fields}
{localConfig}
onConfigChange={handleConfigChange}
onThemeChange={handleThemeChange}
/>
</div>
{/if}
</div>
<div class="mt-8 border-t pt-6">
<p class="text-xs text-muted-foreground">
Settings are saved in browser's localStorage
</p>
</div>
</div>
</ScrollArea>
</div>
</div>
<ChatSettingsFooter onReset={handleReset} onSave={handleSave} />
</Dialog.Content>
</Dialog.Root>
<ScrollArea class="max-h-[calc(100dvh-13.5rem)] flex-1 md:max-h-[calc(100vh-13.5rem)]">
<div class="space-y-6 p-4 md:p-6">
<div class="grid">
<div class="mb-6 flex hidden items-center gap-2 border-b border-border/30 pb-6 md:flex">
<currentSection.icon class="h-5 w-5" />
<h3 class="text-lg font-semibold">{currentSection.title}</h3>
</div>
{#if currentSection.title === 'Import/Export'}
<ChatSettingsImportExportTab />
{:else}
<div class="space-y-6">
<ChatSettingsFields
fields={currentSection.fields}
{localConfig}
onConfigChange={handleConfigChange}
onThemeChange={handleThemeChange}
/>
</div>
{/if}
</div>
<div class="mt-8 border-t pt-6">
<p class="text-xs text-muted-foreground">Settings are saved in browser's localStorage</p>
</div>
</div>
</ScrollArea>
</div>
<ChatSettingsFooter onReset={handleReset} onSave={handleSave} />

@@ -9,7 +9,7 @@
import { supportsVision } from '$lib/stores/server.svelte';
import { getParameterInfo, resetParameterToServerDefault } from '$lib/stores/settings.svelte';
import { ParameterSyncService } from '$lib/services/parameter-sync';
import ParameterSourceIndicator from './ParameterSourceIndicator.svelte';
import { ChatSettingsParameterSourceIndicator } from '$lib/components/app';
import type { Component } from 'svelte';
interface Props {
@@ -63,7 +63,7 @@
{/if}
</Label>
{#if isCustomRealTime}
<ParameterSourceIndicator />
<ChatSettingsParameterSourceIndicator />
{/if}
</div>
@@ -145,7 +145,7 @@
{/if}
</Label>
{#if isCustomRealTime}
<ParameterSourceIndicator />
<ChatSettingsParameterSourceIndicator />
{/if}
</div>

@@ -1,7 +1,7 @@
<script lang="ts">
import { Download, Upload } from '@lucide/svelte';
import { Button } from '$lib/components/ui/button';
import ConversationSelectionDialog from './ConversationSelectionDialog.svelte';
import { DialogConversationSelection } from '$lib/components/app';
import { DatabaseStore } from '$lib/stores/database';
import type { ExportedConversations } from '$lib/types/database';
import { createMessageCountMap } from '$lib/utils/conversation-utils';
@@ -236,7 +236,7 @@
</div>
</div>
-<ConversationSelectionDialog
+<DialogConversationSelection
conversations={availableConversations}
{messageCountMap}
mode="export"
@@ -245,7 +245,7 @@
onConfirm={handleExportConfirm}
/>
-<ConversationSelectionDialog
+<DialogConversationSelection
conversations={availableConversations}
{messageCountMap}
mode="import"

View File

@@ -1,249 +0,0 @@
<script lang="ts">
import { Search, X } from '@lucide/svelte';
import * as Dialog from '$lib/components/ui/dialog';
import { Button } from '$lib/components/ui/button';
import { Input } from '$lib/components/ui/input';
import { Checkbox } from '$lib/components/ui/checkbox';
import { ScrollArea } from '$lib/components/ui/scroll-area';
import { SvelteSet } from 'svelte/reactivity';
interface Props {
conversations: DatabaseConversation[];
messageCountMap?: Map<string, number>;
mode: 'export' | 'import';
onCancel: () => void;
onConfirm: (selectedConversations: DatabaseConversation[]) => void;
open?: boolean;
}
let {
conversations,
messageCountMap = new Map(),
mode,
onCancel,
onConfirm,
open = $bindable(false)
}: Props = $props();
let searchQuery = $state('');
let selectedIds = $state.raw<SvelteSet<string>>(new SvelteSet(conversations.map((c) => c.id)));
let lastClickedId = $state<string | null>(null);
let filteredConversations = $derived(
conversations.filter((conv) => {
const name = conv.name || 'Untitled conversation';
return name.toLowerCase().includes(searchQuery.toLowerCase());
})
);
let allSelected = $derived(
filteredConversations.length > 0 &&
filteredConversations.every((conv) => selectedIds.has(conv.id))
);
let someSelected = $derived(
filteredConversations.some((conv) => selectedIds.has(conv.id)) && !allSelected
);
function toggleConversation(id: string, shiftKey: boolean = false) {
const newSet = new SvelteSet(selectedIds);
if (shiftKey && lastClickedId !== null) {
const lastIndex = filteredConversations.findIndex((c) => c.id === lastClickedId);
const currentIndex = filteredConversations.findIndex((c) => c.id === id);
if (lastIndex !== -1 && currentIndex !== -1) {
const start = Math.min(lastIndex, currentIndex);
const end = Math.max(lastIndex, currentIndex);
const shouldSelect = !newSet.has(id);
for (let i = start; i <= end; i++) {
if (shouldSelect) {
newSet.add(filteredConversations[i].id);
} else {
newSet.delete(filteredConversations[i].id);
}
}
selectedIds = newSet;
return;
}
}
if (newSet.has(id)) {
newSet.delete(id);
} else {
newSet.add(id);
}
selectedIds = newSet;
lastClickedId = id;
}
function toggleAll() {
if (allSelected) {
const newSet = new SvelteSet(selectedIds);
filteredConversations.forEach((conv) => newSet.delete(conv.id));
selectedIds = newSet;
} else {
const newSet = new SvelteSet(selectedIds);
filteredConversations.forEach((conv) => newSet.add(conv.id));
selectedIds = newSet;
}
}
function handleConfirm() {
const selected = conversations.filter((conv) => selectedIds.has(conv.id));
onConfirm(selected);
}
function handleCancel() {
selectedIds = new SvelteSet(conversations.map((c) => c.id));
searchQuery = '';
lastClickedId = null;
onCancel();
}
let previousOpen = $state(false);
$effect(() => {
if (open && !previousOpen) {
selectedIds = new SvelteSet(conversations.map((c) => c.id));
searchQuery = '';
lastClickedId = null;
} else if (!open && previousOpen) {
onCancel();
}
previousOpen = open;
});
</script>
<Dialog.Root bind:open>
<Dialog.Portal>
<Dialog.Overlay class="z-[1000000]" />
<Dialog.Content class="z-[1000001] max-w-2xl">
<Dialog.Header>
<Dialog.Title>
Select Conversations to {mode === 'export' ? 'Export' : 'Import'}
</Dialog.Title>
<Dialog.Description>
{#if mode === 'export'}
Choose which conversations you want to export. Selected conversations will be downloaded
as a JSON file.
{:else}
Choose which conversations you want to import. Selected conversations will be merged
with your existing conversations.
{/if}
</Dialog.Description>
</Dialog.Header>
<div class="space-y-4">
<div class="relative">
<Search class="absolute top-1/2 left-3 h-4 w-4 -translate-y-1/2 text-muted-foreground" />
<Input bind:value={searchQuery} placeholder="Search conversations..." class="pr-9 pl-9" />
{#if searchQuery}
<button
class="absolute top-1/2 right-3 -translate-y-1/2 text-muted-foreground hover:text-foreground"
onclick={() => (searchQuery = '')}
type="button"
>
<X class="h-4 w-4" />
</button>
{/if}
</div>
<div class="flex items-center justify-between text-sm text-muted-foreground">
<span>
{selectedIds.size} of {conversations.length} selected
{#if searchQuery}
({filteredConversations.length} shown)
{/if}
</span>
</div>
<div class="overflow-hidden rounded-md border">
<ScrollArea class="h-[400px]">
<table class="w-full">
<thead class="sticky top-0 z-10 bg-muted">
<tr class="border-b">
<th class="w-12 p-3 text-left">
<Checkbox
checked={allSelected}
indeterminate={someSelected}
onCheckedChange={toggleAll}
/>
</th>
<th class="p-3 text-left text-sm font-medium">Conversation Name</th>
<th class="w-32 p-3 text-left text-sm font-medium">Messages</th>
</tr>
</thead>
<tbody>
{#if filteredConversations.length === 0}
<tr>
<td colspan="3" class="p-8 text-center text-sm text-muted-foreground">
{#if searchQuery}
No conversations found matching "{searchQuery}"
{:else}
No conversations available
{/if}
</td>
</tr>
{:else}
{#each filteredConversations as conv (conv.id)}
<tr
class="cursor-pointer border-b transition-colors hover:bg-muted/50"
onclick={(e) => toggleConversation(conv.id, e.shiftKey)}
>
<td class="p-3">
<Checkbox
checked={selectedIds.has(conv.id)}
onclick={(e) => {
e.preventDefault();
e.stopPropagation();
toggleConversation(conv.id, e.shiftKey);
}}
/>
</td>
<td class="p-3 text-sm">
<div
class="max-w-[17rem] truncate"
title={conv.name || 'Untitled conversation'}
>
{conv.name || 'Untitled conversation'}
</div>
</td>
<td class="p-3 text-sm text-muted-foreground">
{messageCountMap.get(conv.id) ?? 0}
</td>
</tr>
{/each}
{/if}
</tbody>
</table>
</ScrollArea>
</div>
</div>
<Dialog.Footer>
<Button variant="outline" onclick={handleCancel}>Cancel</Button>
<Button onclick={handleConfirm} disabled={selectedIds.size === 0}>
{mode === 'export' ? 'Export' : 'Import'} ({selectedIds.size})
</Button>
</Dialog.Footer>
</Dialog.Content>
</Dialog.Portal>
</Dialog.Root>

View File

@@ -2,7 +2,7 @@
import { goto } from '$app/navigation';
import { page } from '$app/state';
import { Trash2 } from '@lucide/svelte';
-import { ChatSidebarConversationItem, ConfirmationDialog } from '$lib/components/app';
+import { ChatSidebarConversationItem, DialogConfirmation } from '$lib/components/app';
import ScrollArea from '$lib/components/ui/scroll-area/scroll-area.svelte';
import * as Sidebar from '$lib/components/ui/sidebar';
import * as AlertDialog from '$lib/components/ui/alert-dialog';
@@ -158,7 +158,7 @@
<div class="bottom-0 z-10 bg-sidebar bg-sidebar/50 px-4 py-4 backdrop-blur-lg md:sticky"></div>
</ScrollArea>
-<ConfirmationDialog
+<DialogConfirmation
bind:open={showDeleteDialog}
title="Delete Conversation"
description={selectedConversation

View File

@@ -0,0 +1,78 @@
<script lang="ts">
import * as Dialog from '$lib/components/ui/dialog';
import { ChatAttachmentPreview } from '$lib/components/app';
import { formatFileSize } from '$lib/utils/file-preview';
interface Props {
open: boolean;
// Either an uploaded file or a stored attachment
uploadedFile?: ChatUploadedFile;
attachment?: DatabaseMessageExtra;
// For uploaded files
preview?: string;
name?: string;
type?: string;
size?: number;
textContent?: string;
}
let {
open = $bindable(),
uploadedFile,
attachment,
preview,
name,
type,
size,
textContent
}: Props = $props();
let chatAttachmentPreviewRef: ChatAttachmentPreview | undefined = $state();
let displayName = $derived(uploadedFile?.name || attachment?.name || name || 'Unknown File');
let displayType = $derived(
uploadedFile?.type ||
(attachment?.type === 'imageFile'
? 'image'
: attachment?.type === 'textFile'
? 'text'
: attachment?.type === 'audioFile'
? attachment.mimeType || 'audio'
: attachment?.type === 'pdfFile'
? 'application/pdf'
: type || 'unknown')
);
let displaySize = $derived(uploadedFile?.size || size);
$effect(() => {
if (open && chatAttachmentPreviewRef) {
chatAttachmentPreviewRef.reset();
}
});
</script>
<Dialog.Root bind:open>
<Dialog.Content class="grid max-h-[90vh] max-w-5xl overflow-hidden sm:w-auto sm:max-w-6xl">
<Dialog.Header>
<Dialog.Title>{displayName}</Dialog.Title>
<Dialog.Description>
{displayType}
{#if displaySize}
{formatFileSize(displaySize)}
{/if}
</Dialog.Description>
</Dialog.Header>
<ChatAttachmentPreview
bind:this={chatAttachmentPreviewRef}
{uploadedFile}
{attachment}
{preview}
{name}
{type}
{textContent}
/>
</Dialog.Content>
</Dialog.Root>
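Reviewer note: the nested ternary that derives `displayType` above can be hard to scan. As a minimal sketch only, the same fallback order could be written as a plain function. The `AttachmentLike` type and the `resolveDisplayType` name are hypothetical stand-ins for illustration; the real `ChatUploadedFile` and `DatabaseMessageExtra` types are not shown in this diff.

```typescript
// Hypothetical minimal shape; the real DatabaseMessageExtra type is not part of this diff.
type AttachmentLike = { type: string; mimeType?: string };

// Mirrors the fallback order of the $derived expression:
// uploaded file type first, then the stored attachment kind, then the explicit `type` prop.
function resolveDisplayType(
	uploadedType: string | undefined,
	attachment: AttachmentLike | undefined,
	fallbackType: string | undefined
): string {
	if (uploadedType) return uploadedType;

	switch (attachment?.type) {
		case 'imageFile':
			return 'image';
		case 'textFile':
			return 'text';
		case 'audioFile':
			// Audio attachments prefer their MIME type when one is stored.
			return attachment?.mimeType || 'audio';
		case 'pdfFile':
			return 'application/pdf';
		default:
			return fallbackType || 'unknown';
	}
}
```

Keeping the mapping in one function would also make it straightforward to unit test outside the component, if that is wanted.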

View File

@@ -0,0 +1,51 @@
<script lang="ts">
import * as Dialog from '$lib/components/ui/dialog';
import { ChatAttachmentsViewAll } from '$lib/components/app';
interface Props {
open?: boolean;
uploadedFiles?: ChatUploadedFile[];
attachments?: DatabaseMessageExtra[];
readonly?: boolean;
onFileRemove?: (fileId: string) => void;
imageHeight?: string;
imageWidth?: string;
imageClass?: string;
}
let {
open = $bindable(false),
uploadedFiles = [],
attachments = [],
readonly = false,
onFileRemove,
imageHeight = 'h-24',
imageWidth = 'w-auto',
imageClass = ''
}: Props = $props();
let totalCount = $derived(uploadedFiles.length + attachments.length);
</script>
<Dialog.Root bind:open>
<Dialog.Portal>
<Dialog.Overlay />
<Dialog.Content class="flex !max-h-[90vh] !max-w-6xl flex-col">
<Dialog.Header>
<Dialog.Title>All Attachments ({totalCount})</Dialog.Title>
<Dialog.Description>View and manage all attached files</Dialog.Description>
</Dialog.Header>
<ChatAttachmentsViewAll
{uploadedFiles}
{attachments}
{readonly}
{onFileRemove}
{imageHeight}
{imageWidth}
{imageClass}
/>
</Dialog.Content>
</Dialog.Portal>
</Dialog.Root>

View File

@@ -0,0 +1,37 @@
<script lang="ts">
import * as Dialog from '$lib/components/ui/dialog';
import { ChatSettings } from '$lib/components/app';
interface Props {
onOpenChange?: (open: boolean) => void;
open?: boolean;
}
let { onOpenChange, open = false }: Props = $props();
let chatSettingsRef: ChatSettings | undefined = $state();
function handleClose() {
onOpenChange?.(false);
}
function handleSave() {
onOpenChange?.(false);
}
$effect(() => {
if (open && chatSettingsRef) {
chatSettingsRef.reset();
}
});
</script>
<Dialog.Root {open} onOpenChange={handleClose}>
<Dialog.Content
class="z-999999 flex h-[100dvh] max-h-[100dvh] min-h-[100dvh] flex-col gap-0 rounded-none p-0
md:h-[64vh] md:max-h-[64vh] md:min-h-0 md:rounded-lg"
style="max-width: 48rem;"
>
<ChatSettings bind:this={chatSettingsRef} onSave={handleSave} />
</Dialog.Content>
</Dialog.Root>

View File

@@ -0,0 +1,68 @@
<script lang="ts">
import * as Dialog from '$lib/components/ui/dialog';
import { ConversationSelection } from '$lib/components/app';
interface Props {
conversations: DatabaseConversation[];
messageCountMap?: Map<string, number>;
mode: 'export' | 'import';
onCancel: () => void;
onConfirm: (selectedConversations: DatabaseConversation[]) => void;
open?: boolean;
}
let {
conversations,
messageCountMap = new Map(),
mode,
onCancel,
onConfirm,
open = $bindable(false)
}: Props = $props();
let conversationSelectionRef: ConversationSelection | undefined = $state();
let previousOpen = $state(false);
$effect(() => {
if (open && !previousOpen && conversationSelectionRef) {
conversationSelectionRef.reset();
} else if (!open && previousOpen) {
onCancel();
}
previousOpen = open;
});
</script>
<Dialog.Root bind:open>
<Dialog.Portal>
<Dialog.Overlay class="z-[1000000]" />
<Dialog.Content class="z-[1000001] max-w-2xl">
<Dialog.Header>
<Dialog.Title>
Select Conversations to {mode === 'export' ? 'Export' : 'Import'}
</Dialog.Title>
<Dialog.Description>
{#if mode === 'export'}
Choose which conversations you want to export. Selected conversations will be downloaded
as a JSON file.
{:else}
Choose which conversations you want to import. Selected conversations will be merged
with your existing conversations.
{/if}
</Dialog.Description>
</Dialog.Header>
<ConversationSelection
bind:this={conversationSelectionRef}
{conversations}
{messageCountMap}
{mode}
{onCancel}
{onConfirm}
/>
</Dialog.Content>
</Dialog.Portal>
</Dialog.Root>
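Reviewer note: the `$effect` above reacts only to transitions of `open` by comparing it with `previousOpen`. The sketch below isolates that edge-detection idea in plain TypeScript so it can be read without the Svelte runtime. `createOpenWatcher` and its callbacks are hypothetical names, and the component additionally requires `conversationSelectionRef` to be set before it calls `reset()`.

```typescript
// Hypothetical helper: fires a callback only when `open` changes, not on every call.
function createOpenWatcher(onOpened: () => void, onClosed: () => void) {
	let previousOpen = false;

	return (open: boolean): void => {
		if (open && !previousOpen) {
			onOpened(); // closed -> open: the component resets the selection here
		} else if (!open && previousOpen) {
			onClosed(); // open -> closed: the component calls onCancel() here
		}
		previousOpen = open;
	};
}
```

Feeding the watcher the sequence `false, true, true, false` would trigger `onOpened` once and `onClosed` once, which is the behaviour the effect relies on.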

View File

@@ -1,56 +1,63 @@
// Chat
export { default as ChatAttachmentPreview } from './chat/ChatAttachments/ChatAttachmentPreview.svelte';
export { default as ChatAttachmentThumbnailFile } from './chat/ChatAttachments/ChatAttachmentThumbnailFile.svelte';
export { default as ChatAttachmentThumbnailImage } from './chat/ChatAttachments/ChatAttachmentThumbnailImage.svelte';
export { default as ChatAttachmentsList } from './chat/ChatAttachments/ChatAttachmentsList.svelte';
export { default as ChatAttachmentFilePreview } from './chat/ChatAttachments/ChatAttachmentFilePreview.svelte';
export { default as ChatAttachmentImagePreview } from './chat/ChatAttachments/ChatAttachmentImagePreview.svelte';
export { default as ChatAttachmentPreviewDialog } from './chat/ChatAttachments/ChatAttachmentPreviewDialog.svelte';
export { default as ChatAttachmentsViewAllDialog } from './chat/ChatAttachments/ChatAttachmentsViewAllDialog.svelte';
export { default as ChatAttachmentsViewAll } from './chat/ChatAttachments/ChatAttachmentsViewAll.svelte';
export { default as ChatForm } from './chat/ChatForm/ChatForm.svelte';
export { default as ChatFormTextarea } from './chat/ChatForm/ChatFormTextarea.svelte';
export { default as ChatFormActions } from './chat/ChatForm/ChatFormActions.svelte';
export { default as ChatFormActionFileAttachments } from './chat/ChatForm/ChatFormActionFileAttachments.svelte';
export { default as ChatFormActionRecord } from './chat/ChatForm/ChatFormActionRecord.svelte';
export { default as ChatFormModelSelector } from './chat/ChatForm/ChatFormModelSelector.svelte';
export { default as ChatFormHelperText } from './chat/ChatForm/ChatFormHelperText.svelte';
export { default as ChatFormActionFileAttachments } from './chat/ChatForm/ChatFormActions/ChatFormActionFileAttachments.svelte';
export { default as ChatFormActionRecord } from './chat/ChatForm/ChatFormActions/ChatFormActionRecord.svelte';
export { default as ChatFormActions } from './chat/ChatForm/ChatFormActions/ChatFormActions.svelte';
export { default as ChatFormFileInputInvisible } from './chat/ChatForm/ChatFormFileInputInvisible.svelte';
export { default as ChatFormHelperText } from './chat/ChatForm/ChatFormHelperText.svelte';
export { default as ChatFormModelSelector } from './chat/ChatForm/ChatFormModelSelector.svelte';
export { default as ChatFormTextarea } from './chat/ChatForm/ChatFormTextarea.svelte';
export { default as ChatMessage } from './chat/ChatMessages/ChatMessage.svelte';
export { default as ChatMessages } from './chat/ChatMessages/ChatMessages.svelte';
export { default as ChatMessageBranchingControls } from './chat/ChatMessages/ChatMessageBranchingControls.svelte';
export { default as ChatMessageThinkingBlock } from './chat/ChatMessages/ChatMessageThinkingBlock.svelte';
export { default as MessageBranchingControls } from './chat/ChatMessages/ChatMessageBranchingControls.svelte';
export { default as ChatProcessingInfo } from './chat/ChatProcessingInfo.svelte';
export { default as ChatScreenHeader } from './chat/ChatScreen/ChatScreenHeader.svelte';
export { default as ChatScreenWarning } from './chat/ChatScreen/ChatScreenWarning.svelte';
export { default as ChatScreen } from './chat/ChatScreen/ChatScreen.svelte';
export { default as ChatScreenHeader } from './chat/ChatScreen/ChatScreenHeader.svelte';
export { default as ChatScreenProcessingInfo } from './chat/ChatScreen/ChatScreenProcessingInfo.svelte';
export { default as ChatScreenWarning } from './chat/ChatScreen/ChatScreenWarning.svelte';
export { default as ChatSettingsDialog } from './chat/ChatSettings/ChatSettingsDialog.svelte';
export { default as ChatSettings } from './chat/ChatSettings/ChatSettings.svelte';
export { default as ChatSettingsFooter } from './chat/ChatSettings/ChatSettingsFooter.svelte';
export { default as ChatSettingsFields } from './chat/ChatSettings/ChatSettingsFields.svelte';
export { default as ImportExportTab } from './chat/ChatSettings/ImportExportTab.svelte';
export { default as ConversationSelectionDialog } from './chat/ChatSettings/ConversationSelectionDialog.svelte';
export { default as ParameterSourceIndicator } from './chat/ChatSettings/ParameterSourceIndicator.svelte';
export { default as ChatSettingsImportExportTab } from './chat/ChatSettings/ChatSettingsImportExportTab.svelte';
export { default as ChatSettingsParameterSourceIndicator } from './chat/ChatSettings/ChatSettingsParameterSourceIndicator.svelte';
export { default as ChatSidebar } from './chat/ChatSidebar/ChatSidebar.svelte';
export { default as ChatSidebarConversationItem } from './chat/ChatSidebar/ChatSidebarConversationItem.svelte';
export { default as ChatSidebarSearch } from './chat/ChatSidebar/ChatSidebarSearch.svelte';
export { default as ChatErrorDialog } from './dialogs/ChatErrorDialog.svelte';
export { default as EmptyFileAlertDialog } from './dialogs/EmptyFileAlertDialog.svelte';
export { default as ConversationTitleUpdateDialog } from './dialogs/ConversationTitleUpdateDialog.svelte';
// Dialogs
export { default as DialogChatAttachmentPreview } from './dialogs/DialogChatAttachmentPreview.svelte';
export { default as DialogChatAttachmentsViewAll } from './dialogs/DialogChatAttachmentsViewAll.svelte';
export { default as DialogChatError } from './dialogs/DialogChatError.svelte';
export { default as DialogChatSettings } from './dialogs/DialogChatSettings.svelte';
export { default as DialogConfirmation } from './dialogs/DialogConfirmation.svelte';
export { default as DialogConversationSelection } from './dialogs/DialogConversationSelection.svelte';
export { default as DialogConversationTitleUpdate } from './dialogs/DialogConversationTitleUpdate.svelte';
export { default as DialogEmptyFileAlert } from './dialogs/DialogEmptyFileAlert.svelte';
// Miscellaneous
export { default as ActionButton } from './misc/ActionButton.svelte';
export { default as ActionDropdown } from './misc/ActionDropdown.svelte';
export { default as ConversationSelection } from './misc/ConversationSelection.svelte';
export { default as KeyboardShortcutInfo } from './misc/KeyboardShortcutInfo.svelte';
export { default as MarkdownContent } from './misc/MarkdownContent.svelte';
export { default as RemoveButton } from './misc/RemoveButton.svelte';
// Server
export { default as ServerStatus } from './server/ServerStatus.svelte';
export { default as ServerErrorSplash } from './server/ServerErrorSplash.svelte';
export { default as ServerLoadingSplash } from './server/ServerLoadingSplash.svelte';
export { default as ServerInfo } from './server/ServerInfo.svelte';
// Shared components
export { default as ActionButton } from './misc/ActionButton.svelte';
export { default as ActionDropdown } from './misc/ActionDropdown.svelte';
export { default as ConfirmationDialog } from './dialogs/ConfirmationDialog.svelte';

View File

@@ -0,0 +1,205 @@
<script lang="ts">
import { Search, X } from '@lucide/svelte';
import { Button } from '$lib/components/ui/button';
import { Input } from '$lib/components/ui/input';
import { Checkbox } from '$lib/components/ui/checkbox';
import { ScrollArea } from '$lib/components/ui/scroll-area';
import { SvelteSet } from 'svelte/reactivity';
interface Props {
conversations: DatabaseConversation[];
messageCountMap?: Map<string, number>;
mode: 'export' | 'import';
onCancel: () => void;
onConfirm: (selectedConversations: DatabaseConversation[]) => void;
}
let { conversations, messageCountMap = new Map(), mode, onCancel, onConfirm }: Props = $props();
let searchQuery = $state('');
let selectedIds = $state.raw<SvelteSet<string>>(new SvelteSet(conversations.map((c) => c.id)));
let lastClickedId = $state<string | null>(null);
let filteredConversations = $derived(
conversations.filter((conv) => {
const name = conv.name || 'Untitled conversation';
return name.toLowerCase().includes(searchQuery.toLowerCase());
})
);
let allSelected = $derived(
filteredConversations.length > 0 &&
filteredConversations.every((conv) => selectedIds.has(conv.id))
);
let someSelected = $derived(
filteredConversations.some((conv) => selectedIds.has(conv.id)) && !allSelected
);
function toggleConversation(id: string, shiftKey: boolean = false) {
const newSet = new SvelteSet(selectedIds);
if (shiftKey && lastClickedId !== null) {
const lastIndex = filteredConversations.findIndex((c) => c.id === lastClickedId);
const currentIndex = filteredConversations.findIndex((c) => c.id === id);
if (lastIndex !== -1 && currentIndex !== -1) {
const start = Math.min(lastIndex, currentIndex);
const end = Math.max(lastIndex, currentIndex);
const shouldSelect = !newSet.has(id);
for (let i = start; i <= end; i++) {
if (shouldSelect) {
newSet.add(filteredConversations[i].id);
} else {
newSet.delete(filteredConversations[i].id);
}
}
selectedIds = newSet;
return;
}
}
if (newSet.has(id)) {
newSet.delete(id);
} else {
newSet.add(id);
}
selectedIds = newSet;
lastClickedId = id;
}
function toggleAll() {
if (allSelected) {
const newSet = new SvelteSet(selectedIds);
filteredConversations.forEach((conv) => newSet.delete(conv.id));
selectedIds = newSet;
} else {
const newSet = new SvelteSet(selectedIds);
filteredConversations.forEach((conv) => newSet.add(conv.id));
selectedIds = newSet;
}
}
function handleConfirm() {
const selected = conversations.filter((conv) => selectedIds.has(conv.id));
onConfirm(selected);
}
function handleCancel() {
selectedIds = new SvelteSet(conversations.map((c) => c.id));
searchQuery = '';
lastClickedId = null;
onCancel();
}
export function reset() {
selectedIds = new SvelteSet(conversations.map((c) => c.id));
searchQuery = '';
lastClickedId = null;
}
</script>
<div class="space-y-4">
<div class="relative">
<Search class="absolute top-1/2 left-3 h-4 w-4 -translate-y-1/2 text-muted-foreground" />
<Input bind:value={searchQuery} placeholder="Search conversations..." class="pr-9 pl-9" />
{#if searchQuery}
<button
class="absolute top-1/2 right-3 -translate-y-1/2 text-muted-foreground hover:text-foreground"
onclick={() => (searchQuery = '')}
type="button"
>
<X class="h-4 w-4" />
</button>
{/if}
</div>
<div class="flex items-center justify-between text-sm text-muted-foreground">
<span>
{selectedIds.size} of {conversations.length} selected
{#if searchQuery}
({filteredConversations.length} shown)
{/if}
</span>
</div>
<div class="overflow-hidden rounded-md border">
<ScrollArea class="h-[400px]">
<table class="w-full">
<thead class="sticky top-0 z-10 bg-muted">
<tr class="border-b">
<th class="w-12 p-3 text-left">
<Checkbox
checked={allSelected}
indeterminate={someSelected}
onCheckedChange={toggleAll}
/>
</th>
<th class="p-3 text-left text-sm font-medium">Conversation Name</th>
<th class="w-32 p-3 text-left text-sm font-medium">Messages</th>
</tr>
</thead>
<tbody>
{#if filteredConversations.length === 0}
<tr>
<td colspan="3" class="p-8 text-center text-sm text-muted-foreground">
{#if searchQuery}
No conversations found matching "{searchQuery}"
{:else}
No conversations available
{/if}
</td>
</tr>
{:else}
{#each filteredConversations as conv (conv.id)}
<tr
class="cursor-pointer border-b transition-colors hover:bg-muted/50"
onclick={(e) => toggleConversation(conv.id, e.shiftKey)}
>
<td class="p-3">
<Checkbox
checked={selectedIds.has(conv.id)}
onclick={(e) => {
e.preventDefault();
e.stopPropagation();
toggleConversation(conv.id, e.shiftKey);
}}
/>
</td>
<td class="p-3 text-sm">
<div class="max-w-[17rem] truncate" title={conv.name || 'Untitled conversation'}>
{conv.name || 'Untitled conversation'}
</div>
</td>
<td class="p-3 text-sm text-muted-foreground">
{messageCountMap.get(conv.id) ?? 0}
</td>
</tr>
{/each}
{/if}
</tbody>
</table>
</ScrollArea>
</div>
<div class="flex justify-end gap-2">
<Button variant="outline" onclick={handleCancel}>Cancel</Button>
<Button onclick={handleConfirm} disabled={selectedIds.size === 0}>
{mode === 'export' ? 'Export' : 'Import'} ({selectedIds.size})
</Button>
</div>
</div>
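Reviewer note: the shift-click handling in `toggleConversation` is the least obvious part of this component. The following is a rough, framework-free sketch of the same rule, assuming a plain `Set` instead of the `SvelteSet` held in `$state.raw`. The name `toggleRange` and its parameter list are hypothetical and exist only for illustration.

```typescript
// Hypothetical pure version of the selection rule; returns a new set and never mutates the input.
function toggleRange(
	selected: ReadonlySet<string>,
	orderedIds: readonly string[], // ids in the order currently shown (the filtered list)
	lastClickedId: string | null,
	clickedId: string,
	shiftKey: boolean
): Set<string> {
	const next = new Set(selected);

	if (shiftKey && lastClickedId !== null) {
		const lastIndex = orderedIds.indexOf(lastClickedId);
		const currentIndex = orderedIds.indexOf(clickedId);

		if (lastIndex !== -1 && currentIndex !== -1) {
			const start = Math.min(lastIndex, currentIndex);
			const end = Math.max(lastIndex, currentIndex);
			// The clicked row decides the direction: if it was unselected, select the whole range.
			const shouldSelect = !next.has(clickedId);

			for (let i = start; i <= end; i++) {
				if (shouldSelect) next.add(orderedIds[i]);
				else next.delete(orderedIds[i]);
			}
			return next;
		}
	}

	// Plain click, or a shift-click without a usable anchor: toggle a single id.
	if (next.has(clickedId)) next.delete(clickedId);
	else next.add(clickedId);
	return next;
}
```

One detail worth noting from the component itself: on the range path it returns before updating `lastClickedId`, so the anchor stays at the last plain click. The sketch leaves anchor tracking to the caller.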

View File

@@ -1,7 +1,7 @@
<script lang="ts">
import '../app.css';
import { page } from '$app/state';
-import { ChatSidebar, ConversationTitleUpdateDialog } from '$lib/components/app';
+import { ChatSidebar, DialogConversationTitleUpdate } from '$lib/components/app';
import {
activeMessages,
isLoading,
@@ -150,7 +150,7 @@
<Toaster richColors />
-<ConversationTitleUpdateDialog
+<DialogConversationTitleUpdate
bind:open={titleUpdateDialogOpen}
currentTitle={titleUpdateCurrentTitle}
newTitle={titleUpdateNewTitle}

View File

@@ -0,0 +1,19 @@
<script module>
import { defineMeta } from '@storybook/addon-svelte-csf';
import { ChatSettings } from '$lib/components/app';
import { fn } from 'storybook/test';
const { Story } = defineMeta({
title: 'Components/ChatSettings',
component: ChatSettings,
parameters: {
layout: 'fullscreen'
},
args: {
onClose: fn(),
onSave: fn()
}
});
</script>
<Story name="Default" />

View File

@@ -1,26 +0,0 @@
<script module>
import { defineMeta } from '@storybook/addon-svelte-csf';
import { ChatSettingsDialog } from '$lib/components/app';
import { fn } from 'storybook/test';
const { Story } = defineMeta({
title: 'Components/ChatSettingsDialog',
component: ChatSettingsDialog,
parameters: {
layout: 'fullscreen'
},
argTypes: {
open: {
control: 'boolean',
description: 'Whether the dialog is open'
}
},
args: {
onOpenChange: fn()
}
});
</script>
<Story name="Open" args={{ open: true }} />
<Story name="Closed" args={{ open: false }} />

View File

@@ -22,7 +22,38 @@ target_compile_definitions(${TARGET} PRIVATE
CPPHTTPLIB_TCP_NODELAY=1
)
-if (LLAMA_OPENSSL)
+if (LLAMA_BUILD_BORINGSSL)
+set(OPENSSL_NO_ASM ON CACHE BOOL "Disable OpenSSL ASM code (BoringSSL)")
+set(FIPS OFF CACHE BOOL "Enable FIPS (BoringSSL)")
+set(BORINGSSL_GIT "https://boringssl.googlesource.com/boringssl" CACHE STRING "BoringSSL git repository")
+set(BORINGSSL_VERSION "0.20251002.0" CACHE STRING "BoringSSL version")
+message(STATUS "Fetching BoringSSL version ${BORINGSSL_VERSION}")
+include(FetchContent)
+FetchContent_Declare(
+boringssl
+GIT_REPOSITORY ${BORINGSSL_GIT}
+GIT_TAG ${BORINGSSL_VERSION}
+PATCH_COMMAND ${CMAKE_COMMAND} -P "${CMAKE_CURRENT_SOURCE_DIR}/patch-boringssl.cmake"
+)
+set(SAVED_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
+set(SAVED_BUILD_TESTING ${BUILD_TESTING})
+set(BUILD_SHARED_LIBS OFF)
+set(BUILD_TESTING OFF)
+FetchContent_MakeAvailable(boringssl)
+set(BUILD_SHARED_LIBS ${SAVED_BUILD_SHARED_LIBS})
+set(BUILD_TESTING ${SAVED_BUILD_TESTING})
+set(CPPHTTPLIB_OPENSSL_SUPPORT TRUE)
+target_link_libraries(${TARGET} PUBLIC ssl crypto)
+elseif (LLAMA_OPENSSL)
find_package(OpenSSL)
if (OpenSSL_FOUND)
include(CheckCSourceCompiles)
@@ -44,17 +75,20 @@ if (LLAMA_OPENSSL)
set(CMAKE_REQUIRED_INCLUDES ${SAVED_CMAKE_REQUIRED_INCLUDES})
if (OPENSSL_VERSION_SUPPORTED)
message(STATUS "OpenSSL found: ${OPENSSL_VERSION}")
-target_compile_definitions(${TARGET} PUBLIC CPPHTTPLIB_OPENSSL_SUPPORT)
+set(CPPHTTPLIB_OPENSSL_SUPPORT TRUE)
target_link_libraries(${TARGET} PUBLIC OpenSSL::SSL OpenSSL::Crypto)
-if (APPLE AND CMAKE_SYSTEM_NAME STREQUAL "Darwin")
-target_compile_definitions(${TARGET} PUBLIC CPPHTTPLIB_USE_CERTS_FROM_MACOSX_KEYCHAIN)
-find_library(CORE_FOUNDATION_FRAMEWORK CoreFoundation REQUIRED)
-find_library(SECURITY_FRAMEWORK Security REQUIRED)
-target_link_libraries(${TARGET} PUBLIC ${CORE_FOUNDATION_FRAMEWORK} ${SECURITY_FRAMEWORK})
-endif()
endif()
else()
message(STATUS "OpenSSL not found, SSL support disabled")
endif()
endif()
+if (CPPHTTPLIB_OPENSSL_SUPPORT)
+target_compile_definitions(${TARGET} PUBLIC CPPHTTPLIB_OPENSSL_SUPPORT) # used in server.cpp
+if (APPLE AND CMAKE_SYSTEM_NAME STREQUAL "Darwin")
+target_compile_definitions(${TARGET} PRIVATE CPPHTTPLIB_USE_CERTS_FROM_MACOSX_KEYCHAIN)
+find_library(CORE_FOUNDATION_FRAMEWORK CoreFoundation REQUIRED)
+find_library(SECURITY_FRAMEWORK Security REQUIRED)
+target_link_libraries(${TARGET} PUBLIC ${CORE_FOUNDATION_FRAMEWORK} ${SECURITY_FRAMEWORK})
+endif()
+endif()

View File

@@ -0,0 +1,6 @@
# Remove bssl
file(READ "CMakeLists.txt" content)
string(REPLACE "add_executable(bssl" "#add_executable(bssl" content "${content}")
string(REPLACE "target_link_libraries(bssl" "#target_link_libraries(bssl" content "${content}")
string(REPLACE "install(TARGETS bssl" "#install(TARGETS bssl" content "${content}")
file(WRITE "CMakeLists.txt" "${content}")
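Reviewer note: the patch script above comments out the `bssl` tool by prefixing three call sites with `#`. For readers less familiar with CMake's `string(REPLACE ...)`, which replaces every occurrence of the match string, here is a rough TypeScript equivalent of the text transformation. `disableBsslTool` is a hypothetical name, and the sketch operates on a string rather than reading and writing `CMakeLists.txt`.

```typescript
// Hypothetical illustration of the three string(REPLACE ...) calls in patch-boringssl.cmake.
function disableBsslTool(cmakeLists: string): string {
	const prefixes = [
		'add_executable(bssl',
		'target_link_libraries(bssl',
		'install(TARGETS bssl'
	];

	let content = cmakeLists;
	for (const prefix of prefixes) {
		// split/join replaces every occurrence, like string(REPLACE ...) does.
		content = content.split(prefix).join('#' + prefix);
	}
	return content;
}
```

Because only the first line of each call is prefixed, this style of patch relies on the matched commands fitting on one line, or on the continuation lines being harmless once the opening line is commented out. Whether that holds for the pinned BoringSSL version is not visible in this diff.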