Compare commits

11 Commits
b7930 ... b7941

Author SHA1 Message Date
Sigbjørn Skjæret
11fb327bf3 vendor : add missing llama_add_compile_flags (#19322)
* add missing llama_add_compile_flags

* disable all warnings for ssl, crypto and fipsmodule
2026-02-05 02:27:38 +01:00
Aaron Teo
e6e934c5ea vendor: update cpp-httplib version (#19313)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2026-02-05 05:15:03 +08:00
Daniel Bevenius
b536eb0233 codeowners : add danbev for examples/debug (#19332)
* codeowners : add danbev for examples/debug

* Add @pwilkin to CODEOWNERS for debug

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-02-04 20:20:40 +01:00
Xuan-Son Nguyen
e0c93af2a0 debug: make common_debug_print_tensor readable (#19331)
* debug: make common_debug_print_tensor readable

* editorconfig
2026-02-04 17:55:31 +01:00
Georgi Gerganov
423bee462b ci : fix sanitize workflow to enable ggml sanitizers too (#19323) 2026-02-04 15:12:03 +02:00
Xuan-Son Nguyen
8abcc70a74 model: (qwen3next) correct vectorized key_gdiff calculation (#19324)
* model: (qwen3next) correct vectorized key_gdiff calculation

* move transpose to outside of loop
2026-02-04 13:09:58 +01:00
Georgi Gerganov
eaba92c3dc tests : add non-cont, inplace rope tests (#19296)
* tests : add non-cont, inplace rope tests

* cont : exercise dim 3

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

* cont : more dim3 exercises

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2026-02-04 12:45:21 +02:00
Daniel Bevenius
6ab881b7c3 model-conversion : add tensor-info.py utility (#18954)
This commit adds a new Python script that can be used to print information
about tensors in a safetensors model.

The motivation is that during model conversion work it can sometimes be
useful to verify the shapes of tensors in the original model. While it is
possible to print the tensors when loading the model, this can be slow for
larger models. With this script it is possible to query tensor shapes quickly.

Example usage:
```console
(venv) $ ./scripts/utils/tensor-info.py --help
usage: tensor-info.py [-h] [-m MODEL_PATH] [-l] [tensor_name]

Print tensor information from a safetensors model

positional arguments:
  tensor_name           Name of the tensor to inspect

options:
  -h, --help            show this help message and exit
  -m MODEL_PATH, --model-path MODEL_PATH
                        Path to the model directory (default: MODEL_PATH environment variable)
  -l, --list            List unique tensor patterns in the model (layer numbers replaced with #)
```

Listing tensor names:
```console
(venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m -l
embed_tokens.weight
layers.#.input_layernorm.weight
layers.#.mlp.down_proj.weight
layers.#.mlp.gate_proj.weight
layers.#.mlp.up_proj.weight
layers.#.post_attention_layernorm.weight
layers.#.post_feedforward_layernorm.weight
layers.#.pre_feedforward_layernorm.weight
layers.#.self_attn.k_norm.weight
layers.#.self_attn.k_proj.weight
layers.#.self_attn.o_proj.weight
layers.#.self_attn.q_norm.weight
layers.#.self_attn.q_proj.weight
layers.#.self_attn.v_proj.weight
norm.weight
```

Printing a specific tensor's information:
```console
(venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m layers.0.input_layernorm.weight
Tensor: layers.0.input_layernorm.weight
File:   model.safetensors
Shape:  [768]
```
2026-02-04 10:40:53 +01:00
Georgi Gerganov
d838c22bb3 spec : fix the check-rate logic of ngram-simple (#19261)
* spec : fix the check-rate logic of ngram-simple

* cont : refactor + fix checks
2026-02-04 10:39:53 +02:00
Daniel Bevenius
25f40ca65f completion : simplify batch (embd) processing (#19286)
* completion : simplify batch (embd) processing

This commit simplifies the processing of embd by removing the for loop that
steps through it in increments of params.n_batch. It also removes the
clamping of n_eval, since the size of embd is always at most params.n_batch.

The motivation is to clarify the code: viewed in isolation, the for loop
suggests that multiple batches can be processed here, which is misleading.
A minimal sketch of the simplified flow follows this entry.

* add an assert to verify n_eval is not greater than n_batch
2026-02-04 05:43:28 +01:00
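A minimal sketch of the control flow this change moves to, assuming a hypothetical eval_chunk() helper in place of the real decode call; it only illustrates dropping the per-n_batch loop and asserting the invariant instead, not the actual server code:
```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the actual decode call used by the server.
static void eval_chunk(const std::vector<float> & embd, int n_eval) {
    std::printf("evaluating %d embedding rows\n", n_eval);
}

// Before: embd was walked in params.n_batch-sized steps, clamping n_eval on
// each iteration, even though embd can never be larger than n_batch.
// After: process embd in a single call and assert the invariant instead.
static void process_embd(const std::vector<float> & embd, int n_batch) {
    const int n_eval = (int) embd.size();
    assert(n_eval <= n_batch && "embd must not exceed params.n_batch");
    eval_chunk(embd, n_eval);
}

int main() {
    std::vector<float> embd(16, 0.0f);
    process_embd(embd, 32);
    return 0;
}
```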
Kevin Pouget
015deb9048 ggml-virtgpu: make the code thread safe (#19204)
* ggml-virtgpu: regenerate_remoting.py: add the ability to deprecate a function

* ggml-virtgpu: deprecate buffer_type is_host remoting

not necessary

* ggml-virtgpu: stop using static vars as cache

The static init isn't thread-safe; see the sketch after this commit's entry.

* ggml-virtgpu: protect the use of the shared memory to transfer data

* ggml-virtgpu: make the remote calls thread-safe

* ggml-virtgpu: backend: don't continue if couldn't allocate the tensor memory

* ggml-virtgpu: add a cleanup function for consistency

* ggml-virtgpu: backend: don't crash if buft->iface.get_max_size is missing

* fix style and ordering

* Remove the static variable in apir_device_get_count

* ggml-virtgpu: improve the logging

* fix review minor formatting changes
2026-02-04 10:46:18 +08:00
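A minimal, self-contained sketch of the one-time initialization pattern the thread-safety changes in the diffs below rely on (an atomic flag plus a mutex instead of a plain static-local cache); query_device_count() is a placeholder, not the actual virtgpu remoting API:
```cpp
#include <atomic>
#include <cstdio>
#include <mutex>

// Placeholder for an expensive remote query that must only run once.
static int query_device_count() { return 1; }

static int cached_device_count = 0;

// Double-checked one-time initialization: the atomic flag provides a cheap
// fast path, the mutex serializes the first initialization across threads.
static int get_device_count() {
    static std::atomic<bool> initialized{false};
    static std::mutex mutex;

    if (!initialized.load()) {
        std::lock_guard<std::mutex> lock(mutex);
        if (!initialized.load()) {
            cached_device_count = query_device_count();
            initialized.store(true);
        }
    }
    return cached_device_count;
}

int main() {
    std::printf("device count: %d\n", get_device_count());
    return 0;
}
```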
41 changed files with 920 additions and 358 deletions

View File

@@ -293,6 +293,7 @@ jobs:
cmake -B build \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON \
-DGGML_SANITIZE_${{ matrix.sanitizer }}=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }}
cmake --build build --config ${{ matrix.build_type }} -j $(nproc)
@@ -303,6 +304,7 @@ jobs:
cmake -B build \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON \
-DGGML_SANITIZE_${{ matrix.sanitizer }}=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
-DGGML_OPENMP=OFF
cmake --build build --config ${{ matrix.build_type }} -j $(nproc)

View File

@@ -27,6 +27,7 @@
/examples/batched.swift/ @ggerganov
/examples/batched/ @ggerganov
/examples/convert-llama2c-to-ggml/ @ggerganov
/examples/debug/ @danbev @pwilkin
/examples/deprecation-warning/ @ggerganov
/examples/diffusion/ @am17an
/examples/embedding/ @ggerganov

View File

@@ -45,6 +45,8 @@ static float common_ggml_get_float_value(const uint8_t * data,
return v;
}
#define INDENT " "
template <bool abort>
void common_debug_print_tensor(uint8_t * data, ggml_type type, const int64_t * ne, const size_t * nb, int64_t n) {
GGML_ASSERT(n > 0);
@@ -60,41 +62,41 @@ void common_debug_print_tensor(uint8_t * data, ggml_type type, const int64_t * n
}
}
for (int64_t i3 = 0; i3 < ne[3]; i3++) {
LOG_ERR(" [\n");
LOG(INDENT "[\n");
for (int64_t i2 = 0; i2 < ne[2]; i2++) {
if (i2 == n && ne[2] > 2 * n) {
LOG_ERR(" ..., \n");
LOG(INDENT INDENT "..., \n");
i2 = ne[2] - n;
}
LOG_ERR(" [\n");
LOG(INDENT INDENT "[\n");
for (int64_t i1 = 0; i1 < ne[1]; i1++) {
if (i1 == n && ne[1] > 2 * n) {
LOG_ERR(" ..., \n");
LOG(INDENT INDENT INDENT "..., \n");
i1 = ne[1] - n;
}
LOG_ERR(" [");
LOG(INDENT INDENT INDENT "[");
for (int64_t i0 = 0; i0 < ne[0]; i0++) {
if (i0 == n && ne[0] > 2 * n) {
LOG_ERR("..., ");
LOG(" ..., ");
i0 = ne[0] - n;
}
const float v = common_ggml_get_float_value(data, type, nb, i0, i1, i2, i3);
LOG_ERR("%12.4f", v);
LOG("%12.4f", v);
if (i0 < ne[0] - 1) {
LOG_ERR(", ");
LOG(", ");
}
}
LOG_ERR("],\n");
LOG(" ],\n");
}
LOG_ERR(" ],\n");
LOG(INDENT INDENT "],\n");
}
LOG_ERR(" ]\n");
LOG_ERR(" sum = %f\n", sum);
LOG(INDENT "]\n");
LOG(INDENT "sum = %f\n", sum);
}
if constexpr (abort) {
if (std::isnan(sum)) {
LOG_ERR("encountered NaN - aborting\n");
LOG("encountered NaN - aborting\n");
exit(0);
}
}
@@ -137,9 +139,9 @@ template <bool abort_on_nan> bool common_debug_cb_eval(struct ggml_tensor * t, b
}
if (matches_filter) {
LOG_ERR("%s: %24s = (%s) %10s(%s{%s}, %s}) = {%s}\n", __func__, t->name, ggml_type_name(t->type),
ggml_op_desc(t), src0->name, common_ggml_ne_string(src0).c_str(), src1 ? src1_str : "",
common_ggml_ne_string(t).c_str());
LOG("%s: %24s = (%s) %10s(%s{%s}, %s}) = {%s}\n", __func__, t->name, ggml_type_name(t->type),
ggml_op_desc(t), src0->name, common_ggml_ne_string(src0).c_str(), src1 ? src1_str : "",
common_ggml_ne_string(t).c_str());
}
const bool is_host = ggml_backend_buffer_is_host(t->buffer);

View File

@@ -47,21 +47,15 @@ static std::string common_tokens_to_str(const llama_tokens & inp, size_t start,
* @return Vector of draft tokens, empty if no matching pattern is found
*/
llama_tokens common_ngram_simple_draft(
common_ngram_simple_state & state,
const common_ngram_simple_config & config,
const llama_tokens & tokens, llama_token sampled) {
// Simple implementation of self-speculative decoding without a draft model.
//
const size_t cur_len = tokens.size();
// Only check every check_rate tokens to save compute
// i.e., perform check if (cur_len - idx_last_check) >= check_rate
if (state.idx_last_check + state.config.check_rate > cur_len) {
llama_tokens draft_tokens;
return draft_tokens;
}
size_t n_draft_min = state.config.size_ngram; // size of n-gram to lookup in token history
size_t n_draft_max = state.config.size_mgram; // the m-gram following the found n-gram is used for draft
const size_t n_draft_min = config.size_ngram; // size of n-gram to lookup in token history
const size_t n_draft_max = config.size_mgram; // the m-gram following the found n-gram is used for draft
// vector for tokens we want to verify.
// return empty vector if there is no match.
@@ -80,9 +74,6 @@ llama_tokens common_ngram_simple_draft(
}
pattern.push_back(sampled); // add the last token to the pattern
// We do a search in the token history.
state.idx_last_check = cur_len;
size_t match_pos = 0; // we ignore position 0, position 0 == no match
// search backwards, but skip the current match (we are currently there)
for (size_t j = cur_len - n_draft_min - 1; j > 0; --j) {

View File

@@ -27,23 +27,9 @@ struct common_ngram_simple_config {
uint16_t check_rate; // check for speculative decoding without draft model for each check_rate token
};
// current state (and config) of n-gram simple.
struct common_ngram_simple_state {
common_ngram_simple_config config;
size_t idx_last_check = 0; // index of last check in context history (mutable)
common_ngram_simple_state(const common_ngram_simple_config & config)
: config(config) {}
};
// Searches for a n-gram in the history and checks whether a draft sequence should be generated.
// state: the ngram simple state to search in.
// inp: the tokens generated so far.
// sampled: the token that was just sampled.
// draft: vector to store the draft tokens, initially empty.
llama_tokens common_ngram_simple_draft(
common_ngram_simple_state & state,
const common_ngram_simple_config & config,
const llama_tokens & tokens, llama_token sampled);

View File

@@ -463,12 +463,14 @@ struct common_speculative_state_eagle3 : public common_speculative_state {
// state of self-speculation (simple implementation, not ngram-map)
struct common_speculative_state_ngram_simple : public common_speculative_state {
common_ngram_simple_state state;
common_ngram_simple_config config;
uint16_t check_id = 0; // used to control the frequency of generating drafts
common_speculative_state_ngram_simple(
enum common_speculative_type type,
common_ngram_simple_state state)
: common_speculative_state(type), state(state) {}
common_ngram_simple_config config)
: common_speculative_state(type), config(config) {}
void begin(const llama_tokens & prompt) override {
GGML_UNUSED(prompt);
@@ -479,7 +481,13 @@ struct common_speculative_state_ngram_simple : public common_speculative_state {
const llama_tokens & prompt_tgt,
llama_token id_last,
llama_tokens & result) override {
result = common_ngram_simple_draft(state, prompt_tgt, id_last);
++check_id;
if (check_id < config.check_rate) {
return;
}
check_id = 0;
result = common_ngram_simple_draft(config, prompt_tgt, id_last);
GGML_UNUSED(params);
}
@@ -889,14 +897,14 @@ common_speculative * common_speculative_init(
uint16_t mgram_size_value = ngram_map.size_value;
uint16_t check_rate = ngram_map.check_rate;
auto config_simple = common_ngram_simple_config{
auto config_simple = common_ngram_simple_config {
/* .size_ngram = */ ngram_size_key,
/* .size_mgram = */ mgram_size_value,
/* .check_rate = */ check_rate
};
auto state = std::make_unique<common_speculative_state_ngram_simple>(
/* .type = */ config.type,
/* .state = */ common_ngram_simple_state(config_simple)
/* .state = */ config_simple
);
impls.push_back(std::move(state));
break;

View File

@@ -0,0 +1,159 @@
#!/usr/bin/env python3
import argparse
import json
import os
import re
import sys
from pathlib import Path
from typing import Optional
from safetensors import safe_open
MODEL_SAFETENSORS_FILE = "model.safetensors"
MODEL_SAFETENSORS_INDEX = "model.safetensors.index.json"
def get_weight_map(model_path: Path) -> Optional[dict[str, str]]:
index_file = model_path / MODEL_SAFETENSORS_INDEX
if index_file.exists():
with open(index_file, 'r') as f:
index = json.load(f)
return index.get("weight_map", {})
return None
def get_all_tensor_names(model_path: Path) -> list[str]:
weight_map = get_weight_map(model_path)
if weight_map is not None:
return list(weight_map.keys())
single_file = model_path / MODEL_SAFETENSORS_FILE
if single_file.exists():
try:
with safe_open(single_file, framework="pt", device="cpu") as f:
return list(f.keys())
except Exception as e:
print(f"Error reading {single_file}: {e}")
sys.exit(1)
print(f"Error: No safetensors files found in {model_path}")
sys.exit(1)
def find_tensor_file(model_path: Path, tensor_name: str) -> Optional[str]:
weight_map = get_weight_map(model_path)
if weight_map is not None:
return weight_map.get(tensor_name)
single_file = model_path / MODEL_SAFETENSORS_FILE
if single_file.exists():
return single_file.name
return None
def normalize_tensor_name(tensor_name: str) -> str:
normalized = re.sub(r'\.\d+\.', '.#.', tensor_name)
normalized = re.sub(r'\.\d+$', '.#', normalized)
return normalized
def list_all_tensors(model_path: Path, unique: bool = False):
tensor_names = get_all_tensor_names(model_path)
if unique:
seen = set()
for tensor_name in sorted(tensor_names):
normalized = normalize_tensor_name(tensor_name)
if normalized not in seen:
seen.add(normalized)
print(normalized)
else:
for tensor_name in sorted(tensor_names):
print(tensor_name)
def print_tensor_info(model_path: Path, tensor_name: str):
tensor_file = find_tensor_file(model_path, tensor_name)
if tensor_file is None:
print(f"Error: Could not find tensor '{tensor_name}' in model index")
print(f"Model path: {model_path}")
sys.exit(1)
file_path = model_path / tensor_file
try:
with safe_open(file_path, framework="pt", device="cpu") as f:
if tensor_name in f.keys():
tensor_slice = f.get_slice(tensor_name)
shape = tensor_slice.get_shape()
print(f"Tensor: {tensor_name}")
print(f"File: {tensor_file}")
print(f"Shape: {shape}")
else:
print(f"Error: Tensor '{tensor_name}' not found in {tensor_file}")
sys.exit(1)
except FileNotFoundError:
print(f"Error: The file '{file_path}' was not found.")
sys.exit(1)
except Exception as e:
print(f"An error occurred: {e}")
sys.exit(1)
def main():
parser = argparse.ArgumentParser(
description="Print tensor information from a safetensors model"
)
parser.add_argument(
"tensor_name",
nargs="?", # optional (if --list is used for example)
help="Name of the tensor to inspect"
)
parser.add_argument(
"-m", "--model-path",
type=Path,
help="Path to the model directory (default: MODEL_PATH environment variable)"
)
parser.add_argument(
"-l", "--list",
action="store_true",
help="List unique tensor patterns in the model (layer numbers replaced with #)"
)
args = parser.parse_args()
model_path = args.model_path
if model_path is None:
model_path_str = os.environ.get("MODEL_PATH")
if model_path_str is None:
print("Error: --model-path not provided and MODEL_PATH environment variable not set")
sys.exit(1)
model_path = Path(model_path_str)
if not model_path.exists():
print(f"Error: Model path does not exist: {model_path}")
sys.exit(1)
if not model_path.is_dir():
print(f"Error: Model path is not a directory: {model_path}")
sys.exit(1)
if args.list:
list_all_tensors(model_path, unique=True)
else:
if args.tensor_name is None:
print("Error: tensor_name is required when not using --list")
sys.exit(1)
print_tensor_info(model_path, args.tensor_name)
if __name__ == "__main__":
main()

View File

@@ -7,8 +7,6 @@
extern "C" {
#endif
#define GGML_REMOTING_FRONTEND_NAME "RemotingFrontend"
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_virtgpu_reg();
#ifdef __cplusplus

View File

@@ -36,7 +36,7 @@ apir_rpc_tensor apir_serialize_tensor(const ggml_tensor * tensor) {
result.data = reinterpret_cast<uint64_t>(tensor->data);
if (tensor->data) {
if (!tensor->buffer) {
GGML_ABORT("tensor has data but not buffer");
GGML_ABORT("%s: tensor has data but not buffer", __func__);
}
// tensor->data is serialized as an offset to the buffer base address
result.data -= reinterpret_cast<uint64_t>(BUFFER_TO_GGML_CONTEXT(tensor->buffer)->base);

View File

@@ -27,7 +27,7 @@ uint32_t backend_backend_graph_compute(apir_encoder * enc, apir_decoder * dec, v
const void * shmem_data = ctx->iface->get_shmem_ptr(ctx->ctx_id, shmem_res_id);
if (!shmem_data) {
GGML_LOG_ERROR("Couldn't get the shmem addr from virgl\n");
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Couldn't get the shmem addr from virgl\n", __func__);
apir_decoder_set_fatal(dec);
return 1;
}
@@ -45,7 +45,7 @@ uint32_t backend_backend_graph_compute(apir_encoder * enc, apir_decoder * dec, v
if (dev->iface.supports_op(dev, op)) {
continue;
}
GGML_LOG_ERROR("Graph node %d (%s) not supported by the backend\n", idx, ggml_op_desc(op));
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Graph node %d (%s) not supported by the backend\n", idx, ggml_op_desc(op));
status = GGML_STATUS_ABORTED;
apir_encode_ggml_status(enc, &status);

View File

@@ -36,18 +36,22 @@ uint32_t backend_buffer_type_get_max_size(apir_encoder * enc, apir_decoder * dec
ggml_backend_buffer_type_t buft;
buft = apir_decode_ggml_buffer_type(dec);
size_t value = buft->iface.get_max_size(buft);
size_t value = SIZE_MAX;
if (buft->iface.get_max_size) {
value = buft->iface.get_max_size(buft);
}
apir_encode_size_t(enc, &value);
return 0;
}
/* APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST is deprecated. Keeping the handler for backward compatibility. */
uint32_t backend_buffer_type_is_host(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx) {
GGML_UNUSED(ctx);
ggml_backend_buffer_type_t buft;
buft = apir_decode_ggml_buffer_type(dec);
GGML_UNUSED(dec);
const bool is_host = false;
bool is_host = buft->iface.is_host(buft);
apir_encode_bool_t(enc, &is_host);
return 0;

View File

@@ -40,7 +40,7 @@ uint32_t backend_buffer_set_tensor(apir_encoder * enc, apir_decoder * dec, virgl
void * shmem_data = ctx->iface->get_shmem_ptr(ctx->ctx_id, shmem_res_id);
if (!shmem_data) {
GGML_LOG_ERROR("Couldn't get the shmem addr from virgl\n");
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Couldn't get the shmem addr from virgl\n", __func__);
return 1;
}
@@ -71,7 +71,7 @@ uint32_t backend_buffer_get_tensor(apir_encoder * enc, apir_decoder * dec, virgl
void * shmem_data = ctx->iface->get_shmem_ptr(ctx->ctx_id, shmem_res_id);
if (!shmem_data) {
GGML_LOG_ERROR("Couldn't get the shmem addr from virgl\n");
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Couldn't get the shmem addr from virgl\n", __func__);
return 1;
}
@@ -121,7 +121,7 @@ uint32_t backend_buffer_free_buffer(apir_encoder * enc, apir_decoder * dec, virg
buffer = apir_decode_ggml_buffer(dec);
if (!apir_untrack_backend_buffer(buffer)) {
GGML_LOG_WARN("%s: unknown buffer %p\n", __func__, (void *) buffer);
GGML_LOG_WARN(GGML_VIRTGPU_BCK "%s: unknown buffer %p\n", __func__, (void *) buffer);
return 1;
}

View File

@@ -124,7 +124,7 @@ uint32_t backend_device_buffer_from_ptr(apir_encoder * enc, apir_decoder * dec,
void * shmem_ptr = ctx->iface->get_shmem_ptr(ctx->ctx_id, shmem_res_id);
if (!shmem_ptr) {
GGML_LOG_ERROR("Couldn't get the shmem addr from virgl\n");
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: Couldn't get the shmem addr from virgl\n", __func__);
apir_decoder_set_fatal(dec);
return 1;
}

View File

@@ -17,26 +17,26 @@ uint64_t timer_count = 0;
uint32_t backend_dispatch_initialize(void * ggml_backend_reg_fct_p) {
if (reg != NULL) {
GGML_LOG_WARN("%s: already initialized\n", __func__);
GGML_LOG_WARN(GGML_VIRTGPU_BCK "%s: already initialized\n", __func__);
return APIR_BACKEND_INITIALIZE_ALREADY_INITED;
}
ggml_backend_reg_t (*ggml_backend_reg_fct)(void) = (ggml_backend_reg_t (*)()) ggml_backend_reg_fct_p;
reg = ggml_backend_reg_fct();
if (reg == NULL) {
GGML_LOG_ERROR("%s: backend registration failed\n", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: backend registration failed\n", __func__);
return APIR_BACKEND_INITIALIZE_BACKEND_REG_FAILED;
}
if (!reg->iface.get_device_count(reg)) {
GGML_LOG_ERROR("%s: backend initialization failed: no device found\n", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: backend initialization failed: no device found\n", __func__);
return APIR_BACKEND_INITIALIZE_NO_DEVICE;
}
dev = reg->iface.get_device(reg, 0);
if (!dev) {
GGML_LOG_ERROR("%s: backend initialization failed: no device received\n", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU_BCK "%s: backend initialization failed: no device received\n", __func__);
return APIR_BACKEND_INITIALIZE_NO_DEVICE;
}

View File

@@ -16,6 +16,7 @@ uint32_t backend_device_buffer_from_ptr(apir_encoder * enc, apir_decoder * dec,
uint32_t backend_buffer_type_get_name(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx);
uint32_t backend_buffer_type_get_alignment(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx);
uint32_t backend_buffer_type_get_max_size(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx);
/* APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST is deprecated. Keeping the handler for backward compatibility. */
uint32_t backend_buffer_type_is_host(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx);
uint32_t backend_buffer_type_alloc_buffer(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx);
uint32_t backend_buffer_type_get_alloc_size(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx);
@@ -62,7 +63,7 @@ static inline const char * backend_dispatch_command_name(ApirBackendCommandType
case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE:
return "backend_buffer_type_get_max_size";
case APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST:
return "backend_buffer_type_is_host";
return "backend_buffer_type_is_host (DEPRECATED)";
case APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER:
return "backend_buffer_type_alloc_buffer";
case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALLOC_SIZE:
@@ -110,7 +111,7 @@ static const backend_dispatch_t apir_backend_dispatch_table[APIR_BACKEND_DISPATC
/* APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME = */ backend_buffer_type_get_name,
/* APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT = */ backend_buffer_type_get_alignment,
/* APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE = */ backend_buffer_type_get_max_size,
/* APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST = */ backend_buffer_type_is_host,
/* APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST = */ backend_buffer_type_is_host /* DEPRECATED */,
/* APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER = */ backend_buffer_type_alloc_buffer,
/* APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALLOC_SIZE = */ backend_buffer_type_get_alloc_size,

View File

@@ -11,6 +11,8 @@
#include "shared/apir_cs.h"
#include "shared/apir_cs_ggml.h"
#define GGML_VIRTGPU_BCK "ggml-virtgpu-backend: "
struct virgl_apir_context {
uint32_t ctx_id;
virgl_apir_callbacks * iface;

View File

@@ -35,14 +35,8 @@ void apir_backend_deinit(uint32_t virgl_ctx_id) {
buffer->iface.free_buffer(buffer);
}
if (dev) {
size_t free, total;
dev->iface.get_memory(dev, &free, &total);
GGML_LOG_INFO("%s: free memory: %ld MB\n", __func__, (size_t) free / 1024 / 1024);
}
if (backend_library_handle) {
GGML_LOG_INFO("%s: The GGML backend library was loaded. Unloading it.\n", __func__);
GGML_LOG_INFO(GGML_VIRTGPU_BCK "The GGML backend library was loaded. Unloading it.\n");
dlclose(backend_library_handle);
backend_library_handle = NULL;
}
@@ -65,7 +59,7 @@ ApirLoadLibraryReturnCode apir_backend_initialize(uint32_t virgl_ctx_id, struct
if (apir_logfile) {
ggml_log_set(log_to_file_callback, apir_logfile);
} else {
GGML_LOG_INFO("Could not open the log file at '%s'\n", apir_log_to_file);
GGML_LOG_INFO(GGML_VIRTGPU_BCK "Could not open the log file at '%s'\n", apir_log_to_file);
}
}
@@ -74,7 +68,10 @@ ApirLoadLibraryReturnCode apir_backend_initialize(uint32_t virgl_ctx_id, struct
const char * library_reg = virgl_library_reg ? virgl_library_reg : GGML_DEFAULT_BACKEND_REG;
if (!library_name) {
GGML_LOG_ERROR("cannot open the GGML library: env var '%s' not defined\n", APIR_LLAMA_CPP_GGML_LIBRARY_PATH_ENV);
GGML_LOG_ERROR(GGML_VIRTGPU_BCK
"%s: cannot open the GGML library: env var '%s' not defined\n",
__func__, APIR_LLAMA_CPP_GGML_LIBRARY_PATH_ENV);
return APIR_LOAD_LIBRARY_ENV_VAR_MISSING;
}
@@ -82,13 +79,16 @@ ApirLoadLibraryReturnCode apir_backend_initialize(uint32_t virgl_ctx_id, struct
backend_library_handle = dlopen(library_name, RTLD_LAZY);
if (!backend_library_handle) {
GGML_LOG_ERROR("cannot open the GGML library: %s\n", dlerror());
GGML_LOG_ERROR(GGML_VIRTGPU_BCK
"%s: cannot open the GGML library: %s\n", __func__, dlerror());
return APIR_LOAD_LIBRARY_CANNOT_OPEN;
}
if (!library_reg) {
GGML_LOG_ERROR("cannot register the GGML library: env var '%s' not defined\n", APIR_LLAMA_CPP_GGML_LIBRARY_REG_ENV);
GGML_LOG_ERROR(GGML_VIRTGPU_BCK
"%s: cannot register the GGML library: env var '%s' not defined\n",
__func__, APIR_LLAMA_CPP_GGML_LIBRARY_REG_ENV);
return APIR_LOAD_LIBRARY_ENV_VAR_MISSING;
}
@@ -96,8 +96,10 @@ ApirLoadLibraryReturnCode apir_backend_initialize(uint32_t virgl_ctx_id, struct
void * ggml_backend_reg_fct = dlsym(backend_library_handle, library_reg);
dlsym_error = dlerror();
if (dlsym_error) {
GGML_LOG_ERROR("cannot find the GGML backend registration symbol '%s' (from %s): %s\n", library_reg,
APIR_LLAMA_CPP_GGML_LIBRARY_REG_ENV, dlsym_error);
GGML_LOG_ERROR(GGML_VIRTGPU_BCK
"%s: cannot find the GGML backend registration symbol '%s' (from %s): %s\n",
__func__, library_reg, APIR_LLAMA_CPP_GGML_LIBRARY_REG_ENV, dlsym_error);
return APIR_LOAD_LIBRARY_SYMBOL_MISSING;
}
@@ -134,7 +136,9 @@ uint32_t apir_backend_dispatcher(uint32_t virgl_ctx_id,
};
if (cmd_type >= APIR_BACKEND_DISPATCH_TABLE_COUNT) {
GGML_LOG_ERROR("Received an invalid dispatch index (%d >= %d)\n", cmd_type, APIR_BACKEND_DISPATCH_TABLE_COUNT);
GGML_LOG_ERROR(GGML_VIRTGPU_BCK
"%s: Received an invalid dispatch index (%d >= %d)\n",
__func__, cmd_type, APIR_BACKEND_DISPATCH_TABLE_COUNT);
return APIR_BACKEND_FORWARD_INDEX_INVALID;
}

View File

@@ -86,7 +86,7 @@ static inline bool apir_decoder_peek_internal(apir_decoder * dec,
assert(val_size <= size);
if (unlikely(size > (size_t) (dec->end - dec->cur))) {
GGML_LOG_ERROR("reading too much from the decoder ...\n");
GGML_LOG_ERROR("%s: reading too much from the decoder ...\n", __func__);
apir_decoder_set_fatal(dec);
memset(val, 0, val_size);
return false;
@@ -103,7 +103,7 @@ static inline void apir_decoder_peek(apir_decoder * dec, size_t size, void * val
static inline const void * apir_decoder_use_inplace(apir_decoder * dec, size_t size) {
if (unlikely(size > (size_t) (dec->end - dec->cur))) {
GGML_LOG_ERROR("reading too much from the decoder ...\n");
GGML_LOG_ERROR("%s: reading too much from the decoder ...\n", __func__);
apir_decoder_set_fatal(dec);
return NULL;
}
@@ -221,7 +221,7 @@ static inline uint64_t apir_decode_array_size(apir_decoder * dec, uint64_t expec
uint64_t size;
apir_decode_uint64_t(dec, &size);
if (size != expected_size) {
GGML_LOG_ERROR("Couldn't decode array from the decoder\n");
GGML_LOG_ERROR("%s: Couldn't decode array from the decoder\n", __func__);
apir_decoder_set_fatal(dec);
size = 0;
}
@@ -322,7 +322,7 @@ static inline void apir_decode_char_array(apir_decoder * dec, char * val, size_t
if (size) {
val[size - 1] = '\0';
} else {
GGML_LOG_ERROR("Couldn't decode the blog array\n");
GGML_LOG_ERROR("%s: Couldn't decode the blog array\n", __func__);
apir_decoder_set_fatal(dec);
}
}
@@ -332,7 +332,8 @@ static inline void apir_decode_char_array(apir_decoder * dec, char * val, size_t
static inline void * apir_decoder_alloc_array(size_t size, size_t count) {
size_t alloc_size;
if (unlikely(__builtin_mul_overflow(size, count, &alloc_size))) {
GGML_LOG_ERROR("overflow in array allocation of %zu * %zu bytes\n", size, count);
GGML_LOG_ERROR("%s: overflow in array allocation of %zu * %zu bytes\n",
__func__, size, count);
return NULL;
}

View File

@@ -39,11 +39,17 @@ static inline void apir_encode_ggml_tensor(apir_encoder * enc, const ggml_tensor
static inline const ggml_tensor * apir_decode_ggml_tensor(apir_decoder * dec) {
const apir_rpc_tensor * apir_rpc_tensor = apir_decode_apir_rpc_tensor_inplace(dec);
if (!apir_rpc_tensor) {
return NULL;
}
ggml_init_params params{
/*.mem_size =*/ ggml_tensor_overhead(),
/*.mem_buffer =*/ NULL,
/*.no_alloc =*/ true,
};
ggml_context * ctx = ggml_init(params);
const ggml_tensor * tensor = apir_deserialize_tensor(ctx, apir_rpc_tensor);
@@ -71,6 +77,10 @@ static inline ggml_backend_buffer_type_t apir_decode_ggml_buffer_type(apir_decod
return (ggml_backend_buffer_type_t) handle;
}
static inline void apir_encode_apir_buffer_type_host_handle(apir_encoder * enc, apir_buffer_type_host_handle_t handle) {
apir_encoder_write(enc, sizeof(handle), &handle, sizeof(handle));
}
static inline apir_buffer_type_host_handle_t apir_decode_apir_buffer_type_host_handle(apir_decoder * dec) {
apir_buffer_type_host_handle_t handle;
@@ -154,13 +164,13 @@ static inline void apir_encode_ggml_tensor_inline(apir_encoder * enc, const ggml
size_t tensor_size = sizeof(*tensor);
if (tensor->extra) {
GGML_ABORT("Cannot pass tensors with extra");
GGML_ABORT("%s: Cannot pass tensors with extra", __func__);
}
if (tensor->src[0] && tensor->buffer) {
static int first = 1;
if (first) {
GGML_LOG_WARN("Cannot pass tensors with src and buffer\n");
GGML_LOG_WARN("%s: Cannot pass tensors with src and buffer\n", __func__);
first = 0;
}
}

View File

@@ -6,7 +6,7 @@ static ggml_backend_buffer_t ggml_backend_remoting_buffer_type_alloc_buffer(ggml
ggml_backend_remoting_buffer_context * context = (ggml_backend_remoting_buffer_context *) malloc(sizeof(*context));
if (!context) {
GGML_ABORT("Couldn't allocate the buffer context ...");
GGML_ABORT(GGML_VIRTGPU "%s: Couldn't allocate the buffer context ...", __func__);
}
context->gpu = gpu;
@@ -20,7 +20,7 @@ static ggml_backend_buffer_t ggml_backend_remoting_buffer_type_alloc_buffer(ggml
context->base = context->apir_context.shmem.mmap_ptr;
context->is_from_ptr = true;
} else {
context->apir_context = apir_buffer_type_alloc_buffer(gpu, buft, size);
context->apir_context = apir_buffer_type_alloc_buffer(gpu, gpu->cached_buffer_type.host_handle, size);
context->is_from_ptr = false;
context->base = NULL;
}
@@ -34,36 +34,19 @@ static ggml_backend_buffer_t ggml_backend_remoting_buffer_type_alloc_buffer(ggml
static const char * ggml_backend_remoting_buffer_type_get_name(ggml_backend_buffer_type_t buft) {
virtgpu * gpu = BUFT_TO_GPU(buft);
return apir_buffer_type_get_name(gpu, buft);
return gpu->cached_buffer_type.name;
}
static size_t ggml_backend_remoting_buffer_type_get_alignment(ggml_backend_buffer_type_t buft) {
virtgpu * gpu = BUFT_TO_GPU(buft);
static size_t align = 0;
if (align == 0) {
align = apir_buffer_type_get_alignment(gpu, buft);
}
return align;
return gpu->cached_buffer_type.alignment;
}
static size_t ggml_backend_remoting_buffer_type_get_max_size(ggml_backend_buffer_type_t buft) {
virtgpu * gpu = BUFT_TO_GPU(buft);
static size_t max_size = 0;
if (max_size == 0) {
max_size = apir_buffer_type_get_max_size(gpu, buft);
}
return max_size;
}
static bool ggml_backend_remoting_buffer_type_is_host(ggml_backend_buffer_type_t buft) {
virtgpu * gpu = BUFT_TO_GPU(buft);
return apir_buffer_type_is_host(gpu, buft);
return gpu->cached_buffer_type.max_size;
}
static size_t ggml_backend_remoting_buffer_type_get_alloc_size(ggml_backend_buffer_type_t buft,
@@ -76,7 +59,7 @@ static size_t ggml_backend_remoting_buffer_type_get_alloc_size(ggml_backend_buff
return ggml_nbytes(tensor);
}
return apir_buffer_type_get_alloc_size(gpu, buft, tensor);
return apir_buffer_type_get_alloc_size(gpu, gpu->cached_buffer_type.host_handle, tensor);
}
const ggml_backend_buffer_type_i ggml_backend_remoting_buffer_type_interface = {

View File

@@ -3,32 +3,27 @@
static const char * ggml_backend_remoting_device_get_name(ggml_backend_dev_t dev) {
virtgpu * gpu = DEV_TO_GPU(dev);
return apir_device_get_name(gpu);
return gpu->cached_device_info.name;
}
static const char * ggml_backend_remoting_device_get_description(ggml_backend_dev_t dev) {
virtgpu * gpu = DEV_TO_GPU(dev);
return apir_device_get_description(gpu);
// Return the pre-cached description from the virtgpu structure
return gpu->cached_device_info.description;
}
static enum ggml_backend_dev_type ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
virtgpu * gpu = DEV_TO_GPU(dev);
static enum ggml_backend_dev_type type;
static bool has_type = false;
if (!has_type) {
has_type = true;
type = (enum ggml_backend_dev_type) apir_device_get_type(gpu);
}
return type;
return (enum ggml_backend_dev_type) gpu->cached_device_info.type;
}
static void ggml_backend_remoting_device_get_memory(ggml_backend_dev_t dev, size_t * free, size_t * total) {
virtgpu * gpu = DEV_TO_GPU(dev);
return apir_device_get_memory(gpu, free, total);
*free = gpu->cached_device_info.memory_free;
*total = gpu->cached_device_info.memory_total;
}
static bool ggml_backend_remoting_device_supports_op(ggml_backend_dev_t dev, const ggml_tensor * op) {
@@ -77,13 +72,22 @@ static void ggml_backend_remoting_device_get_props(ggml_backend_dev_t dev, ggml_
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev) {
virtgpu * gpu = DEV_TO_GPU(dev);
apir_buffer_type_host_handle_t ctx = apir_device_get_buffer_type(gpu);
static std::atomic<bool> initialized = false;
static ggml_backend_buffer_type buft;
static ggml_backend_buffer_type buft{
/* .iface = */ ggml_backend_remoting_buffer_type_interface,
/* .device = */ dev,
/* .context = */ (void *) ctx,
};
if (!initialized) {
static std::mutex mutex;
std::lock_guard<std::mutex> lock(mutex);
if (!initialized) {
buft = {
/* .iface = */ ggml_backend_remoting_buffer_type_interface,
/* .device = */ dev,
/* .context = */ (void *) gpu->cached_buffer_type.host_handle,
};
initialized = true;
}
}
return &buft;
}
@@ -91,13 +95,22 @@ ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_bac
static ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_from_ptr_type(ggml_backend_dev_t dev) {
virtgpu * gpu = DEV_TO_GPU(dev);
apir_buffer_type_host_handle_t ctx = apir_device_get_buffer_type(gpu);
static std::atomic<bool> initialized = false;
static ggml_backend_buffer_type buft;
static ggml_backend_buffer_type buft{
/* .iface = */ ggml_backend_remoting_buffer_from_ptr_type_interface,
/* .device = */ dev,
/* .context = */ (void *) ctx,
};
if (!initialized) {
static std::mutex mutex;
std::lock_guard<std::mutex> lock(mutex);
if (!initialized) {
buft = {
/* .iface = */ ggml_backend_remoting_buffer_from_ptr_type_interface,
/* .device = */ dev,
/* .context = */ (void *) gpu->cached_buffer_type.host_handle,
};
initialized = true;
}
}
return &buft;
}
@@ -110,7 +123,7 @@ static ggml_backend_buffer_t ggml_backend_remoting_device_buffer_from_ptr(ggml_b
ggml_backend_remoting_buffer_context * context = (ggml_backend_remoting_buffer_context *) malloc(sizeof(*context));
if (!context) {
GGML_ABORT("Couldn't allocate the buffer context ...");
GGML_ABORT(GGML_VIRTGPU "%s: Couldn't allocate the buffer context ...", __func__);
}
context->gpu = gpu;

View File

@@ -4,37 +4,70 @@
#include <iostream>
#include <mutex>
void ggml_virtgpu_cleanup(virtgpu * gpu);
static virtgpu * apir_initialize() {
static virtgpu * apir_gpu_instance = NULL;
static bool apir_initialized = false;
static virtgpu * gpu = NULL;
static std::atomic<bool> initialized = false;
if (initialized) {
// fast track
return gpu;
}
{
static std::mutex mutex;
std::lock_guard<std::mutex> lock(mutex);
if (apir_initialized) {
return apir_gpu_instance;
if (initialized) {
// thread safe
return gpu;
}
apir_gpu_instance = create_virtgpu();
if (!apir_gpu_instance) {
GGML_ABORT("failed to initialize the virtgpu");
gpu = create_virtgpu();
if (!gpu) {
initialized = true;
return NULL;
}
apir_initialized = true;
// Pre-fetch and cache all device information, it will not change
gpu->cached_device_info.description = apir_device_get_description(gpu);
if (!gpu->cached_device_info.description) {
GGML_ABORT(GGML_VIRTGPU "%s: failed to initialize the virtgpu device description", __func__);
}
gpu->cached_device_info.name = apir_device_get_name(gpu);
if (!gpu->cached_device_info.name) {
GGML_ABORT(GGML_VIRTGPU "%s: failed to initialize the virtgpu device name", __func__);
}
gpu->cached_device_info.device_count = apir_device_get_count(gpu);
gpu->cached_device_info.type = apir_device_get_type(gpu);
apir_device_get_memory(gpu,
&gpu->cached_device_info.memory_free,
&gpu->cached_device_info.memory_total);
apir_buffer_type_host_handle_t buft_host_handle = apir_device_get_buffer_type(gpu);
gpu->cached_buffer_type.host_handle = buft_host_handle;
gpu->cached_buffer_type.name = apir_buffer_type_get_name(gpu, buft_host_handle);
if (!gpu->cached_buffer_type.name) {
GGML_ABORT(GGML_VIRTGPU "%s: failed to initialize the virtgpu buffer type name", __func__);
}
gpu->cached_buffer_type.alignment = apir_buffer_type_get_alignment(gpu, buft_host_handle);
gpu->cached_buffer_type.max_size = apir_buffer_type_get_max_size(gpu, buft_host_handle);
initialized = true;
}
return apir_gpu_instance;
return gpu;
}
static int ggml_backend_remoting_get_device_count() {
virtgpu * gpu = apir_initialize();
if (!gpu) {
GGML_LOG_WARN("apir_initialize failed\n");
return 0;
}
return apir_device_get_count(gpu);
return gpu->cached_device_info.device_count;
}
static size_t ggml_backend_remoting_reg_get_device_count(ggml_backend_reg_t reg) {
@@ -52,17 +85,21 @@ ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device) {
static void ggml_backend_remoting_reg_init_devices(ggml_backend_reg_t reg) {
if (devices.size() > 0) {
GGML_LOG_INFO("%s: already initialized\n", __func__);
GGML_LOG_INFO(GGML_VIRTGPU "%s: already initialized\n", __func__);
return;
}
virtgpu * gpu = apir_initialize();
if (!gpu) {
GGML_LOG_ERROR("apir_initialize failed\n");
GGML_LOG_ERROR(GGML_VIRTGPU "%s: apir_initialize failed\n", __func__);
return;
}
static bool initialized = false;
static std::atomic<bool> initialized = false;
if (initialized) {
return; // fast track
}
{
static std::mutex mutex;
@@ -70,10 +107,10 @@ static void ggml_backend_remoting_reg_init_devices(ggml_backend_reg_t reg) {
if (!initialized) {
for (int i = 0; i < ggml_backend_remoting_get_device_count(); i++) {
ggml_backend_remoting_device_context * ctx = new ggml_backend_remoting_device_context;
char desc[256] = "API Remoting device";
char desc[256] = "ggml-virtgpu API Remoting device";
ctx->device = i;
ctx->name = GGML_REMOTING_FRONTEND_NAME + std::to_string(i);
ctx->name = GGML_VIRTGPU_NAME + std::to_string(i);
ctx->description = desc;
ctx->gpu = gpu;
@@ -98,7 +135,7 @@ static ggml_backend_dev_t ggml_backend_remoting_reg_get_device(ggml_backend_reg_
static const char * ggml_backend_remoting_reg_get_name(ggml_backend_reg_t reg) {
UNUSED(reg);
return GGML_REMOTING_FRONTEND_NAME;
return GGML_VIRTGPU_NAME;
}
static const ggml_backend_reg_i ggml_backend_remoting_reg_i = {
@@ -111,8 +148,7 @@ static const ggml_backend_reg_i ggml_backend_remoting_reg_i = {
ggml_backend_reg_t ggml_backend_virtgpu_reg() {
virtgpu * gpu = apir_initialize();
if (!gpu) {
GGML_LOG_ERROR("virtgpu_apir_initialize failed\n");
return NULL;
GGML_LOG_ERROR(GGML_VIRTGPU "%s: virtgpu_apir_initialize failed\n", __func__);
}
static ggml_backend_reg reg = {
@@ -129,9 +165,25 @@ ggml_backend_reg_t ggml_backend_virtgpu_reg() {
ggml_backend_remoting_reg_init_devices(&reg);
GGML_LOG_INFO("%s: initialized\n", __func__);
return &reg;
}
// public function, not exposed in the GGML interface at the moment
void ggml_virtgpu_cleanup(virtgpu * gpu) {
if (gpu->cached_device_info.name) {
free(gpu->cached_device_info.name);
gpu->cached_device_info.name = NULL;
}
if (gpu->cached_device_info.description) {
free(gpu->cached_device_info.description);
gpu->cached_device_info.description = NULL;
}
if (gpu->cached_buffer_type.name) {
free(gpu->cached_buffer_type.name);
gpu->cached_buffer_type.name = NULL;
}
mtx_destroy(&gpu->data_shmem_mutex);
}
GGML_BACKEND_DL_IMPL(ggml_backend_virtgpu_reg)

View File

@@ -8,6 +8,9 @@
#include <memory>
#include <string>
#define GGML_VIRTGPU_NAME "ggml-virtgpu"
#define GGML_VIRTGPU "ggml-virtgpu: "
// USE_ALWAYS_TRUE_SUPPORTS_OP: 1 is fast, 0 avoid micro-benchmark crashes
#define USE_ALWAYS_TRUE_SUPPORTS_OP 1
@@ -62,7 +65,7 @@ static inline apir_buffer_type_host_handle_t ggml_buffer_type_to_apir_handle(ggm
static inline apir_buffer_host_handle_t ggml_buffer_to_apir_handle(ggml_backend_buffer_t buffer) {
if (!buffer->context) {
GGML_ABORT("%s: no context available :/", __func__);
GGML_ABORT(GGML_VIRTGPU "%s: no context available :/", __func__);
}
return BUFFER_TO_HOST_HANDLE(buffer);
}

View File

@@ -24,10 +24,10 @@ functions:
frontend_return: "int"
get_name:
frontend_return: "const char *"
frontend_return: "char *"
get_description:
frontend_return: "const char *"
frontend_return: "char *"
get_type:
frontend_return: "uint32_t"
@@ -64,35 +64,33 @@ functions:
group_description: "buffer-type"
functions:
get_name:
frontend_return: "const char *"
frontend_return: "char *"
frontend_extra_params:
- "ggml_backend_buffer_type_t buft"
- "apir_buffer_type_host_handle_t host_handle"
get_alignment:
frontend_return: "size_t"
frontend_extra_params:
- "ggml_backend_buffer_type_t buft"
- "apir_buffer_type_host_handle_t host_handle"
get_max_size:
frontend_return: "size_t"
frontend_extra_params:
- "ggml_backend_buffer_type_t buft"
- "apir_buffer_type_host_handle_t host_handle"
is_host:
frontend_return: "bool"
frontend_extra_params:
- "ggml_backend_buffer_type_t buft"
deprecated: true
alloc_buffer:
frontend_return: "apir_buffer_context_t"
frontend_extra_params:
- "ggml_backend_buffer_type_t buffer_buft"
- "apir_buffer_type_host_handle_t host_handle"
- "size_t size"
get_alloc_size:
frontend_return: "size_t"
frontend_extra_params:
- "ggml_backend_buffer_type_t buft"
- "apir_buffer_type_host_handle_t host_handle"
- "const ggml_tensor *op"
buffer:

View File

@@ -116,7 +116,7 @@ class RemotingCodebaseGenerator:
'frontend_return': func_metadata.get('frontend_return', 'void'),
'frontend_extra_params': func_metadata.get('frontend_extra_params', []),
'group_description': group_description,
'newly_added': func_metadata.get('newly_added', False)
'deprecated': func_metadata.get('deprecated', False),
})
enum_value += 1
@@ -165,6 +165,9 @@ class RemotingCodebaseGenerator:
signature = "uint32_t"
params = "apir_encoder *enc, apir_decoder *dec, virgl_apir_context *ctx"
if func['deprecated']:
decl_lines.append(f"/* {func['enum_name']} is deprecated. Keeping the handler for backward compatibility. */")
decl_lines.append(f"{signature} {func['backend_function']}({params});")
# Switch cases
@@ -176,7 +179,9 @@ class RemotingCodebaseGenerator:
switch_lines.append(f" /* {func['group_description']} */")
current_group = func['group_name']
switch_lines.append(f" case {func['enum_name']}: return \"{func['backend_function']}\";")
deprecated = " (DEPRECATED)" if func['deprecated'] else ""
switch_lines.append(f" case {func['enum_name']}: return \"{func['backend_function']}{deprecated}\";")
# Dispatch table
table_lines = []
@@ -188,7 +193,8 @@ class RemotingCodebaseGenerator:
table_lines.append("")
current_group = func['group_name']
table_lines.append(f" /* {func['enum_name']} = */ {func['backend_function']},")
deprecated = " /* DEPRECATED */" if func['deprecated'] else ""
table_lines.append(f" /* {func['enum_name']} = */ {func['backend_function']}{deprecated},")
header_content = f'''\
#pragma once
@@ -225,6 +231,10 @@ static const backend_dispatch_t apir_backend_dispatch_table[APIR_BACKEND_DISPATC
decl_lines.append(f"/* {func['group_description']} */")
current_group = func['group_name']
if func['deprecated']:
decl_lines.append(f"/* {func['frontend_function']} is deprecated. */")
continue
# Build parameter list
params = [self.naming_patterns['frontend_base_param']]
params.extend(func['frontend_extra_params'])
@@ -287,7 +297,7 @@ static const backend_dispatch_t apir_backend_dispatch_table[APIR_BACKEND_DISPATC
generated_files = [apir_backend_path, backend_dispatched_path, virtgpu_forward_path]
if not self.clang_format_available:
logging.warning("\nclang-format not found in PATH. Generated files will not be formatted."
logging.warning("\nclang-format not found in PATH. Generated files will not be formatted.\n"
" Install clang-format to enable automatic code formatting.")
else:
logging.info("\n🎨 Formatting files with clang-format...")

View File

@@ -18,12 +18,17 @@ ggml_status apir_backend_graph_compute(virtgpu * gpu, ggml_cgraph * cgraph) {
virtgpu_shmem temp_shmem; // Local storage for large buffers
virtgpu_shmem * shmem = &temp_shmem;
bool using_shared_shmem = false;
if (cgraph_size <= gpu->data_shmem.mmap_size) {
// prefer the init-time allocated page, if large enough
// Lock mutex before using shared data_shmem buffer
if (mtx_lock(&gpu->data_shmem_mutex) != thrd_success) {
GGML_ABORT(GGML_VIRTGPU "%s: Failed to lock data_shmem mutex", __func__);
}
using_shared_shmem = true;
shmem = &gpu->data_shmem;
} else if (virtgpu_shmem_create(gpu, cgraph_size, shmem)) {
GGML_ABORT("Couldn't allocate the guest-host shared buffer");
GGML_ABORT(GGML_VIRTGPU "%s: Couldn't allocate the guest-host shared buffer", __func__);
}
apir_encode_virtgpu_shmem_res_id(encoder, shmem->res_id);
@@ -42,7 +47,10 @@ ggml_status apir_backend_graph_compute(virtgpu * gpu, ggml_cgraph * cgraph) {
remote_call_finish(gpu, encoder, decoder);
if (shmem != &gpu->data_shmem) {
// Unlock mutex before cleanup
if (using_shared_shmem) {
mtx_unlock(&gpu->data_shmem_mutex);
} else {
virtgpu_shmem_destroy(gpu, shmem);
}

View File

@@ -1,20 +1,20 @@
#include "virtgpu-forward-impl.h"
const char * apir_buffer_type_get_name(virtgpu * gpu, ggml_backend_buffer_type_t buft) {
char * apir_buffer_type_get_name(virtgpu * gpu, apir_buffer_type_host_handle_t host_handle) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME);
apir_encode_ggml_buffer_type(encoder, buft);
apir_encode_apir_buffer_type_host_handle(encoder, host_handle);
REMOTE_CALL(gpu, encoder, decoder, ret);
const size_t string_size = apir_decode_array_size_unchecked(decoder);
char * string = (char *) apir_decoder_alloc_array(sizeof(char), string_size);
if (!string) {
GGML_LOG_ERROR("%s: Could not allocate the device name buffer\n", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU "%s: Could not allocate the device name buffer\n", __func__);
apir_decoder_set_fatal(decoder);
}
apir_decode_char_array(decoder, string, string_size);
@@ -24,14 +24,14 @@ const char * apir_buffer_type_get_name(virtgpu * gpu, ggml_backend_buffer_type_t
return string;
}
size_t apir_buffer_type_get_alignment(virtgpu * gpu, ggml_backend_buffer_type_t buft) {
size_t apir_buffer_type_get_alignment(virtgpu * gpu, apir_buffer_type_host_handle_t host_handle) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT);
apir_encode_ggml_buffer_type(encoder, buft);
apir_encode_apir_buffer_type_host_handle(encoder, host_handle);
REMOTE_CALL(gpu, encoder, decoder, ret);
@@ -43,14 +43,14 @@ size_t apir_buffer_type_get_alignment(virtgpu * gpu, ggml_backend_buffer_type_t
return alignment;
}
size_t apir_buffer_type_get_max_size(virtgpu * gpu, ggml_backend_buffer_type_t buft) {
size_t apir_buffer_type_get_max_size(virtgpu * gpu, apir_buffer_type_host_handle_t host_handle) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE);
apir_encode_ggml_buffer_type(encoder, buft);
apir_encode_apir_buffer_type_host_handle(encoder, host_handle);
REMOTE_CALL(gpu, encoder, decoder, ret);
@@ -62,26 +62,7 @@ size_t apir_buffer_type_get_max_size(virtgpu * gpu, ggml_backend_buffer_type_t b
return max_size;
}
bool apir_buffer_type_is_host(virtgpu * gpu, ggml_backend_buffer_type_t buft) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST);
apir_encode_ggml_buffer_type(encoder, buft);
REMOTE_CALL(gpu, encoder, decoder, ret);
bool is_host;
apir_decode_bool_t(decoder, &is_host);
remote_call_finish(gpu, encoder, decoder);
return is_host;
}
apir_buffer_context_t apir_buffer_type_alloc_buffer(virtgpu * gpu, ggml_backend_buffer_type_t buft, size_t size) {
apir_buffer_context_t apir_buffer_type_alloc_buffer(virtgpu * gpu, apir_buffer_type_host_handle_t host_handle, size_t size) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
@@ -90,7 +71,7 @@ apir_buffer_context_t apir_buffer_type_alloc_buffer(virtgpu * gpu, ggml_backend_
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER);
apir_encode_ggml_buffer_type(encoder, buft);
apir_encode_apir_buffer_type_host_handle(encoder, host_handle);
apir_encode_size_t(encoder, &size);
@@ -103,14 +84,14 @@ apir_buffer_context_t apir_buffer_type_alloc_buffer(virtgpu * gpu, ggml_backend_
return buffer_context;
}
size_t apir_buffer_type_get_alloc_size(virtgpu * gpu, ggml_backend_buffer_type_t buft, const ggml_tensor * op) {
size_t apir_buffer_type_get_alloc_size(virtgpu * gpu, apir_buffer_type_host_handle_t host_handle, const ggml_tensor * op) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALLOC_SIZE);
apir_encode_ggml_buffer_type(encoder, buft);
apir_encode_apir_buffer_type_host_handle(encoder, host_handle);
apir_encode_ggml_tensor_inline(encoder, op);

View File

@@ -36,13 +36,18 @@ void apir_buffer_set_tensor(virtgpu * gpu,
virtgpu_shmem temp_shmem; // Local storage for large buffers
virtgpu_shmem * shmem = &temp_shmem;
bool using_shared_shmem = false;
if (size <= gpu->data_shmem.mmap_size) {
// prefer the init-time allocated page, if large enough
// Lock mutex before using shared data_shmem buffer
if (mtx_lock(&gpu->data_shmem_mutex) != thrd_success) {
GGML_ABORT(GGML_VIRTGPU "%s: Failed to lock data_shmem mutex", __func__);
}
using_shared_shmem = true;
shmem = &gpu->data_shmem;
} else if (virtgpu_shmem_create(gpu, size, shmem)) {
GGML_ABORT("Couldn't allocate the guest-host shared buffer");
GGML_ABORT(GGML_VIRTGPU "%s: Couldn't allocate the guest-host shared buffer", __func__);
}
memcpy(shmem->mmap_ptr, data, size);
@@ -55,7 +60,10 @@ void apir_buffer_set_tensor(virtgpu * gpu,
remote_call_finish(gpu, encoder, decoder);
if (shmem != &gpu->data_shmem) {
// Unlock mutex before cleanup
if (using_shared_shmem) {
mtx_unlock(&gpu->data_shmem_mutex);
} else {
virtgpu_shmem_destroy(gpu, shmem);
}
@@ -79,13 +87,18 @@ void apir_buffer_get_tensor(virtgpu * gpu,
virtgpu_shmem temp_shmem; // Local storage for large buffers
virtgpu_shmem * shmem = &temp_shmem;
bool using_shared_shmem = false;
if (size <= gpu->data_shmem.mmap_size) {
// prefer the init-time allocated page, if large enough
// Lock mutex before using shared data_shmem buffer
if (mtx_lock(&gpu->data_shmem_mutex) != thrd_success) {
GGML_ABORT(GGML_VIRTGPU "%s: Failed to lock data_shmem mutex", __func__);
}
using_shared_shmem = true;
shmem = &gpu->data_shmem;
} else if (virtgpu_shmem_create(gpu, size, shmem)) {
GGML_ABORT("Couldn't allocate the guest-host shared buffer");
GGML_ABORT(GGML_VIRTGPU "%s: Couldn't allocate the guest-host shared buffer", __func__);
}
apir_encode_virtgpu_shmem_res_id(encoder, shmem->res_id);
@@ -98,7 +111,10 @@ void apir_buffer_get_tensor(virtgpu * gpu,
remote_call_finish(gpu, encoder, decoder);
if (shmem != &gpu->data_shmem) {
// Unlock mutex before cleanup
if (using_shared_shmem) {
mtx_unlock(&gpu->data_shmem_mutex);
} else {
virtgpu_shmem_destroy(gpu, shmem);
}
}

View File

@@ -2,11 +2,6 @@
#include "virtgpu-shm.h"
int apir_device_get_count(virtgpu * gpu) {
static int32_t dev_count = -1;
if (dev_count != -1) {
return dev_count;
}
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
@@ -14,6 +9,7 @@ int apir_device_get_count(virtgpu * gpu) {
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_DEVICE_GET_COUNT);
REMOTE_CALL(gpu, encoder, decoder, ret);
int32_t dev_count = -1;
apir_decode_int32_t(decoder, &dev_count);
remote_call_finish(gpu, encoder, decoder);
@@ -21,11 +17,7 @@ int apir_device_get_count(virtgpu * gpu) {
return dev_count;
}
const char * apir_device_get_name(virtgpu * gpu) {
static char * string = nullptr;
if (string) {
return string;
}
char * apir_device_get_name(virtgpu * gpu) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
@@ -34,9 +26,9 @@ const char * apir_device_get_name(virtgpu * gpu) {
REMOTE_CALL(gpu, encoder, decoder, ret);
const size_t string_size = apir_decode_array_size_unchecked(decoder);
string = (char *) apir_decoder_alloc_array(sizeof(char), string_size);
char * string = (char *) apir_decoder_alloc_array(sizeof(char), string_size);
if (!string) {
GGML_LOG_ERROR("%s: Could not allocate the device name buffer\n", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU "%s: Could not allocate the device name buffer\n", __func__);
return NULL;
}
apir_decode_char_array(decoder, string, string_size);
@@ -46,7 +38,7 @@ const char * apir_device_get_name(virtgpu * gpu) {
return string;
}
const char * apir_device_get_description(virtgpu * gpu) {
char * apir_device_get_description(virtgpu * gpu) {
apir_encoder * encoder;
apir_decoder * decoder;
ApirForwardReturnCode ret;
@@ -58,7 +50,7 @@ const char * apir_device_get_description(virtgpu * gpu) {
const size_t string_size = apir_decode_array_size_unchecked(decoder);
char * string = (char *) apir_decoder_alloc_array(sizeof(char), string_size);
if (!string) {
GGML_LOG_ERROR("%s: Could not allocate the device description buffer\n", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU "%s: Could not allocate the device description buffer\n", __func__);
return NULL;
}
@@ -181,7 +173,7 @@ apir_buffer_context_t apir_device_buffer_from_ptr(virtgpu * gpu, size_t size, si
REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR);
if (virtgpu_shmem_create(gpu, size, &buffer_context.shmem)) {
GGML_ABORT("Couldn't allocate the guest-host shared buffer");
GGML_ABORT(GGML_VIRTGPU "Couldn't allocate the guest-host shared buffer");
}
apir_encode_virtgpu_shmem_res_id(encoder, buffer_context.shmem.res_id);

View File

@@ -11,7 +11,7 @@
int32_t forward_flag = (int32_t) apir_command_type__; \
encoder_name = remote_call_prepare(gpu_dev_name, APIR_COMMAND_TYPE_FORWARD, forward_flag); \
if (!encoder_name) { \
GGML_ABORT("%s: failed to prepare the remote call encoder", __func__); \
GGML_ABORT(GGML_VIRTGPU "%s: failed to prepare the remote call encoder", __func__); \
} \
} while (0)
@@ -19,10 +19,10 @@
do { \
ret_name = (ApirForwardReturnCode) remote_call(gpu_dev_name, encoder_name, &decoder_name, 0, NULL); \
if (!decoder_name) { \
GGML_ABORT("%s: failed to kick the remote call", __func__); \
GGML_ABORT(GGML_VIRTGPU "%s: failed to kick the remote call", __func__); \
} \
if (ret_name < APIR_FORWARD_BASE_INDEX) { \
GGML_ABORT("%s: failed to forward the API call: %s: code %d", __func__, \
GGML_ABORT(GGML_VIRTGPU "%s: failed to forward the API call: %s: code %d", __func__, \
apir_forward_error(ret_name), ret_name); \
} \
ret_name = (ApirForwardReturnCode) (ret_name - APIR_FORWARD_BASE_INDEX); \

View File

@@ -3,8 +3,8 @@
/* device */
void apir_device_get_device_count(struct virtgpu * gpu);
int apir_device_get_count(struct virtgpu * gpu);
const char * apir_device_get_name(struct virtgpu * gpu);
const char * apir_device_get_description(struct virtgpu * gpu);
char * apir_device_get_name(struct virtgpu * gpu);
char * apir_device_get_description(struct virtgpu * gpu);
uint32_t apir_device_get_type(struct virtgpu * gpu);
void apir_device_get_memory(struct virtgpu * gpu, size_t * free, size_t * total);
bool apir_device_supports_op(struct virtgpu * gpu, const ggml_tensor * op);
@@ -17,14 +17,15 @@ void apir_device_get_props(struct virtgpu * gpu,
apir_buffer_context_t apir_device_buffer_from_ptr(struct virtgpu * gpu, size_t size, size_t max_tensor_size);
/* buffer-type */
const char * apir_buffer_type_get_name(struct virtgpu * gpu, ggml_backend_buffer_type_t buft);
size_t apir_buffer_type_get_alignment(struct virtgpu * gpu, ggml_backend_buffer_type_t buft);
size_t apir_buffer_type_get_max_size(struct virtgpu * gpu, ggml_backend_buffer_type_t buft);
bool apir_buffer_type_is_host(struct virtgpu * gpu, ggml_backend_buffer_type_t buft);
apir_buffer_context_t apir_buffer_type_alloc_buffer(struct virtgpu * gpu,
ggml_backend_buffer_type_t buffer_buft,
size_t size);
size_t apir_buffer_type_get_alloc_size(struct virtgpu * gpu, ggml_backend_buffer_type_t buft, const ggml_tensor * op);
char * apir_buffer_type_get_name(struct virtgpu * gpu, apir_buffer_type_host_handle_t host_handle);
size_t apir_buffer_type_get_alignment(struct virtgpu * gpu, apir_buffer_type_host_handle_t host_handle);
size_t apir_buffer_type_get_max_size(struct virtgpu * gpu, apir_buffer_type_host_handle_t host_handle);
apir_buffer_context_t apir_buffer_type_alloc_buffer(struct virtgpu * gpu,
apir_buffer_type_host_handle_t host_handle,
size_t size);
size_t apir_buffer_type_get_alloc_size(struct virtgpu * gpu,
apir_buffer_type_host_handle_t host_handle,
const ggml_tensor * op);
/* buffer */
void * apir_buffer_get_base(struct virtgpu * gpu, apir_buffer_context_t * buffer_context);


@@ -85,8 +85,7 @@ int virtgpu_shmem_create(virtgpu * gpu, size_t size, virtgpu_shmem * shmem) {
void * ptr = virtgpu_ioctl_map(gpu, gem_handle, size);
if (!ptr) {
virtgpu_ioctl_gem_close(gpu, gem_handle);
GGML_LOG_ERROR("virtgpu_ioctl_map FAILED\n");
exit(1);
GGML_LOG_ERROR(GGML_VIRTGPU "%s: virtgpu_ioctl_map failed\n", __func__);
return 1;
}


@@ -33,7 +33,7 @@ static int virtgpu_handshake(virtgpu * gpu) {
encoder = remote_call_prepare(gpu, APIR_COMMAND_TYPE_HANDSHAKE, 0);
if (!encoder) {
GGML_ABORT("%s: failed to prepare the remote call encoder", __func__);
GGML_ABORT(GGML_VIRTGPU "%s: failed to prepare the remote call encoder", __func__);
return 1;
}
@@ -52,7 +52,7 @@ static int virtgpu_handshake(virtgpu * gpu) {
log_call_duration(call_duration_ns, "API Remoting handshake");
if (!decoder) {
GGML_ABORT(
GGML_ABORT(GGML_VIRTGPU
"%s: failed to initiate the communication with the virglrenderer library. "
"Most likely, the wrong virglrenderer library was loaded in the hypervisor.",
__func__);
@@ -65,7 +65,8 @@ static int virtgpu_handshake(virtgpu * gpu) {
uint32_t host_minor;
if (ret_magic != APIR_HANDSHAKE_MAGIC) {
GGML_ABORT("%s: handshake with the virglrenderer failed (code=%d | %s)", __func__, ret_magic,
GGML_ABORT(GGML_VIRTGPU
"%s: handshake with the virglrenderer failed (code=%d | %s)", __func__, ret_magic,
apir_backend_initialize_error(ret_magic));
} else {
apir_decode_uint32_t(decoder, &host_major);
@@ -78,13 +79,13 @@ static int virtgpu_handshake(virtgpu * gpu) {
return 1;
}
GGML_LOG_INFO("%s: Guest is running with %u.%u\n", __func__, guest_major, guest_minor);
GGML_LOG_INFO("%s: Host is running with %u.%u\n", __func__, host_major, host_minor);
GGML_LOG_INFO(GGML_VIRTGPU "%s: Guest is running with %u.%u\n", __func__, guest_major, guest_minor);
GGML_LOG_INFO(GGML_VIRTGPU "%s: Host is running with %u.%u\n", __func__, host_major, host_minor);
if (guest_major != host_major) {
GGML_LOG_ERROR("Host major (%d) and guest major (%d) version differ\n", host_major, guest_major);
GGML_LOG_ERROR(GGML_VIRTGPU "Host major (%d) and guest major (%d) version differ\n", host_major, guest_major);
} else if (guest_minor != host_minor) {
GGML_LOG_WARN("Host minor (%d) and guest minor (%d) version differ\n", host_minor, guest_minor);
GGML_LOG_WARN(GGML_VIRTGPU "Host minor (%d) and guest minor (%d) version differ\n", host_minor, guest_minor);
}
return 0;
@@ -97,7 +98,7 @@ static ApirLoadLibraryReturnCode virtgpu_load_library(virtgpu * gpu) {
encoder = remote_call_prepare(gpu, APIR_COMMAND_TYPE_LOADLIBRARY, 0);
if (!encoder) {
GGML_ABORT("%s: hypercall error: failed to prepare the remote call encoder", __func__);
GGML_ABORT(GGML_VIRTGPU "%s: hypercall error: failed to prepare the API Remoting command encoder", __func__);
return APIR_LOAD_LIBRARY_HYPERCALL_INITIALIZATION_ERROR;
}
@@ -108,36 +109,67 @@ static ApirLoadLibraryReturnCode virtgpu_load_library(virtgpu * gpu) {
log_call_duration(call_duration_ns, "API Remoting LoadLibrary");
if (!decoder) {
GGML_ABORT("%s: hypercall error: failed to kick the API remoting hypercall.\n", __func__);
GGML_ABORT(GGML_VIRTGPU "%s: hypercall error: failed to trigger the API Remoting hypercall.\n", __func__);
return APIR_LOAD_LIBRARY_HYPERCALL_INITIALIZATION_ERROR;
}
remote_call_finish(gpu, encoder, decoder);
if (ret == APIR_LOAD_LIBRARY_SUCCESS) {
GGML_LOG_INFO("%s: The API Remoting backend was successfully loaded and initialized\n", __func__);
GGML_LOG_INFO(GGML_VIRTGPU "The API Remoting backend was successfully loaded and initialized\n");
return ret;
}
// something wrong happened, find out what.
if (ret < APIR_LOAD_LIBRARY_INIT_BASE_INDEX) {
GGML_ABORT("%s: virglrenderer could not load the API Remoting backend library: %s (code %d)", __func__,
apir_load_library_error(ret), ret);
if (ret == APIR_LOAD_LIBRARY_ENV_VAR_MISSING) {
GGML_ABORT(GGML_VIRTGPU
"%s: virglrenderer could not open the API Remoting backend library, "
"some environment variables are missing. "
"Make sure virglrenderer is correctly configured by the hypervisor. (%s)",
__func__, apir_load_library_error(ret));
} else if (ret == APIR_LOAD_LIBRARY_CANNOT_OPEN) {
GGML_ABORT(GGML_VIRTGPU
"%s: virglrenderer could not open the API Remoting backend library. "
"Make sure virglrenderer is correctly configured by the hypervisor. (%s)",
__func__, apir_load_library_error(ret));
} else if (ret == APIR_LOAD_LIBRARY_SYMBOL_MISSING) {
GGML_ABORT(GGML_VIRTGPU
"%s: could not load the backend library, some symbols are missing. "
"Make sure virglrenderer is correctly configured by the hypervisor. (%s) ",
__func__, apir_load_library_error(ret));
} else {
GGML_ABORT(GGML_VIRTGPU
"%s: virglrenderer could not load the API Remoting backend library. (%s - code %d)", __func__,
apir_load_library_error(ret), ret);
}
return ret;
}
GGML_LOG_INFO("%s: virglrenderer successfully loaded the API Remoting backend library", __func__);
GGML_LOG_INFO(GGML_VIRTGPU
"%s: virglrenderer successfully loaded the API Remoting backend library.\n", __func__);
ApirLoadLibraryReturnCode apir_ret = (ApirLoadLibraryReturnCode) (ret - APIR_LOAD_LIBRARY_INIT_BASE_INDEX);
if (apir_ret < APIR_LOAD_LIBRARY_INIT_BASE_INDEX) {
GGML_ABORT("%s: the API Remoting backend library couldn't load the backend library: apir code=%d | %s)",
if (apir_ret == APIR_LOAD_LIBRARY_CANNOT_OPEN) {
GGML_ABORT(GGML_VIRTGPU
"%s: the API Remoting backend library couldn't load the GGML backend library. "
"Make sure virglrenderer is correctly configured by the hypervisor. (%s)",
__func__, apir_load_library_error(apir_ret));
} else if (apir_ret == APIR_LOAD_LIBRARY_SYMBOL_MISSING) {
GGML_ABORT(GGML_VIRTGPU
"%s: the API Remoting backend library couldn't load the GGML backend library, some symbols are missing. "
"Make sure virglrenderer is correctly configured by the hypervisor. (%s)",
__func__, apir_load_library_error(apir_ret));
} else if (apir_ret < APIR_LOAD_LIBRARY_INIT_BASE_INDEX) {
GGML_ABORT(GGML_VIRTGPU
"%s: the API Remoting backend library couldn't load the GGML backend library: apir code=%d | %s)",
__func__, apir_ret, apir_load_library_error(apir_ret));
} else {
uint32_t lib_ret = apir_ret - APIR_LOAD_LIBRARY_INIT_BASE_INDEX;
GGML_ABORT("%s: the API Remoting backend library initialize its backend library: apir code=%d)", __func__,
GGML_ABORT(GGML_VIRTGPU
"%s: the API Remoting backend library initialize its backend library: apir code=%d)", __func__,
lib_ret);
}
return ret;
@@ -149,38 +181,58 @@ virtgpu * create_virtgpu() {
gpu->use_apir_capset = getenv("GGML_REMOTING_USE_APIR_CAPSET") != nullptr;
util_sparse_array_init(&gpu->shmem_array, sizeof(virtgpu_shmem), 1024);
// Initialize mutex to protect shared data_shmem buffer
if (mtx_init(&gpu->data_shmem_mutex, mtx_plain) != thrd_success) {
delete gpu;
GGML_ABORT(GGML_VIRTGPU
"%s: failed to initialize data_shmem mutex", __func__);
return NULL;
}
if (virtgpu_open(gpu) != APIR_SUCCESS) {
GGML_ABORT("%s: failed to open the virtgpu device", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU
"%s: failed to open the virtgpu device\n", __func__);
return NULL;
}
if (virtgpu_init_capset(gpu) != APIR_SUCCESS) {
GGML_ABORT("%s: failed to initialize the GPU capset", __func__);
if (gpu->use_apir_capset) {
GGML_ABORT(GGML_VIRTGPU
"%s: failed to initialize the virtgpu APIR capset. Make sure that the virglrenderer library supports it.", __func__);
} else {
GGML_ABORT(GGML_VIRTGPU
"%s: failed to initialize the virtgpu Venus capset", __func__);
}
return NULL;
}
if (virtgpu_init_context(gpu) != APIR_SUCCESS) {
GGML_ABORT("%s: failed to initialize the GPU context", __func__);
GGML_ABORT(GGML_VIRTGPU
"%s: failed to initialize the GPU context", __func__);
return NULL;
}
if (virtgpu_shmem_create(gpu, SHMEM_REPLY_SIZE, &gpu->reply_shmem)) {
GGML_ABORT("%s: failed to create the shared reply memory pages", __func__);
GGML_ABORT(GGML_VIRTGPU
"%s: failed to create the shared reply memory pages", __func__);
return NULL;
}
if (virtgpu_shmem_create(gpu, SHMEM_DATA_SIZE, &gpu->data_shmem)) {
GGML_ABORT("%s: failed to create the shared data memory pages", __func__);
GGML_ABORT(GGML_VIRTGPU
"%s: failed to create the shared data memory pages", __func__);
return NULL;
}
if (virtgpu_handshake(gpu)) {
GGML_ABORT("%s: failed to handshake with the virglrenderer library", __func__);
GGML_ABORT(GGML_VIRTGPU
"%s: failed to handshake with the virglrenderer library", __func__);
return NULL;
}
if (virtgpu_load_library(gpu) != APIR_LOAD_LIBRARY_SUCCESS) {
GGML_ABORT("%s: failed to load the backend library", __func__);
GGML_ABORT(GGML_VIRTGPU
"%s: failed to load the backend library", __func__);
return NULL;
}
@@ -191,7 +243,8 @@ static virt_gpu_result_t virtgpu_open(virtgpu * gpu) {
drmDevicePtr devs[8];
int count = drmGetDevices2(0, devs, ARRAY_SIZE(devs));
if (count < 0) {
GGML_LOG_ERROR("%s: failed to enumerate DRM devices\n", __func__);
GGML_LOG_ERROR(GGML_VIRTGPU
"%s: failed to enumerate DRM devices\n", __func__);
return APIR_ERROR_INITIALIZATION_FAILED;
}
@@ -213,16 +266,19 @@ static virt_gpu_result_t virtgpu_open_device(virtgpu * gpu, const drmDevicePtr d
int fd = open(node_path, O_RDWR | O_CLOEXEC);
if (fd < 0) {
GGML_ABORT("failed to open %s", node_path);
GGML_ABORT(GGML_VIRTGPU
"%s: failed to open %s", __func__, node_path);
return APIR_ERROR_INITIALIZATION_FAILED;
}
drmVersionPtr version = drmGetVersion(fd);
if (!version || strcmp(version->name, "virtio_gpu") || version->version_major != 0) {
if (version) {
GGML_ABORT("unknown DRM driver %s version %d", version->name, version->version_major);
GGML_LOG_ERROR(GGML_VIRTGPU
"%s: unknown DRM driver %s version %d\n", __func__, version->name, version->version_major);
} else {
GGML_ABORT("failed to get DRM driver version");
GGML_LOG_ERROR(GGML_VIRTGPU
"%s: failed to get DRM driver version\n", __func__);
}
if (version) {
@@ -236,7 +292,7 @@ static virt_gpu_result_t virtgpu_open_device(virtgpu * gpu, const drmDevicePtr d
drmFreeVersion(version);
GGML_LOG_INFO("using DRM device %s\n", node_path);
GGML_LOG_INFO(GGML_VIRTGPU "using DRM device %s\n", node_path);
return APIR_SUCCESS;
}
@@ -245,7 +301,7 @@ static virt_gpu_result_t virtgpu_init_context(virtgpu * gpu) {
assert(!gpu->capset.version);
const int ret = virtgpu_ioctl_context_init(gpu, gpu->capset.id);
if (ret) {
GGML_LOG_INFO("failed to initialize context: %s\n", strerror(errno));
GGML_LOG_ERROR(GGML_VIRTGPU "%s: failed to initialize context: %s\n", __func__, strerror(errno));
return APIR_ERROR_INITIALIZATION_FAILED;
}
@@ -254,10 +310,10 @@ static virt_gpu_result_t virtgpu_init_context(virtgpu * gpu) {
static virt_gpu_result_t virtgpu_init_capset(virtgpu * gpu) {
if (gpu->use_apir_capset) {
GGML_LOG_INFO("Using the APIR capset\n");
GGML_LOG_INFO(GGML_VIRTGPU "Using the APIR capset\n");
gpu->capset.id = VIRTGPU_DRM_CAPSET_APIR;
} else {
GGML_LOG_INFO("Using the Venus capset\n");
GGML_LOG_INFO(GGML_VIRTGPU "Using the Venus capset\n");
gpu->capset.id = VIRTGPU_DRM_CAPSET_VENUS;
}
gpu->capset.version = 0;
@@ -266,7 +322,9 @@ static virt_gpu_result_t virtgpu_init_capset(virtgpu * gpu) {
virtgpu_ioctl_get_caps(gpu, gpu->capset.id, gpu->capset.version, &gpu->capset.data, sizeof(gpu->capset.data));
if (ret) {
GGML_LOG_INFO("failed to get APIR v%d capset: %s\n", gpu->capset.version, strerror(errno));
GGML_LOG_ERROR(GGML_VIRTGPU
"%s: failed to get APIR v%d capset: %s\n",
__func__, gpu->capset.version, strerror(errno));
return APIR_ERROR_INITIALIZATION_FAILED;
}
@@ -333,9 +391,9 @@ apir_encoder * remote_call_prepare(virtgpu * gpu, ApirCommandType apir_cmd_type,
* Prepare the command encoder and its buffer
*/
static char encoder_buffer[4096];
thread_local char encoder_buffer[4096];
static apir_encoder enc;
thread_local apir_encoder enc;
enc = {
.cur = encoder_buffer,
.start = encoder_buffer,
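
A brief aside on the `static` → `thread_local` change above: a minimal, standalone sketch (not the remoting code itself) of why a function-local static buffer breaks down once remote calls can happen from more than one thread, and how `thread_local` gives each thread its own encoder storage.

```cpp
// Minimal sketch: a static buffer is shared by every thread, so concurrent
// callers would overwrite each other's encoder state; thread_local gives
// each thread its own copy.
#include <cstdio>
#include <thread>

static const char * fill(const char * msg) {
    thread_local char buffer[64];                            // one buffer per thread
    std::snprintf(buffer, sizeof(buffer), "encoded:%s", msg);
    return buffer;                                            // safe to use from the same thread
}

int main() {
    std::thread a([] { std::printf("%s\n", fill("device_get_name")); });
    std::thread b([] { std::printf("%s\n", fill("buffer_alloc")); });
    a.join();
    b.join();
    return 0;
}
```
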
@@ -369,19 +427,19 @@ void remote_call_finish(virtgpu * gpu, apir_encoder * enc, apir_decoder * dec) {
UNUSED(gpu);
if (!enc) {
GGML_LOG_ERROR("Invalid (null) encoder\n");
GGML_ABORT(GGML_VIRTGPU "%s: Invalid (null) encoder", __func__);
}
if (!dec) {
GGML_LOG_ERROR("Invalid (null) decoder\n");
GGML_ABORT(GGML_VIRTGPU "%s: Invalid (null) decoder", __func__);
}
if (apir_encoder_get_fatal(enc)) {
GGML_LOG_ERROR("Failed to encode the output parameters.\n");
GGML_LOG_ERROR(GGML_VIRTGPU "%s: Failed to encode the output parameters.", __func__);
}
if (apir_decoder_get_fatal(dec)) {
GGML_LOG_ERROR("Failed to decode the input parameters.\n");
GGML_LOG_ERROR(GGML_VIRTGPU "%s: Failed to decode the input parameters.", __func__);
}
}
@@ -423,7 +481,7 @@ uint32_t remote_call(virtgpu * gpu,
int ret = drmIoctl(gpu->fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &args);
if (ret != 0) {
GGML_ABORT("%s: the virtgpu EXECBUFFER ioctl failed (%d)", __func__, ret);
GGML_ABORT(GGML_VIRTGPU "%s: the virtgpu EXECBUFFER ioctl failed (%d)", __func__, ret);
}
/*
@@ -467,7 +525,7 @@ uint32_t remote_call(virtgpu * gpu,
}
if (max_wait_ms && timedout) {
GGML_LOG_ERROR("timed out waiting for the host answer...\n");
GGML_LOG_ERROR(GGML_VIRTGPU "%s: timed out waiting for the host answer...\n", __func__);
return APIR_FORWARD_TIMEOUT;
}
@@ -489,10 +547,13 @@ static void log_call_duration(long long call_duration_ns, const char * name) {
double call_duration_s = (double) call_duration_ns / 1e9; // 1 second = 1e9 nanoseconds
if (call_duration_s > 1) {
GGML_LOG_INFO("%s: waited %.2fs for the %s host reply...\n", __func__, call_duration_s, name);
GGML_LOG_INFO(GGML_VIRTGPU
"waited %.2fs for the %s host reply...\n", call_duration_s, name);
} else if (call_duration_ms > 1) {
GGML_LOG_INFO("%s: waited %.2fms for the %s host reply...\n", __func__, call_duration_ms, name);
GGML_LOG_INFO(GGML_VIRTGPU
"waited %.2fms for the %s host reply...\n", call_duration_ms, name);
} else {
GGML_LOG_INFO("%s: waited %lldns for the %s host reply...\n", __func__, call_duration_ns, name);
GGML_LOG_INFO(GGML_VIRTGPU
"waited %lldns for the %s host reply...\n", call_duration_ns, name);
}
}


@@ -17,6 +17,8 @@
#include <cstring>
#include "ggml-remoting.h"
#define VIRGL_RENDERER_UNSTABLE_APIS 1
#include "apir_hw.h"
#include <drm/virtgpu_drm.h>
@@ -73,6 +75,27 @@ struct virtgpu {
/* APIR communication pages */
virtgpu_shmem reply_shmem;
virtgpu_shmem data_shmem;
/* Mutex to protect shared data_shmem buffer from concurrent access */
mtx_t data_shmem_mutex;
/* Cached device information to prevent memory leaks and race conditions */
struct {
char * description;
char * name;
int32_t device_count;
uint32_t type;
size_t memory_free;
size_t memory_total;
} cached_device_info;
/* Cached buffer type information to prevent memory leaks and race conditions */
struct {
apir_buffer_type_host_handle_t host_handle;
char * name;
size_t alignment;
size_t max_size;
} cached_buffer_type;
};
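
The new `cached_device_info`/`cached_buffer_type` fields suggest a fetch-once pattern for strings returned by the host. A hedged sketch of that pattern, assuming a hypothetical `fetch_name_from_host()` stand-in for the remote call that returns a freshly allocated string:

```cpp
// Hedged sketch of the caching pattern: fetch and own the string once, hand
// out the cached pointer on every later call, free it once at shutdown.
#include <cstdlib>
#include <cstring>

struct device_cache {
    char * name = nullptr;
};

static char * fetch_name_from_host() {        // hypothetical remote call
    return strdup("Remote GPU");
}

static const char * device_get_name(device_cache & cache) {
    if (!cache.name) {
        cache.name = fetch_name_from_host();   // allocate once
    }
    return cache.name;                         // same pointer on every call
}

int main() {
    device_cache cache;
    const char * a = device_get_name(cache);
    const char * b = device_get_name(cache);   // no second allocation, no leak
    (void) a; (void) b;
    free(cache.name);
    return 0;
}
```
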
static inline int virtgpu_ioctl(virtgpu * gpu, unsigned long request, void * args) {


@@ -12,8 +12,8 @@ vendor = {
# "https://github.com/mackron/miniaudio/raw/refs/tags/0.11.23/miniaudio.h": "vendor/miniaudio/miniaudio.h",
"https://github.com/mackron/miniaudio/raw/669ed3e844524fcd883231b13095baee9f6de304/miniaudio.h": "vendor/miniaudio/miniaudio.h",
"https://raw.githubusercontent.com/yhirose/cpp-httplib/refs/tags/v0.30.1/httplib.h": "vendor/cpp-httplib/httplib.h",
"https://raw.githubusercontent.com/yhirose/cpp-httplib/refs/tags/v0.30.1/LICENSE": "vendor/cpp-httplib/LICENSE",
"https://raw.githubusercontent.com/yhirose/cpp-httplib/refs/tags/v0.30.2/httplib.h": "vendor/cpp-httplib/httplib.h",
"https://raw.githubusercontent.com/yhirose/cpp-httplib/refs/tags/v0.30.2/LICENSE": "vendor/cpp-httplib/LICENSE",
"https://raw.githubusercontent.com/sheredom/subprocess.h/b49c56e9fe214488493021017bf3954b91c7c1f5/subprocess.h": "vendor/sheredom/subprocess.h",
}


@@ -265,9 +265,15 @@ std::pair<ggml_tensor *, ggml_tensor *> llm_build_qwen3next::build_delta_net_chu
cb(g_diff, "g_diff", il); // shape: (chunk_size, 1, n_chunks, H_v * n_seqs)
ggml_tensor * g_diff_exp = ggml_exp(ctx0, g_diff);
ggml_tensor * key_gdiff = ggml_mul(ctx0, k, g_diff_exp);
ggml_tensor * g_diff_exp_t = ggml_reshape_4d(ctx0, g_diff_exp,
1, chunk_size, n_chunks, g_diff_exp->ne[3]);
ggml_tensor * key_gdiff = ggml_mul(ctx0, k, g_diff_exp_t);
cb(key_gdiff, "key_gdiff", il); // shape: (S_k, chunk_size, n_chunks, H_v * n_seqs)
ggml_tensor * key_gdiff_t = ggml_cont(ctx0, ggml_transpose(ctx0, key_gdiff));
cb(key_gdiff_t, "key_gdiff_t", il); // shape: (chunk_size, S_k, n_chunks, H_v * n_seqs)
// state to be updated per chunk
ggml_tensor * new_state = state; // ggml_dup(ctx0, state);
@@ -322,9 +328,9 @@ std::pair<ggml_tensor *, ggml_tensor *> llm_build_qwen3next::build_delta_net_chu
: ggml_concat(ctx0, core_attn_out, core_attn_out_chunk, 2);
// kgdmulvnew = (key_gdiff).transpose(-1, -2) @ v_new
ggml_tensor * k_gdiff = ggml_cont(ctx0, get_slice_2d(ctx0, key_gdiff, chunk));
ggml_tensor * k_gdiff_t = get_slice_2d(ctx0, key_gdiff_t, chunk);
//ggml_tensor * kgdmulvnew = ggml_mul_mat(ctx0, k_gdiff, v_new); // this is slower on metal, why?
ggml_tensor * kgdmulvnew = ggml_mul_mat(ctx0, v_new_t, ggml_cont(ctx0, ggml_transpose(ctx0, k_gdiff)));
ggml_tensor * kgdmulvnew = ggml_mul_mat(ctx0, v_new_t, k_gdiff_t);
// last_recurrent_state = last_recurrent_state * g_last + kgdmulvnew
ggml_tensor * gexp_last_chunk = ggml_cont(ctx0, get_slice_2d(ctx0, g_last_exp, chunk));
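
A self-contained illustration of the broadcast trick in this hunk, with made-up sizes: reshaping the gate tensor from (chunk_size, 1, n_chunks, H) to (1, chunk_size, n_chunks, H) lets `ggml_mul` broadcast it across the S_k dimension of `k` instead of materialising a repeated tensor. This is only a sketch against the public ggml API, not the model code:

```cpp
// Illustrative sizes only: S_k = 8, chunk_size = 4, n_chunks = 2, H = 3.
#include "ggml.h"
#include <cstdio>

int main() {
    ggml_init_params params = { /*mem_size*/ 16 * 1024 * 1024, /*mem_buffer*/ nullptr, /*no_alloc*/ false };
    ggml_context * ctx = ggml_init(params);

    ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 8, 4, 2, 3);  // (S_k, chunk_size, n_chunks, H)
    ggml_tensor * g = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 4, 1, 2, 3);  // (chunk_size, 1, n_chunks, H)

    // move chunk_size to dim 1 so dim 0 (length 1) broadcasts against S_k
    ggml_tensor * g_t = ggml_reshape_4d(ctx, g, 1, 4, 2, 3);
    ggml_tensor * kg  = ggml_mul(ctx, k, g_t);   // result shape follows k: (8, 4, 2, 3)

    printf("kg shape: %lld %lld %lld %lld\n",
           (long long) kg->ne[0], (long long) kg->ne[1],
           (long long) kg->ne[2], (long long) kg->ne[3]);

    ggml_free(ctx);
    return 0;
}
```
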


@@ -8032,6 +8032,8 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_eval() {
for (int mode : {GGML_ROPE_TYPE_NORMAL, GGML_ROPE_TYPE_NEOX, GGML_ROPE_TYPE_MROPE, GGML_ROPE_TYPE_IMROPE, GGML_ROPE_TYPE_VISION}) {
for (bool ff : {false, true}) {
test_cases.emplace_back(new test_rope(type, {128, 32, 2, 1}, 128, mode, 512, 1.4245f, 0.7465f, 1.4245f, ff, 0, true, true));
test_cases.emplace_back(new test_rope(type, {128, 32, 2, 1}, 128, mode, 512, 1.4245f, 0.7465f, 1.4245f, ff, 1, true, true));
test_cases.emplace_back(new test_rope(type, {128, 32, 2, 3}, 128, mode, 512, 1.4245f, 0.7465f, 1.4245f, ff, 1, true, true));
}
}
}


@@ -674,15 +674,12 @@ int main(int argc, char ** argv) {
}
}
for (int i = 0; i < (int) embd.size(); i += params.n_batch) {
int n_eval = (int) embd.size() - i;
if (n_eval > params.n_batch) {
n_eval = params.n_batch;
}
if (!embd.empty()) {
int n_eval = (int) embd.size();
LOG_DBG("eval: %s\n", string_from(ctx, embd).c_str());
if (llama_decode(ctx, llama_batch_get_one(&embd[i], n_eval))) {
GGML_ASSERT(n_eval <= params.n_batch);
if (llama_decode(ctx, llama_batch_get_one(embd.data(), n_eval))) {
LOG_ERR("%s : failed to eval\n", __func__);
return 1;
}
@@ -743,7 +740,7 @@ int main(int argc, char ** argv) {
common_sampler_accept(smpl, embd_inp[n_consumed], /* accept_grammar= */ false);
++n_consumed;
if ((int) embd.size() >= params.n_batch) {
if ((int) embd.size() == params.n_batch) {
break;
}
}


@@ -3,9 +3,14 @@ license_add_file("cpp-httplib" "LICENSE")
find_package(Threads REQUIRED)
llama_add_compile_flags()
add_library(${TARGET} STATIC httplib.cpp httplib.h)
if (NOT MSVC)
# disable warnings in 3rd party code
# disable warnings in 3rd party code
if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
target_compile_options(${TARGET} PRIVATE /w)
else()
target_compile_options(${TARGET} PRIVATE -w)
endif()
@@ -146,6 +151,23 @@ elseif (LLAMA_OPENSSL)
endif()
endif()
# disable warnings in 3rd party code
if(LLAMA_BUILD_BORINGSSL OR LLAMA_BUILD_LIBRESSL)
if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
target_compile_options(ssl PRIVATE /w)
target_compile_options(crypto PRIVATE /w)
if(LLAMA_BUILD_BORINGSSL)
target_compile_options(fipsmodule PRIVATE /w)
endif()
else()
target_compile_options(ssl PRIVATE -w)
target_compile_options(crypto PRIVATE -w)
if(LLAMA_BUILD_BORINGSSL)
target_compile_options(fipsmodule PRIVATE -w)
endif()
endif()
endif()
if (CPPHTTPLIB_OPENSSL_SUPPORT)
target_compile_definitions(${TARGET} PUBLIC CPPHTTPLIB_OPENSSL_SUPPORT) # used in server.cpp
if (APPLE AND CMAKE_SYSTEM_NAME STREQUAL "Darwin")


@@ -117,6 +117,8 @@ time_t parse_http_date(const std::string &date_str) {
#ifdef _WIN32
return _mkgmtime(&tm_buf);
#elif defined _AIX
return mktime(&tm_buf);
#else
return timegm(&tm_buf);
#endif
@@ -1376,7 +1378,7 @@ int getaddrinfo_with_timeout(const char *node, const char *service,
// Allocate on the heap, so the resolver thread can keep using the data.
auto state = std::make_shared<GetAddrInfoState>();
state->node = node;
if (node) { state->node = node; }
state->service = service;
state->hints = *hints;
@@ -2896,10 +2898,20 @@ bool parse_range_header(const std::string &s, Ranges &ranges) try {
return;
}
const auto first =
static_cast<ssize_t>(lhs.empty() ? -1 : std::stoll(lhs));
const auto last =
static_cast<ssize_t>(rhs.empty() ? -1 : std::stoll(rhs));
ssize_t first = -1;
if (!lhs.empty()) {
ssize_t v;
auto res = detail::from_chars(lhs.data(), lhs.data() + lhs.size(), v);
if (res.ec == std::errc{}) { first = v; }
}
ssize_t last = -1;
if (!rhs.empty()) {
ssize_t v;
auto res = detail::from_chars(rhs.data(), rhs.data() + rhs.size(), v);
if (res.ec == std::errc{}) { last = v; }
}
if ((first == -1 && last == -1) ||
(first != -1 && last != -1 && first > last)) {
all_valid_ranges = false;
@@ -2974,25 +2986,17 @@ bool parse_accept_header(const std::string &s,
return;
}
#ifdef CPPHTTPLIB_NO_EXCEPTIONS
{
std::istringstream iss(quality_str);
iss >> accept_entry.quality;
// Check if conversion was successful and entire string was consumed
if (iss.fail() || !iss.eof()) {
double v = 0.0;
auto res = detail::from_chars(
quality_str.data(), quality_str.data() + quality_str.size(), v);
if (res.ec == std::errc{}) {
accept_entry.quality = v;
} else {
has_invalid_entry = true;
return;
}
}
#else
try {
accept_entry.quality = std::stod(quality_str);
} catch (...) {
has_invalid_entry = true;
return;
}
#endif
// Check if quality is in valid range [0.0, 1.0]
if (accept_entry.quality < 0.0 || accept_entry.quality > 1.0) {
has_invalid_entry = true;
@@ -5570,13 +5574,26 @@ bool Server::read_content(Stream &strm, Request &req, Response &res) {
strm, req, res,
// Regular
[&](const char *buf, size_t n) {
// Prevent arithmetic overflow when checking sizes.
// Avoid computing (req.body.size() + n) directly because
// adding two unsigned `size_t` values can wrap around and
// produce a small result instead of indicating overflow.
// Instead, check using subtraction: ensure `n` does not
// exceed the remaining capacity `max_size() - size()`.
if (req.body.size() >= req.body.max_size() ||
n > req.body.max_size() - req.body.size()) {
return false;
}
// Limit decompressed body size to payload_max_length_ to protect
// against "zip bomb" attacks where a small compressed payload
// decompresses to a massive size.
if (req.body.size() + n > payload_max_length_ ||
req.body.size() + n > req.body.max_size()) {
if (payload_max_length_ > 0 &&
(req.body.size() >= payload_max_length_ ||
n > payload_max_length_ - req.body.size())) {
return false;
}
req.body.append(buf, n);
return true;
},
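
The comment above describes the overflow-safe capacity check; a tiny standalone example of the same idea, using illustrative numbers:

```cpp
// `size + n > limit` can wrap around for unsigned types, while
// `n > limit - size` (after checking `size < limit`) cannot.
#include <cstddef>
#include <cstdio>
#include <limits>

static bool fits(size_t current, size_t incoming, size_t limit) {
    // reject if already at the limit, or if the remaining room is too small
    return current < limit && incoming <= limit - current;
}

int main() {
    const size_t limit = 100;
    std::printf("%d\n", fits(90, 5, limit));                                   // 1: fits
    std::printf("%d\n", fits(90, 20, limit));                                  // 0: over the limit
    // a naive `current + incoming > limit` would wrap here and wrongly accept
    std::printf("%d\n", fits(90, std::numeric_limits<size_t>::max(), limit));  // 0: rejected safely
    return 0;
}
```
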
@@ -5666,22 +5683,29 @@ bool Server::read_content_core(
// oversized request and fail early (causing connection close). For SSL
// builds we cannot reliably peek the decrypted application bytes, so keep
// the original behaviour.
#if !defined(CPPHTTPLIB_OPENSSL_SUPPORT) && !defined(_WIN32)
#if !defined(CPPHTTPLIB_OPENSSL_SUPPORT)
if (!req.has_header("Content-Length") &&
!detail::is_chunked_transfer_encoding(req.headers)) {
socket_t s = strm.socket();
if (s != INVALID_SOCKET) {
// Peek up to payload_max_length_ + 1 bytes. If more than
// payload_max_length_ bytes are pending, reject the request.
size_t to_peek =
(payload_max_length_ > 0)
? (std::min)(payload_max_length_ + 1, static_cast<size_t>(4096))
: 1;
std::vector<char> peekbuf(to_peek);
ssize_t n = ::recv(s, peekbuf.data(), to_peek, MSG_PEEK);
if (n > 0 && static_cast<size_t>(n) > payload_max_length_) {
// Indicate failure so connection will be closed.
return false;
// Only peek if payload_max_length is set to a finite value
if (payload_max_length_ > 0 &&
payload_max_length_ < (std::numeric_limits<size_t>::max)()) {
socket_t s = strm.socket();
if (s != INVALID_SOCKET) {
// Peek to check if there is any pending data
char peekbuf[1];
ssize_t n = ::recv(s, peekbuf, 1, MSG_PEEK);
if (n > 0) {
// There is data, so read it with payload limit enforcement
auto result = detail::read_content_without_length(
strm, payload_max_length_, out);
if (result == detail::ReadContentResult::PayloadTooLarge) {
res.status = StatusCode::PayloadTooLarge_413;
return false;
} else if (result != detail::ReadContentResult::Success) {
return false;
}
return true;
}
}
}
return true;
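
A POSIX-only sketch of the `MSG_PEEK` probe used here: peeking reports pending data without consuming it, so the regular read path still sees the full payload. It uses a socketpair instead of a real HTTP connection purely for illustration:

```cpp
#include <cstdio>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) { return 1; }

    const char body[] = "field=value";
    send(sv[0], body, sizeof(body) - 1, 0);

    char probe;
    ssize_t peeked = recv(sv[1], &probe, 1, MSG_PEEK);   // look, don't consume
    std::printf("peeked %zd byte(s)\n", peeked);

    char buf[64] = {};
    ssize_t got = recv(sv[1], buf, sizeof(buf) - 1, 0);  // full body still available
    std::printf("read %zd byte(s): %s\n", got, buf);

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```
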
@@ -6656,7 +6680,8 @@ void ClientImpl::close_socket(Socket &socket) {
}
bool ClientImpl::read_response_line(Stream &strm, const Request &req,
Response &res) const {
Response &res,
bool skip_100_continue) const {
std::array<char, 2048> buf{};
detail::stream_line_reader line_reader(strm, buf.data(), buf.size());
@@ -6677,8 +6702,8 @@ bool ClientImpl::read_response_line(Stream &strm, const Request &req,
res.status = std::stoi(std::string(m[2]));
res.reason = std::string(m[3]);
// Ignore '100 Continue'
while (res.status == StatusCode::Continue_100) {
// Ignore '100 Continue' (only when not using Expect: 100-continue explicitly)
while (skip_100_continue && res.status == StatusCode::Continue_100) {
if (!line_reader.getline()) { return false; } // CRLF
if (!line_reader.getline()) { return false; } // next response line
@@ -7463,7 +7488,8 @@ bool ClientImpl::write_content_with_provider(Stream &strm,
}
bool ClientImpl::write_request(Stream &strm, Request &req,
bool close_connection, Error &error) {
bool close_connection, Error &error,
bool skip_body) {
// Prepare additional headers
if (close_connection) {
if (!req.has_header("Connection")) {
@@ -7582,7 +7608,59 @@ bool ClientImpl::write_request(Stream &strm, Request &req,
}
}
// After sending request line and headers, wait briefly for an early server
// response (e.g. 4xx) and avoid sending a potentially large request body
// unnecessarily. This workaround is only enabled on Windows because Unix
// platforms surface write errors (EPIPE) earlier; on Windows kernel send
// buffering can accept large writes even when the peer already responded.
// Check the stream first (which covers SSL via `is_readable()`), then
// fall back to select on the socket. Only perform the wait for very large
// request bodies to avoid interfering with normal small requests and
reduce side-effects. Poll briefly (up to 50 ms by default) for an early
// response. Skip this check when using Expect: 100-continue, as the protocol
// handles early responses properly.
#if defined(_WIN32)
if (!skip_body &&
req.body.size() > CPPHTTPLIB_WAIT_EARLY_SERVER_RESPONSE_THRESHOLD &&
req.path.size() > CPPHTTPLIB_REQUEST_URI_MAX_LENGTH) {
auto start = std::chrono::high_resolution_clock::now();
for (;;) {
// Prefer socket-level readiness to avoid SSL_pending() false-positives
// from SSL internals. If the underlying socket is readable, assume an
// early response may be present.
auto sock = strm.socket();
if (sock != INVALID_SOCKET && detail::select_read(sock, 0, 0) > 0) {
return false;
}
// Fallback to stream-level check for non-socket streams or when the
// socket isn't reporting readable. Avoid using `is_readable()` for
// SSL, since `SSL_pending()` may report buffered records that do not
// indicate a complete application-level response yet.
if (!is_ssl() && strm.is_readable()) { return false; }
auto now = std::chrono::high_resolution_clock::now();
auto elapsed =
std::chrono::duration_cast<std::chrono::milliseconds>(now - start)
.count();
if (elapsed >= CPPHTTPLIB_WAIT_EARLY_SERVER_RESPONSE_TIMEOUT_MSECOND) {
break;
}
std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
}
#endif
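
For reference, a small POSIX sketch of the readiness probe the Windows-only block above relies on: a zero-timeout `select()` tells the client whether the peer has already answered, in which case sending a large body would be wasted work. The real code uses the library's stream and `select_read` helpers; this only shows the underlying idea:

```cpp
#include <cstdio>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

static bool peer_already_responded(int fd) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    timeval tv = { 0, 0 };                                   // non-blocking poll
    return select(fd + 1, &readfds, nullptr, nullptr, &tv) > 0;
}

int main() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) { return 1; }
    std::printf("before reply: %d\n", peer_already_responded(sv[0]));  // 0: keep sending
    const char early[] = "HTTP/1.1 413 Payload Too Large\r\n\r\n";
    send(sv[1], early, sizeof(early) - 1, 0);
    std::printf("after reply:  %d\n", peer_already_responded(sv[0]));  // 1: skip the body
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```
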
// Body
if (skip_body) { return true; }
return write_request_body(strm, req, error);
}
bool ClientImpl::write_request_body(Stream &strm, Request &req,
Error &error) {
if (req.body.empty()) {
return write_content_with_provider(strm, req, error);
}
@@ -7758,8 +7836,20 @@ void ClientImpl::output_error_log(const Error &err,
bool ClientImpl::process_request(Stream &strm, Request &req,
Response &res, bool close_connection,
Error &error) {
// Send request
if (!write_request(strm, req, close_connection, error)) { return false; }
// Auto-add Expect: 100-continue for large bodies
if (CPPHTTPLIB_EXPECT_100_THRESHOLD > 0 && !req.has_header("Expect")) {
auto body_size = req.body.empty() ? req.content_length_ : req.body.size();
if (body_size >= CPPHTTPLIB_EXPECT_100_THRESHOLD) {
req.set_header("Expect", "100-continue");
}
}
// Check for Expect: 100-continue
auto expect_100_continue = req.get_header_value("Expect") == "100-continue";
// Send request (skip body if using Expect: 100-continue)
auto write_request_success =
write_request(strm, req, close_connection, error, expect_100_continue);
#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
if (is_ssl()) {
@@ -7774,14 +7864,48 @@ bool ClientImpl::process_request(Stream &strm, Request &req,
}
#endif
// Handle Expect: 100-continue with timeout
if (expect_100_continue && CPPHTTPLIB_EXPECT_100_TIMEOUT_MSECOND > 0) {
time_t sec = CPPHTTPLIB_EXPECT_100_TIMEOUT_MSECOND / 1000;
time_t usec = (CPPHTTPLIB_EXPECT_100_TIMEOUT_MSECOND % 1000) * 1000;
auto ret = detail::select_read(strm.socket(), sec, usec);
if (ret <= 0) {
// Timeout or error: send body anyway (server didn't respond in time)
if (!write_request_body(strm, req, error)) { return false; }
expect_100_continue = false; // Switch to normal response handling
}
}
// Receive response and headers
if (!read_response_line(strm, req, res) ||
// When using Expect: 100-continue, don't auto-skip `100 Continue` response
if (!read_response_line(strm, req, res, !expect_100_continue) ||
!detail::read_headers(strm, res.headers)) {
error = Error::Read;
if (write_request_success) { error = Error::Read; }
output_error_log(error, &req);
return false;
}
if (!write_request_success) { return false; }
// Handle Expect: 100-continue response
if (expect_100_continue) {
if (res.status == StatusCode::Continue_100) {
// Server accepted, send the body
if (!write_request_body(strm, req, error)) { return false; }
// Read the actual response
res.headers.clear();
res.body.clear();
if (!read_response_line(strm, req, res) ||
!detail::read_headers(strm, res.headers)) {
error = Error::Read;
output_error_log(error, &req);
return false;
}
}
// If not 100 Continue, server returned an error; proceed with that response
}
// Body
if ((res.status != StatusCode::NoContent_204) && req.method != "HEAD" &&
req.method != "CONNECT") {
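
The `Expect: 100-continue` handling above boils down to a simple ordering rule: send the headers, withhold the body, and only transmit the body after the server answers `100 Continue`; any other early status is the final response. A hedged sketch of that sequencing, with `send_body` as a hypothetical stand-in for the real stream write:

```cpp
// Ordering only; the interim status would come from read_response_line in
// the library code.
#include <cstdio>
#include <functional>

static bool post_large_body(int interim_status,
                            const std::function<void()> & send_body) {
    if (interim_status == 100) {   // server agreed: now send the body
        send_body();
        return true;               // caller then reads the real response
    }
    std::printf("early response %d, body never sent\n", interim_status);
    return false;                  // the early status is the final response
}

int main() {
    post_large_body(100, [] { std::printf("body sent\n"); });
    post_large_body(413, [] { std::printf("body sent\n"); });
    return 0;
}
```
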
@@ -9543,7 +9667,7 @@ bool SSLClient::load_certs() {
last_openssl_error_ = ERR_get_error();
ret = false;
}
} else {
} else if (!ca_cert_store_) {
auto loaded = false;
#ifdef _WIN32
loaded =
@@ -9790,7 +9914,11 @@ bool SSLClient::verify_host_with_common_name(X509 *server_cert) const {
bool SSLClient::check_host_name(const char *pattern,
size_t pattern_len) const {
if (host_.size() == pattern_len && host_ == pattern) { return true; }
// Exact match (case-insensitive)
if (host_.size() == pattern_len &&
detail::case_ignore::equal(host_, std::string(pattern, pattern_len))) {
return true;
}
// Wildcard match
// https://bugs.launchpad.net/ubuntu/+source/firefox-3.0/+bug/376484
@@ -9805,9 +9933,23 @@ bool SSLClient::check_host_name(const char *pattern,
auto itr = pattern_components.begin();
for (const auto &h : host_components_) {
auto &p = *itr;
if (p != h && p != "*") {
auto partial_match = (p.size() > 0 && p[p.size() - 1] == '*' &&
!p.compare(0, p.size() - 1, h));
if (!httplib::detail::case_ignore::equal(p, h) && p != "*") {
bool partial_match = false;
if (!p.empty() && p[p.size() - 1] == '*') {
const auto prefix_length = p.size() - 1;
if (prefix_length == 0) {
partial_match = true;
} else if (h.size() >= prefix_length) {
partial_match =
std::equal(p.begin(),
p.begin() + static_cast<std::string::difference_type>(
prefix_length),
h.begin(), [](const char ca, const char cb) {
return httplib::detail::case_ignore::to_lower(ca) ==
httplib::detail::case_ignore::to_lower(cb);
});
}
}
if (!partial_match) { return false; }
}
++itr;
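
A standalone sketch of the per-label matching rule implemented above (case-insensitive comparison, with a trailing `*` in the pattern matching any prefix of the host label); this is an illustration, not the httplib implementation:

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

static bool label_matches(const std::string & pattern, const std::string & host_label) {
    if (pattern == "*") { return true; }                     // bare wildcard label
    auto ieq = [](char a, char b) {
        return std::tolower((unsigned char) a) == std::tolower((unsigned char) b);
    };
    if (!pattern.empty() && pattern.back() == '*') {         // partial match: "ap*" vs "api"
        const size_t prefix = pattern.size() - 1;
        return host_label.size() >= prefix &&
               std::equal(pattern.begin(), pattern.begin() + prefix, host_label.begin(), ieq);
    }
    return pattern.size() == host_label.size() &&            // exact, case-insensitive
           std::equal(pattern.begin(), pattern.end(), host_label.begin(), ieq);
}

int main() {
    std::cout << label_matches("*", "api") << "\n";      // 1
    std::cout << label_matches("ap*", "API") << "\n";    // 1
    std::cout << label_matches("www", "API") << "\n";    // 0
    return 0;
}
```
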


@@ -8,8 +8,8 @@
#ifndef CPPHTTPLIB_HTTPLIB_H
#define CPPHTTPLIB_HTTPLIB_H
#define CPPHTTPLIB_VERSION "0.30.1"
#define CPPHTTPLIB_VERSION_NUM "0x001E01"
#define CPPHTTPLIB_VERSION "0.30.2"
#define CPPHTTPLIB_VERSION_NUM "0x001E02"
/*
* Platform compatibility check
@@ -98,6 +98,22 @@
#define CPPHTTPLIB_CLIENT_MAX_TIMEOUT_MSECOND 0
#endif
#ifndef CPPHTTPLIB_EXPECT_100_THRESHOLD
#define CPPHTTPLIB_EXPECT_100_THRESHOLD 1024
#endif
#ifndef CPPHTTPLIB_EXPECT_100_TIMEOUT_MSECOND
#define CPPHTTPLIB_EXPECT_100_TIMEOUT_MSECOND 1000
#endif
#ifndef CPPHTTPLIB_WAIT_EARLY_SERVER_RESPONSE_THRESHOLD
#define CPPHTTPLIB_WAIT_EARLY_SERVER_RESPONSE_THRESHOLD (1024 * 1024)
#endif
#ifndef CPPHTTPLIB_WAIT_EARLY_SERVER_RESPONSE_TIMEOUT_MSECOND
#define CPPHTTPLIB_WAIT_EARLY_SERVER_RESPONSE_TIMEOUT_MSECOND 50
#endif
#ifndef CPPHTTPLIB_IDLE_INTERVAL_SECOND
#define CPPHTTPLIB_IDLE_INTERVAL_SECOND 0
#endif
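
These thresholds are plain `#define` defaults, so a build that wants different behaviour can override them before including the header. A hedged example with illustrative values and a hypothetical local endpoint:

```cpp
// The values and the endpoint are illustrative, not recommendations.
#define CPPHTTPLIB_EXPECT_100_THRESHOLD (64 * 1024)   // only uploads >= 64 KiB send Expect: 100-continue
#define CPPHTTPLIB_EXPECT_100_TIMEOUT_MSECOND 250     // wait at most 250 ms for the interim reply
#include "httplib.h"

int main() {
    httplib::Client cli("http://localhost:8080");     // hypothetical local server
    auto res = cli.Post("/upload", std::string(128 * 1024, 'x'), "application/octet-stream");
    return res ? 0 : 1;
}
```
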
@@ -286,8 +302,10 @@ using socket_t = int;
#include <atomic>
#include <cassert>
#include <cctype>
#include <chrono>
#include <climits>
#include <condition_variable>
#include <cstdlib>
#include <cstring>
#include <errno.h>
#include <exception>
@@ -305,6 +323,7 @@ using socket_t = int;
#include <sstream>
#include <string>
#include <sys/stat.h>
#include <system_error>
#include <thread>
#include <unordered_map>
#include <unordered_set>
@@ -494,6 +513,69 @@ private:
bool execute_on_destruction;
};
// Simple from_chars implementation for integer and double types (C++17
// substitute)
template <typename T> struct from_chars_result {
const char *ptr;
std::errc ec;
};
template <typename T>
inline from_chars_result<T> from_chars(const char *first, const char *last,
T &value, int base = 10) {
value = 0;
const char *p = first;
bool negative = false;
if (p != last && *p == '-') {
negative = true;
++p;
}
if (p == last) { return {first, std::errc::invalid_argument}; }
T result = 0;
for (; p != last; ++p) {
char c = *p;
int digit = -1;
if ('0' <= c && c <= '9') {
digit = c - '0';
} else if ('a' <= c && c <= 'z') {
digit = c - 'a' + 10;
} else if ('A' <= c && c <= 'Z') {
digit = c - 'A' + 10;
} else {
break;
}
if (digit < 0 || digit >= base) { break; }
if (result > ((std::numeric_limits<T>::max)() - digit) / base) {
return {p, std::errc::result_out_of_range};
}
result = result * base + digit;
}
if (p == first || (negative && p == first + 1)) {
return {first, std::errc::invalid_argument};
}
value = negative ? -result : result;
return {p, std::errc{}};
}
// from_chars for double (simple wrapper for strtod)
inline from_chars_result<double> from_chars(const char *first, const char *last,
double &value) {
std::string s(first, last);
char *endptr = nullptr;
errno = 0;
value = std::strtod(s.c_str(), &endptr);
if (endptr == s.c_str()) { return {first, std::errc::invalid_argument}; }
if (errno == ERANGE) {
return {first + (endptr - s.c_str()), std::errc::result_out_of_range};
}
return {first + (endptr - s.c_str()), std::errc{}};
}
} // namespace detail
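
A usage sketch for the `detail::from_chars` substitute added above, assuming the patched `vendor/cpp-httplib/httplib.h` is on the include path (note that `httplib::detail` is an internal namespace, so this is illustration only):

```cpp
#include "httplib.h"
#include <cstdio>
#include <string>
#include <system_error>

int main() {
    std::string retry = "2500";
    int v = 0;
    auto r1 = httplib::detail::from_chars(retry.data(), retry.data() + retry.size(), v);
    if (r1.ec == std::errc{}) { std::printf("retry_ms = %d\n", v); }   // 2500

    std::string quality = "0.8junk";
    double q = 0.0;
    auto r2 = httplib::detail::from_chars(quality.data(), quality.data() + quality.size(), q);
    // the double overload wraps strtod and stops at the first non-numeric character
    std::printf("quality = %.1f, consumed %td chars\n", q, r2.ptr - quality.data());
    return 0;
}
```
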
enum SSLVerifierResponse {
@@ -1848,10 +1930,11 @@ private:
Result send_(Request &&req);
socket_t create_client_socket(Error &error) const;
bool read_response_line(Stream &strm, const Request &req,
Response &res) const;
bool read_response_line(Stream &strm, const Request &req, Response &res,
bool skip_100_continue = true) const;
bool write_request(Stream &strm, Request &req, bool close_connection,
Error &error);
Error &error, bool skip_body = false);
bool write_request_body(Stream &strm, Request &req, Error &error);
void prepare_default_headers(Request &r, bool for_stream,
const std::string &ct);
bool redirect(Request &req, Response &res, Error &error);
@@ -3243,10 +3326,11 @@ private:
msg.id = value;
} else if (field == "retry") {
// Parse retry interval in milliseconds
try {
retry_ms = std::stoi(value);
} catch (...) {
// Invalid retry value, ignore
{
int v = 0;
auto res =
detail::from_chars(value.data(), value.data() + value.size(), v);
if (res.ec == std::errc{}) { retry_ms = v; }
}
}
// Unknown fields are ignored per SSE spec