sync : ggml

ggml : bump version to 0.9.7 (ggml/1425)
ggml : bump version to 0.9.6 (ggml/1423)
2026-02-19 14:13:22 +02:00 · 2026-02-15 22:24:29 +02:00
9 changed files with 2 additions and 2032 deletions
--- a/examples/llama-eval/AGENTS.md
+++ b/examples/llama-eval/AGENTS.md
@@ -1,190 +0,0 @@
-# llama-eval Codebase Guidelines
-
-## Overview
-
-This directory contains Python evaluation tools for llama.cpp:
- `llama-eval.py` - Main evaluation tool with multiple datasets (AIME, AIME2025, GSM8K, GPQA)
- `llama-server-simulator.py` - Flask-based server simulator for testing
- `test-simulator.sh` - Test script for the simulator
-
-## Build/Run Commands
-
-### Virtual Environment
-The project uses a virtual environment located at `venv/`:
-```bash
-source venv/bin/activate
-```
-
-### Running the Main Evaluator
-```bash
-python llama-eval.py \
-  --server http://127.0.0.1:8013 \
-  --model gpt-oss-20b-hf-low \
-  --dataset aime \
-  --n_cases 10 \
-  --grader-type llm \
-  --seed 42
-```
-
-### Running the Simulator (for testing)
-```bash
-python llama-server-simulator.py --port 8033 --success-rate 0.8
-```
-
-### Running Tests
-```bash
-./test-simulator.sh
-```
-
-## Code Style Guidelines
-
-### Imports
- Standard library imports first (argparse, json, os, re, subprocess, sys, time)
- Third-party imports (requests, tqdm, datasets, flask) after standard library
- Relative imports not used
- Group imports by category with blank line between groups
-
-### Formatting
- 4-space indentation
- Max line length: 125 characters (per parent project's .flake8)
- Use double quotes for strings
- Use triple double quotes for docstrings
- Binary operators at the beginning of continued lines
-
-### Naming Conventions
- Classes: PascalCase (e.g., `AimeDataset`, `Grader`, `Processor`)
- Functions: snake_case (e.g., `normalize_number`, `get_prompt`)
- Variables: snake_case (e.g., `question_text`, `correct_count`)
- Constants: UPPER_SNAKE_CASE (e.g., `GRADER_PATTERNS`, `TEMPLATE_REGISTRY`)
- Private methods: prefix with underscore (e.g., `_load_dataset`, `_grade_regex`)
-
-### Types
- Use type hints for all function signatures
- Import from `typing` module: `Dict`, `List`, `Optional`, `Any`, `Tuple`
- Use `@dataclass` for data structures
- Prefer `Optional[T]` over `Union[T, None]`
-
-### Error Handling
- Use try/except for network requests and file operations
- Return `None` or `False` on errors when appropriate
- Use `ValueError` for invalid arguments
- Use `FileNotFoundError` for missing files
- CLI scripts should handle exceptions gracefully
-
-### Dataclasses
- Use `@dataclass` for structured data
- Define fields with explicit types
- Use `Optional[T]` for nullable fields
- Provide default values where appropriate
-
-### String Formatting
- Use f-strings for formatting (Python 3.6+)
- Use triple double quotes for multi-line strings
- Escape backslashes in regex patterns: `r'\\boxed{(\d+)}'`
-
-### File Paths
- Use `pathlib.Path` instead of string paths
- Create directories with `mkdir(parents=True, exist_ok=True)`
- Use `Path.home()` for user home directory
-
-### Logging
- Use `print()` for user-facing output
- Use `sys.stderr` for debug logging
- Simulator writes debug logs to `/tmp/simulator-debug.log`
-
-### Testing
-
- Test script uses bash with `set -e` for strict error handling
- Simulator runs in background with PID tracking
- Tests verify correct answers, error cases, and edge cases
- Use `curl` for HTTP testing in shell scripts
-
-### Whitespace Cleanup
- Remove trailing whitespace from all lines
- When making edits, do not leave trailing whitespace
-
-## Dataset Support
-
-### AIME Dataset
- 90 questions from 2025 AIME competition
- Answers in `\boxed{answer}` format
- Supports regex, CLI, and LLM grading
-
-### AIME2025 Dataset
- 30 questions from 2025 AIME I & II
- Answers in `\boxed{answer}` format
- Requires loading two config parts
-
-### GSM8K Dataset
- 7473 math word problems
- Answers numeric values with `####` separator
- Supports regex, CLI, and LLM grading
-
-### GPQA Dataset
- 198 questions from GPQA Diamond
- Multiple choice with shuffled options (A, B, C, D)
- **Requires LLM grader** (returns letter A/B/C/D)
-
-## Grading Types
-
-### Regex Grader
- Built-in patterns per dataset
- Prioritizes `\boxed{}` for AIME datasets
- Extracts last number for GSM8K
-
-### CLI Grader
- External script interface
- Call: `grader.sh --answer <pred> --expected <gold>`
- Exit code 0 = correct, non-zero = incorrect
-
-### LLM Grader
- Uses judge model for answer extraction
- Includes few-shot examples
- Case-insensitive comparison
- Required for GPQA
-
-## Configuration
-
-### Sampling Parameters (Optional)
- `--temperature`: Sampling temperature
- `--top-k`: Top K sampling
- `--top-p`: Top P sampling
- `--min-p`: Min P sampling
- Only passed to API if explicitly specified
-
-### Default Values
- `--n_predict`: -1 (infinite)
- `--grader-type`: llm
- `--seed`: 1234
- `--threads`: 32
- `--output`: llama-eval-state.json
-
-## Output Format
-
-### Progress Table
- Shows task ID, dataset, prompt (truncated to 43 chars), expected answer, status
- Uses `tqdm` for progress bars
-
-### Results Summary
- Format: `Results: X/Y correct (Z%)`
- Displayed after all tasks complete
-
-### JSON Output
- Complete eval state saved to output file
- Contains: task IDs, correctness, prompts, extracted answers, sampling config
- Uses `dataclasses.asdict()` for serialization
-
-## HuggingFace Datasets
-
- Cache directory: `~/.cache/huggingface/datasets`
- Set via `HF_DATASETS_CACHE` environment variable
- Telemetry disabled via `HF_HUB_DISABLE_TELEMETRY=1`
- Datasets loaded with `datasets.load_dataset()`
-
-## Flask Simulator
-
- Runs on configurable port (default: 8033)
- Endpoint: `/v1/chat/completions` (OpenAI-compatible)
- Uses Dice coefficient for question matching
- Configurable success rate for testing
- Debug logs to `/tmp/simulator-debug.log`
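The CLI grader contract described in the removed AGENTS.md (a script called with `--answer` and `--expected`, where exit code 0 means correct) could be driven from Python roughly as follows. This is a minimal sketch under stated assumptions, not the code from the removed `llama-eval.py`: `grade_with_cli`, its `command` parameter, and the 30-second timeout are hypothetical.

```python
import subprocess
from typing import List


def grade_with_cli(command: List[str], predicted: str, expected: str,
                   timeout: float = 30.0) -> bool:
    """Run an external grader; return True only when it exits with code 0.

    `command` is the grader invocation as an argument list, for example
    ["./grader.sh"]. The name and signature are hypothetical.
    """
    try:
        result = subprocess.run(
            command + ["--answer", predicted, "--expected", expected],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing script or a hung grader is treated as "incorrect".
        return False
    return result.returncode == 0
```

Treating a missing or hung script as an incorrect answer keeps a batch run going; a stricter tool might prefer to raise instead.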
--- a/examples/llama-eval/IMPLEMENTATION.md
+++ b/examples/llama-eval/IMPLEMENTATION.md
@@ -1,94 +0,0 @@
-# llama-eval Implementation Summary
-
-## Overview
-
-Simple evaluation tool for llama.cpp with support for multiple datasets (AIME, GSM8K, GPQA) and flexible grading (regex, CLI, LLM).
-
-## Key Features
-
- **Multiple Datasets**: AIME, GSM8K, GPQA with proper answer extraction
- **Flexible Grading**: Regex, CLI, or LLM-based grading
- **Parallel Processing**: Configurable thread count for concurrent requests
- **Sampling Parameters**: Temperature, Top K, Top P, Min P (optional)
- **Real-time Feedback**: Progress tracking with detailed output
- **JSON Output**: Complete eval state saved for debugging
- **GPQA Support**: Answer shuffling with reproducible results
-
-## Architecture
-
-### Eval State
-```python
-@dataclass
-class EvalState:
-    id: str
-    tasks: List[str]
-    task_states: Dict[str, Dict[str, Any]]
-    sampling_config: Dict[str, Any]
-```
-
-### Processor
- Handles processing, grading, and state management
- Thread-safe concurrent execution
- Configurable sampling parameters
-
-### Grader
- Abstract grading interface supporting multiple types
- Regex grader with dataset-specific patterns
- CLI grader with external script interface
- LLM grader with configurable server and model
-
-### Datasets
- `AimeDataset`: 90 AIME 2025 questions
- `Aime2025Dataset`: 30 AIME 2025 I & II questions
- `Gsm8kDataset`: 7473 math word problems
- `GpqaDataset`: 198 GPQA Diamond questions with shuffling
-
-## Configuration
-
-### Sampling Parameters (Optional)
- `--temperature`: Sampling temperature
- `--top-k`: Top K sampling
- `--top-p`: Top P sampling
- `--min-p`: Min P sampling
- Only passed if explicitly specified
-
-### Grading Types
- **regex**: Built-in patterns for each dataset
- **cli**: External script with `--answer` and `--expected` args
- **llm**: LLM-based extraction with few-shot examples and configurable server/model
-
-### Dataset Requirements
- **AIME**: Supports regex, CLI, or LLM grader
- **AIME2025**: Supports regex, CLI, or LLM grader
- **GSM8K**: Supports regex, CLI, or LLM grader
- **GPQA**: Requires LLM grader
-
-## Output Format
-
-### Progress Table
-```
-  Task ID             Dataset  Prompt (first 43 chars)                        Expected    Status
-  aime_000_001         AIME   Complete the following reactions and sel...    A          pending
-```
-
-### Results Summary
-```
-============================================================
-Results: 8/10 correct (80.0%)
-============================================================
-```
-
-### JSON Output
-Complete eval state with task IDs, correctness, prompts, extracted answers, and sampling configuration.
-
-## Technical Details
-
- Default max tokens: -1 (infinite)
- Default grader type: llm
- Default seed: 1234
- Default threads: 32
- Prompt truncation: First 43 chars + padding + "..."
- Response truncation: Last 10 lines for grading
- GPQA requires LLM grader (returns letter A/B/C/D)
- Judge model defaults to evaluated model if not specified
- Sample answers defined in SAMPLE_ANSWERS dict for few-shot learning
--- a/examples/llama-eval/README.md
+++ b/examples/llama-eval/README.md
@@ -1,112 +0,0 @@
-# llama-eval Evaluation Tool
-
-Simple evaluation tool for llama.cpp with support for multiple datasets.
-
-## Features
-
- **Multiple Datasets**: AIME, GSM8K, GPQA
- **Flexible Grading**: Regex, CLI, or LLM-based grading
- **Parallel Processing**: Configurable thread count
- **Real-time Feedback**: Progress tracking with detailed output
- **Sampling Parameters**: Temperature, Top K, Top P, Min P
- **JSON Output**: Complete eval state saved for debugging
-
-## Usage
-
-```bash
-python llama-eval.py \
-  --server http://127.0.0.1:8013 \
-  --model gpt-oss-20b-hf-low \
-  --judge-model gpt-oss-20b-hf-medium \
-  --dataset aime \
-  --n_cases 10 \
-  --grader-type llm \
-  --seed 42
-```
-
-## CLI Arguments
-
- `--server`: llama-server URL (default: http://127.0.0.1:8013)
- `--model`: Model name for evaluation (default: llama)
- `--judge-model`: Model name for LLM judge (default: same as main model)
- `--judge-server`: Server URL for LLM judge (default: same as main server)
- `--dataset`: Dataset type (aime, aime2025, gsm8k, gpqa)
- `--n_cases`: Number of cases to evaluate (default: all)
- `--n_predict`: Max tokens to predict per prompt (default: -1, infinite)
- `--temperature`: Sampling temperature (default: not passed)
- `--top-k`: Top K sampling (default: not passed)
- `--top-p`: Top P sampling (default: not passed)
- `--min-p`: Min P sampling (default: not passed)
- `--threads`: Number of threads for parallel requests (default: 32)
- `--verbose`: Show detailed output for each case
- `--output`: Output file for eval state (default: llama-eval-state.json)
- `--grader-type`: Grader type (regex, cli, llm, default: llm)
- `--grader-script`: Path to CLI grader script (required for --grader-type cli)
- `--seed`: Random seed for shuffling (default: 1234)
-
-## Datasets
-
-### AIME
- 90 questions from 2025 AIME competition
- Answers in boxed format: `\boxed{answer}`
- Requires regex grader or LLM grader
-
-### AIME2025
- 30 questions from 2025 AIME I & II competitions
- Answers in boxed format: `\boxed{answer}`
- Supports regex, CLI, or LLM grader
-
-### GSM8K
- 7473 math word problems
- Answers are numeric values
- Requires regex grader or LLM grader
-
-### GPQA
- 198 questions from GPQA Diamond dataset
- Multiple choice with shuffled options
- Requires LLM grader (returns letter A, B, C, or D)
-
-## Grading Types
-
-### Regex Grader
-Built-in patterns for different datasets:
- AIME: `\boxed{(\d+)}|\b(\d+)\b`
- AIME2025: `\boxed{(\d+)}|\b(\d+)\b`
- GSM8K: `\b(\d+)\b`
- GPQA: Letter extraction (A, B, C, D)
-
-### CLI Grader
-External script interface:
-```bash
-./grader.sh --answer <pred> --expected <gold>
-```
-Returns exit code 0 if correct, non-zero if incorrect.
-
-### LLM Grader
-Uses LLM to extract and compare answers:
- Configurable server and model
- Includes few-shot examples from sample answers
- Case-insensitive comparison
- Required for GPQA dataset
-
-## Output
-
-### Progress Table
-```
-  Task ID             Dataset  Prompt (first 43 chars)                        Expected    Status
-  aime_000_001         AIME   Complete the following reactions and sel...    A          pending
-```
-
-### Results
-```
-============================================================
-Results: 8/10 correct (80.0%)
-============================================================
-```
-
-### JSON Output
-Complete eval state saved to output file with:
- Task IDs and correctness status
- Prompts and extracted answers
- Sampling configuration
- Processing metadata
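The regex grading rules listed in the removed README (prefer `\boxed{...}`, otherwise fall back to a bare integer) can be sketched as below. This is an illustration, not the removed implementation: `extract_answer` is a hypothetical name, and taking the last match rather than the first is an assumption. Note that in a Python pattern the backslash must be escaped (`\\boxed`), because an unescaped `\b` is a word-boundary assertion.

```python
import re
from typing import Optional

# "\\boxed" matches a literal backslash followed by "boxed";
# a bare "\b" would instead be a word-boundary assertion.
BOXED = re.compile(r"\\boxed\{(\d+)\}")
INTEGER = re.compile(r"\b(\d+)\b")


def extract_answer(response: str) -> Optional[str]:
    """Return the last boxed integer, else the last bare integer, else None."""
    boxed = BOXED.findall(response)
    if boxed:
        return boxed[-1]
    numbers = INTEGER.findall(response)
    return numbers[-1] if numbers else None
```

For example, `extract_answer(r"x = 7, so \boxed{12}")` returns `"12"`, while a response without any boxed value yields its last integer.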
--- a/examples/llama-eval/llama-eval.py
+++ b/examples/llama-eval/llama-eval.py
--- a/examples/llama-eval/llama-server-simulator-README.md
+++ b/examples/llama-eval/llama-server-simulator-README.md
@@ -1,36 +0,0 @@
-# llama-server-simulator
-
-Standalone Python script simulating llama-server HTTP endpoint for testing.
-
-## Features
-
- HTTP Server with OpenAI-compatible `/v1/chat/completions` endpoint
- AIME Dataset Integration - Loads 90 questions from HuggingFace
- Intelligent Question Matching - Uses exact matching, LaTeX removal, and bigram Dice-coefficient similarity
- Configurable Success Rate - Control correct/wrong answer generation (0-1)
- Debug Logging - Troubleshoot matching issues
-
-## Usage
-
-```bash
-python llama-server-simulator.py --success-rate 0.8
-```
-
-## Arguments
-
- `--success-rate`: Probability of returning correct answer (0.0-1.0, default: 0.8)
- `--port`: Server port (default: 8033)
- `--debug`: Enable debug logging (default: False)
-
-## Testing
-
-```bash
-./test-simulator.sh
-```
-
-## Implementation Details
-
- Uses bigram Dice coefficient for partial matching (similarity threshold: 0.3)
- Automatic caching via HuggingFace datasets library
- Wrong answers generated by incrementing expected answer
- Debug output written to stderr
--- a/examples/llama-eval/llama-server-simulator.py
+++ b/examples/llama-eval/llama-server-simulator.py
@@ -1,283 +0,0 @@
-#!/usr/bin/env python3
-
-import argparse
-import json
-import random
-import re
-import time
-import sys
-import os
-from typing import Dict, List, Optional
-from dataclasses import dataclass, asdict
-from pathlib import Path
-
-import datasets
-from flask import Flask, request, jsonify
-
-# Set cache directory for HuggingFace datasets
-cache_dir = Path.home() / ".cache" / "huggingface" / "datasets"
-cache_dir.mkdir(parents=True, exist_ok=True)
-os.environ["HF_DATASETS_CACHE"] = str(cache_dir)
-
-def dice(s1: str, s2: str) -> float:
-    """Calculate Dice coefficient between two strings based on bigram overlap."""
-    if not s1 and not s2:
-        return 1.0
-
-    def _bigrams(s: str):
-        return [s[i : i + 2] for i in range(len(s) - 1)]
-
-    bigrams1 = _bigrams(s1)
-    bigrams2 = _bigrams(s2)
-
-    if not bigrams1 and not bigrams2:
-        return 1.0
-
-    from collections import Counter
-
-    freq1 = Counter(bigrams1)
-    freq2 = Counter(bigrams2)
-
-    intersection = sum(min(freq1[bg], freq2[bg]) for bg in freq1)
-    dice_coeff = 2 * intersection / (len(bigrams1) + len(bigrams2))
-    return dice_coeff
-
-def debug_log(message: str):
-    """Log debug messages to both stderr and a file"""
-    print(message, file=sys.stderr)
-    with open("/tmp/simulator-debug.log", "a") as f:
-        f.write(message + "\n")
-
-app = Flask(__name__)
-
-@dataclass
-class EvalState:
-    id: str
-    tasks: List[str]
-    task_states: Dict[str, Dict]
-    sampling_config: Dict
-
-def normalize_number(s: str) -> Optional[int]:
-    match = re.match(r"\d+", s)  # match digits from the start
-    if not match:
-        return None
-    return int(match.group(0))
-
-class AimeDataset:
-    def __init__(self, split: str = "train"):
-        self.split = split
-        self.questions: List[Dict] = []
-        self._load_dataset()
-
-    def _load_dataset(self):
-        print(f"Loading AIME dataset (split: {self.split})...")
-
-        cache_path = Path.home() / ".cache" / "huggingface" / "datasets" / "AI-MO___aimo-validation-aime" / "default" / "0.0.0"
-        if cache_path.exists():
-            print(f"Using cached dataset from {cache_path}")
-            ds = datasets.load_dataset("AI-MO/aimo-validation-aime", split=self.split, cache_dir=str(cache_path))
-        else:
-            ds = datasets.load_dataset("AI-MO/aimo-validation-aime", split=self.split)
-
-        self.questions = list(ds)
-        print(f"AIME dataset loaded: {len(self.questions)} questions")
-
-    def find_question(self, request_text: str) -> Optional[Dict]:
-        best_match = None
-        best_distance = -1
-        best_index = -1
-
-        for i, question in enumerate(self.questions):
-            question_text = question["problem"]
-            request_lower = request_text.lower()
-            question_lower = question_text.lower()
-
-            # Exact match
-            if question_lower == request_lower:
-                debug_log(f"DEBUG: Found exact match at index {i}")
-                return question
-
-            # Remove LaTeX formatting for more flexible matching
-            question_no_latex = re.sub(r'\$[^$]+\$', '', question_text)
-            if question_no_latex.lower() == request_lower:
-                debug_log(f"DEBUG: Found match (no LaTeX) at index {i}")
-                return question
-
-            # Calculate Dice coefficient (bigram similarity) for partial matches
-            # Only consider if request is at least 50% of question length
-            if len(request_lower) >= len(question_lower) * 0.5:
-                distance = dice(question_lower, request_lower)
-
-                if distance > best_distance:
-                    best_distance = distance
-                    best_match = question
-                    best_index = i
-
-        if best_match and best_distance > 0.3:  # Threshold for partial match
-            debug_log(f"DEBUG: Found best partial match at index {best_index} with distance {best_distance:.3f}")
-            return best_match
-
-        debug_log(f"DEBUG: No matching question found for: {request_text[:100]}...")
-        return None
-
-    def get_answer(self, question: Dict) -> str:
-        answer = question["answer"]
-        if isinstance(answer, str):
-            normalized = normalize_number(answer)
-            return str(normalized) if normalized is not None else answer
-        return str(answer)
-
-class Simulator:
-    def __init__(
-        self,
-        port: int = 8033,
-        host: str = "localhost",
-        success_rate: float = 0.8,
-        dataset_split: str = "train"
-    ):
-        self.port = port
-        self.host = host
-        self.success_rate = success_rate
-        self.dataset = AimeDataset(dataset_split)
-        self.eval_state = EvalState(
-            id="aime-2025",
-            tasks=["aime"],
-            task_states={},
-            sampling_config={"temperature": 0, "max_tokens": 2048}
-        )
-
-    def _generate_response(
-        self,
-        question: Dict,
-        should_be_correct: bool
-    ) -> Dict:
-        expected_answer = self.dataset.get_answer(question)
-
-        if should_be_correct:
-            response_text = expected_answer
-        else:
-            response_text = self._generate_wrong_answer(question)
-
-        return {
-            "id": f"chatcmpl-{int(time.time())}",
-            "object": "chat.completion",
-            "created": int(time.time()),
-            "model": "llama",
-            "choices": [
-                {
-                    "index": 0,
-                    "message": {
-                        "role": "assistant",
-                        "content": response_text
-                    },
-                    "finish_reason": "stop"
-                }
-            ],
-            "usage": {
-                "prompt_tokens": 100,
-                "completion_tokens": 50,
-                "total_tokens": 150
-            }
-        }
-
-    def _generate_wrong_answer(self, question: Dict) -> str:
-        expected_answer = self.dataset.get_answer(question)
-
-        if expected_answer.isdigit():
-            wrong_answer = str(int(expected_answer) + 1)
-        else:
-            wrong_answer = expected_answer + " (wrong)"
-
-        return wrong_answer
-
-    def _process_request(self, request_data: Dict) -> Dict:
-        messages = request_data.get("messages", [])
-        if not messages:
-            return {"error": "No messages in request"}
-
-        request_text = messages[0].get("content", "")
-        debug_log(f"DEBUG: Received request with content: {request_text[:150]}...")
-
-        question = self.dataset.find_question(request_text)
-        if not question:
-            debug_log(f"DEBUG: find_question returned None")
-            return {"error": "No matching question found"}
-
-        should_be_correct = random.random() < self.success_rate
-
-        response = self._generate_response(question, should_be_correct)
-
-        task_id = "aime"
-        self.eval_state.task_states[task_id] = {
-            "correct": should_be_correct,
-            "expected": self.dataset.get_answer(question),
-            "predicted": response["choices"][0]["message"]["content"]
-        }
-
-        return response
-
-@app.route('/v1/chat/completions', methods=['POST'])
-def chat_completions():
-    try:
-        request_data = request.get_json()
-
-        if not request_data:
-            return jsonify({"error": "Invalid JSON"}), 400
-
-        response = simulator._process_request(request_data)
-
-        return jsonify(response)
-
-    except Exception as e:
-        print(f"Error processing request: {e}")
-        return jsonify({"error": str(e)}), 500
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="llama-server simulator for testing eval scripts"
-    )
-    parser.add_argument(
-        "--port",
-        type=int,
-        default=8033,
-        help="Server port (default: 8033)"
-    )
-    parser.add_argument(
-        "--host",
-        type=str,
-        default="localhost",
-        help="Server host (default: localhost)"
-    )
-    parser.add_argument(
-        "--success-rate",
-        type=float,
-        default=0.8,
-        help="Success rate 0-1 (default: 0.8)"
-    )
-    parser.add_argument(
-        "--dataset-split",
-        type=str,
-        default="train",
-        help="AIME dataset split to use (default: train)"
-    )
-
-    args = parser.parse_args()
-
-    global simulator
-    simulator = Simulator(
-        port=args.port,
-        host=args.host,
-        success_rate=args.success_rate,
-        dataset_split=args.dataset_split
-    )
-
-    print("\n=== llama-server-simulator ===")
-    print(f"Server running on http://{args.host}:{args.port}")
-    print(f"Success rate: {args.success_rate}")
-    print(f"AIME dataset loaded: {len(simulator.dataset.questions)} questions")
-    print("\nPress Ctrl+C to stop\n")
-
-    app.run(host=args.host, port=args.port, debug=False)
-
-if __name__ == "__main__":
-    main()
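To get a feel for the 0.3 similarity threshold used by `find_question` in the removed simulator, here is a compact restatement of the bigram Dice coefficient with a small worked case. It is a sketch, not the removed code verbatim: `dice_similarity` is a hypothetical name, and it uses `Counter` intersection in place of the explicit loop.

```python
from collections import Counter


def dice_similarity(s1: str, s2: str) -> float:
    """Dice coefficient over character bigrams: 2 * |A ∩ B| / (|A| + |B|)."""
    bigrams1 = Counter(s1[i:i + 2] for i in range(len(s1) - 1))
    bigrams2 = Counter(s2[i:i + 2] for i in range(len(s2) - 1))
    total = sum(bigrams1.values()) + sum(bigrams2.values())
    if total == 0:
        # Neither string has a bigram (both shorter than 2 characters).
        return 1.0
    # Counter & Counter keeps the minimum count of each shared bigram.
    shared = sum((bigrams1 & bigrams2).values())
    return 2 * shared / total


# "night" -> ni, ig, gh, ht; "nacht" -> na, ac, ch, ht; only "ht" is shared,
# so the score is 2 * 1 / (4 + 4) = 0.25, which is below the 0.3 threshold.
print(dice_similarity("night", "nacht"))
```

A higher score means more similar strings, so the value compared against the threshold is a similarity, not a distance.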
--- a/examples/llama-eval/test-simulator.sh
+++ b/examples/llama-eval/test-simulator.sh
@@ -1,86 +0,0 @@
-#!/bin/bash
-
-set -e
-
-# Get the directory where this script is located
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-
-echo "=== llama-server-simulator Test Script ==="
-echo ""
-
-PORT=8033
-SUCCESS_RATE=0.8
-TEST_PORT=8034
-
-echo "Starting simulator on port $PORT with success rate $SUCCESS_RATE..."
-source "$SCRIPT_DIR/venv/bin/activate"
-python3 "$SCRIPT_DIR/llama-server-simulator.py" --port $PORT --success-rate $SUCCESS_RATE > /tmp/simulator-test.log 2>&1 &
-SIMULATOR_PID=$!
-
-echo "Waiting for simulator to start..."
-sleep 5
-
-# Helper function to make a request and extract the answer
-make_request() {
-  local question="$1"
-  curl -s -X POST http://localhost:$PORT/v1/chat/completions \
-    -H "Content-Type: application/json" \
-    -d "{
-      \"model\": \"llama\",
-      \"messages\": [
-        {\"role\": \"user\", \"content\": \"$question\"}
-      ],
-      \"temperature\": 0,
-      \"max_tokens\": 2048
-    }" | python3 -c "import sys, json; data = json.load(sys.stdin); print(data.get('choices', [{}])[0].get('message', {}).get('content', data.get('error', 'No response')))"
-}
-
-# Test question (repeated in multiple tests)
-TEST_QUESTION="Quadratic polynomials P(x) and Q(x) have leading coefficients 2 and -2, respectively. The graphs of both polynomials pass through the two points (16,54) and (20,53). Find P(0) + Q(0)."
-
-echo ""
-echo "=== Test 1: Correct Answer ==="
-echo "Sending request with known question..."
-answer=$(make_request "$TEST_QUESTION")
-echo "Answer: $answer"
-echo "Expected: 116"
-echo "Correct: $([ "$answer" == "116" ] && echo "Yes" || echo "No")"
-
-echo ""
-echo "=== Test 2: Wrong Answer ==="
-echo "Sending request with known question (simulator still runs at success rate $SUCCESS_RATE, so a wrong answer is not guaranteed)..."
-answer=$(make_request "$TEST_QUESTION")
-echo "Answer: $answer"
-echo "Expected: 116"
-echo "Correct: $([ "$answer" == "116" ] && echo "Yes" || echo "No")"
-
-echo ""
-echo "=== Test 3: No Matching Question ==="
-echo "Sending request with non-matching text..."
-response=$(make_request "What is the capital of France?")
-echo "Response: $response"
-echo "Expected: No matching question found"
-echo "Correct: $([ "$response" == "No matching question found" ] && echo "Yes" || echo "No")"
-
-echo ""
-echo "=== Test 4: Success Rate Verification ==="
-echo "Sending 10 requests to test success rate..."
-correct_count=0
-for i in {1..10}; do
-  answer=$(make_request "$TEST_QUESTION")
-  if [ "$answer" == "116" ]; then
-    correct_count=$((correct_count + 1))
-  fi
-  echo "  Request $i: Answer = $answer"
-done
-echo "Correct answers: $correct_count/10"
-echo "Expected: ~8/10 (80% success rate)"
-echo "Success rate: $(echo "scale=1; $correct_count * 10" | bc)%"
-
-echo ""
-echo "=== Test Complete ==="
-echo "Stopping simulator..."
-kill $SIMULATOR_PID 2>/dev/null
-wait $SIMULATOR_PID 2>/dev/null || true
-
-echo "Simulator stopped."
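Test 4 in the removed script expects "~8/10" correct answers. How loose that expectation is can be estimated with a binomial model, assuming each request is an independent draw with success probability 0.8 (which matches the `random.random() < self.success_rate` check in the simulator). The helper names below are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(lo: int, hi: int, n: int, p: float) -> float:
    """Probability that the success count lands in [lo, hi], inclusive."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))


if __name__ == "__main__":
    n, p = 10, 0.8
    print(f"P(exactly 8 of {n}) = {binomial_pmf(8, n, p):.3f}")
    print(f"P(7 to 9 of {n})    = {prob_between(7, 9, n, p):.3f}")
```

Exactly 8 of 10 occurs only about 30% of the time under this model, so a check based on ten requests would need a tolerance band, or more requests, to be a meaningful assertion.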
--- a/ggml/CMakeLists.txt
+++ b/ggml/CMakeLists.txt
@@ -4,7 +4,7 @@ project("ggml" C CXX ASM)
 ### GGML Version
 set(GGML_VERSION_MAJOR 0)
 set(GGML_VERSION_MINOR 9)
-set(GGML_VERSION_PATCH 5)
+set(GGML_VERSION_PATCH 7)
 set(GGML_VERSION_BASE "${GGML_VERSION_MAJOR}.${GGML_VERSION_MINOR}.${GGML_VERSION_PATCH}")

 find_program(GIT_EXE NAMES git git.exe NO_CMAKE_FIND_ROOT_PATH)
--- a/scripts/sync-ggml.last
+++ b/scripts/sync-ggml.last
@@ -1 +1 @@
-a8db410a252c8c8f2d120c6f2e7133ebe032f35d
+d6754f3d0e6d0acd21c12442353c9fd2f94188e7
Author	SHA1	Message	Date
Georgi Gerganov	ff4affb4c1	sync : ggml	2026-02-15 22:24:29 +02:00
Georgi Gerganov	55d58599c8	ggml : bump version to 0.9.7 (ggml/1425)	2026-02-15 22:24:29 +02:00
Georgi Gerganov	1a8c700bfd	ggml : bump version to 0.9.6 (ggml/1423)	2026-02-15 22:24:29 +02:00