Merge mainline llama.cpp (#3)

* Merging mainline - WIP
* Merging mainline - WIP

  AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower, as is so often the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2026-02-06 22:40:09 +00:00 · 2024-07-27 07:55:01 +02:00
parent 0684c3e9c7
commit 154e0d75fc
612 changed files with 50817 additions and 165936 deletions
--- a/tests/test-double-float.cpp
+++ b/tests/test-double-float.cpp
@@ -14,7 +14,7 @@
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wdouble-promotion"

-// ggml.c::quantize_row_q4_0_reference
+// ggml.c::quantize_row_q4_0_ref
 inline static uint8_t round_orig(float v0) { return ((int8_t) (round(v0))) + 8; }

 // ggml.c::ggml_silu_f32
@@ -24,7 +24,7 @@ inline static float silu_orig(float x) {

 #pragma GCC diagnostic pop

-// ggml.c::quantize_row_q4_0_reference
+// ggml.c::quantize_row_q4_0_ref
 inline static uint8_t round_float(float v0) { return (int8_t)roundf(v0) + 8; }

 // ggml.c::ggml_silu_f32