* Merging mainline - WIP
* Merging mainline - WIP
AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower as it is so often
the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS
* build: add CMake Presets and toolchian files for Windows ARM64
* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings
* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM
* matmul-int8: fixed typos in q8_0_q8_0 matmuls
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* matmul-int8: remove unnecessary casts in q8_0_q8_0
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>