ik_llama.cpp/examples
Kawrakow 4a73c25002 Various (#181)
* Adding gp option to llama-bench

Similar to pg, but it only measures TG (token generation) speed for a given
prompt length.
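A sketch of how the new option might be invoked; the exact flag syntax and the `512,64` values are assumptions based on the description above, not taken from the PR:

```shell
# Hypothetical invocation: report TG speed for 64 generated tokens
# after a 512-token prompt (flag syntax is an assumption).
./llama-bench -m model.gguf -gp 512,64
```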

* Make q8_0_r4 work with tensor row sizes that are not a multiple of 128

Row sizes still need to be divisible by 32.

* Make q8_0_r4 work with tensor row sizes that are not a multiple of 128

... on NEON

* Make q8_0_r4 work with tensor row sizes that are not a multiple of 128

... on AVX2

* Make q4_0_r4 work with tensor row sizes that are not a multiple of 128

... on AVX2

* Make q4_0_r4 work with tensor row sizes that are not a multiple of 128

... on NEON

* Make q4_0_r4 work with tensor row sizes that are not a multiple of 128

... on Zen4

Also fix the q8_0 K-cache for head sizes that are not a multiple of 128.
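The loop structure behind these commits can be sketched as follows. This is an illustrative model only, not code from the repo: the function name and the returned span list are made up. The idea is that the fast path consumes 128 quants (four 32-quant blocks) per iteration, and a tail loop handles any leftover 32-quant blocks, so a row size only needs to be divisible by 32 rather than by 128:

```python
def split_row(row_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) spans covering one tensor row.

    Hypothetical sketch: main loop takes 128 quants at a time,
    tail loop takes the remaining 32-quant blocks.
    """
    assert row_size % 32 == 0, "row size must be a multiple of 32"
    spans = []
    i = 0
    while i + 128 <= row_size:   # fast path: 4 blocks of 32 quants
        spans.append((i, 128))
        i += 128
    while i < row_size:          # tail: one 32-quant block at a time
        spans.append((i, 32))
        i += 32
    return spans
```

For example, a row of 160 quants is covered by one 128-quant chunk plus one 32-quant tail block, which the pre-fix code (requiring a multiple of 128) could not handle.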

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-01-29 14:05:41 +02:00