ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-24 23:24:13 +00:00

Files

Kawrakow cda24b58cb CPU FA improvements (#351 )

* FA: provide work buffer for K repacking

* Add header to avoid comp0iler warnings

* WIP

* WIP

* WIP

* WIP

* Slightly better

* WIP (Zen4)

* WIP

* Try to improve for unusual number of heads/number of threads

* Use mul_mat_qX_0_q8_2_Tx for q6_0 in FA

* Use mul_mat_qX_0_q8_2_Tx for q4_0 in FA

* Use Sum4q4 for q4_0

* WIP

* WIP

* Much better FA TG with q8_0 KV cache

Just repack it even for TG. But do the repacking for k_step rows,
not the whole K tensor.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-04-29 07:19:43 +02:00

cmake

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

include

Add copyright notices (#317 )

2025-04-07 10:43:26 +02:00

src

CPU FA improvements (#351 )

2025-04-29 07:19:43 +02:00

.gitignore

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Compile time option to use bf16 for qunts without MMQ kernels (#261 )

2025-03-18 07:37:10 +01:00