ik_llama.cpp/ggml
Kawrakow 81ea911f0d Graph parallel for Step-3.5-Flash (#1236)
* WIP

* This works but is slow

* Turn off the up / gate clamps for now

* OK we need the clamping

* Fuse the clamp (CUDA)

* Fuse the clamp (CPU)

* WIP

* Be able to use merged q, k, v

* Be able to use merged up/gate experts

* Fuse the clamp (CUDA mmvq)

* WIP: graph parallel for Step-3.5

* WIP

* This should be it

* Cleanup

* Fix merge
2026-02-06 06:56:51 +02:00