ik_llama.cpp/ggml/src
Kawrakow f90d1fdd06 Split mode "graph" for Cohere2 (#1061)
* This works and TG is decent, but PP is low

* Better

* Apply f_logit_scale before mul mat with output tensor

* This is better for PP: 600 t/s -> 700 t/s

* To not lose this again

* WIP

* Equal split

* WIP

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-12-13 20:30:08 +01:00
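One bullet in this commit moves `f_logit_scale` so that it is applied before the matrix multiplication with the output tensor. A matrix product is linear, so scaling the hidden state by s before the product yields the same logits as scaling the logits afterwards. The only difference is how many values get multiplied: n_embd hidden elements instead of n_vocab logits. That cost argument is an inference from linearity, not something the commit states. The plain-C sketch below checks the equivalence. The toy sizes, the row-major layout, and the function names (`matvec`, `logits_scale_after`, `logits_scale_before`, `max_abs_diff`) are hypothetical illustrations. They are not ggml's API and not this commit's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy sizes standing in for n_embd (hidden size) and n_vocab (output rows). */
#define N_EMBD  4
#define N_VOCAB 6

/* out[j] = sum_i w[j*N_EMBD + i] * x[i]; w is row-major, one row per vocab entry. */
static void matvec(const float *w, const float *x, float *out) {
    for (size_t j = 0; j < N_VOCAB; ++j) {
        float acc = 0.0f;
        for (size_t i = 0; i < N_EMBD; ++i) {
            acc += w[j * N_EMBD + i] * x[i];
        }
        out[j] = acc;
    }
}

/* Scale after the product: one multiply per logit (N_VOCAB of them). */
static size_t logits_scale_after(const float *w, const float *x, float scale, float *out) {
    assert(w != NULL && x != NULL && out != NULL);
    matvec(w, x, out);
    for (size_t j = 0; j < N_VOCAB; ++j) {
        out[j] *= scale;
    }
    return N_VOCAB; /* number of scaling multiplications performed */
}

/* Scale before the product: one multiply per hidden element (N_EMBD of them). */
static size_t logits_scale_before(const float *w, const float *x, float scale, float *out) {
    assert(w != NULL && x != NULL && out != NULL);
    float xs[N_EMBD];
    for (size_t i = 0; i < N_EMBD; ++i) {
        xs[i] = scale * x[i];
    }
    matvec(w, xs, out);
    return N_EMBD; /* number of scaling multiplications performed */
}

/* Largest absolute element-wise difference between two vectors of length n. */
static float max_abs_diff(const float *a, const float *b, size_t n) {
    float m = 0.0f;
    for (size_t k = 0; k < n; ++k) {
        float d = a[k] - b[k];
        if (d < 0.0f) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

With a power-of-two scale such as 0.0625, the two orderings give bit-identical results as long as values stay in the normal floating-point range, because multiplying by a power of two is exact. For other scale values they agree only up to rounding.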