Files
ik_llama.cpp/ggml
Iwan Kawrakow 2a5552830b MoE improvements on Metal
This version beats mainline, but there are things I don't understand:
* Mainline has effectively gone to GEMV for MUL_MAT_ID. We can do the
  same, but we are 30% slower. Why?
* Using actual GEMM, we beat mainline with a ubatch size of 128. But then
  performance degrades. Why?
2025-04-02 15:26:19 +02:00