ik_llama.cpp

Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-02-19 04:40:09 +00:00

Files

Kawrakow 7e5af2073c Faster MoE inference (#112)

* multi_sdd: WIP

* multi_sdd: CPU works

* multi_add: CUDA

* multi_add: simplify

* multi_add: Metal

* Metal: speed up mul_mat_id

For the Granite-1B MoE model PP-512 goes from
156 t/s to 890 t/s, so nearly a 6X speedup!

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2024-10-31 12:05:27 +01:00

CMakeLists.txt

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

llama-grammar.cpp

Merge mainline - Aug 12 2024 (#17)

2024-08-12 15:14:32 +02:00

llama-grammar.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

llama-impl.h

Time to fix replace_all (#68)