ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-08 23:40:10 +00:00

Files

Kawrakow 7a41b3b1f5 Various fused ops around expert selection (#840 )

* Fuse sigmoid+add+grouped_topk+get_rows (CPU)

* Fix CPU + CUDA

but CUDA is somehow not 100% correct as I get a slightly different
PPL (lower!)

* Minor

* Fuse sigmoid+add+topk+get_rows (CUDA)

* Fuse sigmoid+add+topk+get_rows (CPU)

* Fuse topk+view+get_rows+reshape+softmax (CPU)

* Fuse topk+view+get_rows+reshape+softmax (CUDA)

* cpu: turn off the openai topk fusing for now

Something is not right and I don't see the bug.
On the CPU one doesn't gain much if anything, so not a big loss.

* Also fuse sum_rows and div

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-10-19 19:02:46 +03:00

cmake

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

include

Grouped expert routing (CPU only) (#836 )

2025-10-16 14:57:02 +03:00

src

Various fused ops around expert selection (#840 )

2025-10-19 19:02:46 +03:00

.gitignore

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Set default value of GGML_SCHED_MAX_COPIES to 1 (#751 )

2025-09-02 07:04:39 +02:00