ik_llama.cpp/396 - Best settings for Maverick - Dual CPU Xeon 8480_ - RTX 3090.md at eaa2510a28b60d43c2210c69cefdf750d5cc119f - ik_llama.cpp

ikawrakow/ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640)

2025-07-23 13:31:53 +02:00

🗣️ #396 - Best settings for Maverick - Dual CPU Xeon 8480+ - RTX 3090

| **Author** | `justinjja` |
| :--- | :--- |
| **Created** | 2025-05-07 |
| **Updated** | 2025-05-08 |

Description

With a single 8480+ and a 3090 I get excellent speeds, ~40 T/s on Maverick. After installing a second CPU and another 8 sticks of RAM, I can't get good speeds. `--numa distribute` gives ~27 T/s, and `--numa isolate` (with `-t 56`) is even slower at ~10 T/s (with cache cleared between tests).
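For anyone repeating the comparison, here is a minimal sketch of one common way to clear the Linux page cache between runs. The post does not say how the cache was cleared, so this is an assumption and not the poster's exact procedure. The function name `drop_page_cache` and its optional path argument are illustrative only.

```sh
#!/bin/sh
# Sketch: flush dirty pages, then ask the kernel to drop clean caches.
# Writing 3 to drop_caches frees the page cache plus dentries and inodes.
# The optional first argument is a hypothetical override of the target path,
# so the function can be tried without root.
drop_page_cache() {
  target="${1:-/proc/sys/vm/drop_caches}"
  sync
  echo 3 > "$target"
}
```

Against the real `/proc` path this needs root; the equivalent one-liner is `sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'`.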

This is with Sub-NUMA Clustering disabled, so only 2 numa nodes total.
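A quick way to confirm the node count and see which logical CPUs belong to each node is to read the kernel's sysfs entries. The sketch below assumes a Linux system that exposes `/sys/devices/system/node`. The function name `list_numa_nodes` and its optional directory argument are illustrative. `numactl --hardware` reports similar information where `numactl` is installed.

```sh
#!/bin/sh
# Sketch: print every NUMA node with the logical CPUs the kernel assigns to it.
# The optional first argument overrides the sysfs directory (hypothetical
# parameter, useful for trying the function on a copy of the tree).
list_numa_nodes() {
  node_root="${1:-/sys/devices/system/node}"
  for node_dir in "$node_root"/node[0-9]*; do
    # Skip the unexpanded glob or any node without a readable CPU list.
    [ -r "$node_dir/cpulist" ] || continue
    printf '%s: %s\n' "$(basename "$node_dir")" "$(cat "$node_dir/cpulist")"
  done
}

list_numa_nodes
```

With Sub-NUMA Clustering disabled on a dual-socket board, this should print two lines, one per socket.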

Any recommendations for settings that will get over 40 T/s? Do I not understand what numa isolate does? I thought that would be the same as a single CPU.

`llama-server -m Maverick-UD-IQ4_XS.gguf -c 32000 -fa -fmoe -amb 512 -rtr -ctk q8_0 -ctv q8_0 -ngl 99 -ot ".*ffn_.*_exps.*=CPU" --numa isolate -t 56`

🗣️ Discussion

👤 justinjja replied the 2025-05-08 at 01:11:10:

Small update,

I replaced `--numa isolate` with `--numa numactl` and added `numactl --physcpubind=0-55,112-167 --membind=0` before my command.

This does what I thought isolate would do. I'm back at 40 T/s.
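Put together, the single-socket launch described in this update might look like the sketch below. It makes several assumptions: the remaining `llama-server` flags, including `-t 56`, are carried over unchanged from the original command; the `-ot` pattern is written as `.*ffn_.*_exps.*=CPU`, on the guess that its asterisks were lost when the post was rendered; and the CPU list `0-55,112-167` is taken to be socket 0's cores plus their SMT siblings. The names `launch_single_socket`, `CPUS`, `MEM_NODE` and `DRY_RUN` are illustrative.

```sh
#!/bin/sh
# Sketch: pin llama-server to one socket's CPUs and memory via numactl.
# By default this only prints the command (shell quoting is not preserved
# in the printout); set DRY_RUN to an empty string to actually run it.
CPUS="${CPUS:-0-55,112-167}"   # CPU list from the post (assumed: socket 0 + SMT siblings)
MEM_NODE="${MEM_NODE:-0}"      # allocate memory only on NUMA node 0
DRY_RUN="${DRY_RUN-1}"

launch_single_socket() {
  # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is non-empty, else to nothing.
  ${DRY_RUN:+echo} numactl --physcpubind="$CPUS" --membind="$MEM_NODE" \
    llama-server -m Maverick-UD-IQ4_XS.gguf -c 32000 \
    -fa -fmoe -amb 512 -rtr -ctk q8_0 -ctv q8_0 \
    -ngl 99 -ot ".*ffn_.*_exps.*=CPU" \
    --numa numactl -t 56
}

launch_single_socket
```

Running it with `DRY_RUN= sh script.sh` (where `script.sh` is whatever file the sketch is saved as) executes the command instead of printing it. Use the CPU ranges from your own topology in place of the defaults.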

Still no luck finding settings that actually use both CPUs.

👤 ikawrakow replied the 2025-05-08 at 08:26:39:

There have been a lot of discussions around the Internet about llama.cpp performance on dual-socket systems, and the conclusion appears to be that the best one can do is to just use one physical CPU.

I don't have access to a dual socket system, so have done nothing related to NUMA in ik_llama.cpp. Hence, being a fork of llama.cpp, I expect it to behave the same.
