mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640)

2025-07-23 13:31:53 +02:00

🗣️ #564 - Maybe an interesting CUDA PR here.

Author	`Nexesenex`
Created	2025-06-29
Updated	2025-07-01

Description

Title : Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens per seconds performance. #11867
Link : https://github.com/ggml-org/llama.cpp/pull/11867
Author : @Aendk
Use : a few % boost on CUDA PP and TG?

🗣️ Discussion

👤 ikawrakow replied on 2025-07-01 at 13:56:23:

Yes, I saw this PR. But to quote Diego's statement in the PR discussion:

> I still think that this change adds a significant amount of complexity, to code that is already too fragile and complex to reasonably maintain.

I fully agree with that. The back-end is really fragile, so performance gains must be way more than 2-3% to warrant a change such as that one.
