Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-03-09 05:20:01 +00:00
Needed to model different head sizes for different
LLMs, batch sizes that are not a multiple of 8, etc.
I see 2-3% performance degradation.
It is one of those things
that I don't understand, but really would like to:
I have an implementation of a function that depends on a compile-time
constant, and I get performance X.
I then turn the implementation into a template, where the former
compile-time constant becomes a template parameter, and I instantiate the
template for a bunch of different values, one of which is the former
compile-time constant. I observe performance c*X, where c is almost always
less than 1 and, depending on how unlucky we get, can be as low
as 0.5 or so. But in my simple-minded understanding, I expect
the template instantiation with the former compile-time constant
to compile to the exact same function as the former non-templated
implementation, and so I expect the exact same performance.
i.e., if I have some function
void some_function(...) {
    constexpr int N = 128;
    ... // code that depends on N
}
and I now write
template <int N>
void some_function_T(...) {
    ... // same code as in some_function() that depends on N
}
and I say
void wrapper_function(int N) {
    switch (N) {
        case  64: some_function_T< 64>(); break;
        case 128: some_function_T<128>(); break;
        ...
    }
}
I expect wrapper_function(128) to have the exact same performance as
some_function() (the run time of some_function() is long enough for the
additional function-call overhead to be completely negligible).
This is the reason I'm using a template in the first place instead
of just having void some_function(int N).
But no. Tough luck.