ik_llama.cpp/examples/perplexity/perplexity.cpp
Iwan Kawrakow b9daa401d7 Be able to compute for more than 65535 tokens
On CUDA just a quick hack that allows us to concatenate tensors
with more than 65535 rows along the zeroth dimension as needed by
FlashMLA-2. Also needed some care in the perplexity tool to
avoid int overflows when evaluating the computed logits.
2025-03-17 12:04:52 +02:00
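
The int overflow mentioned in the commit message can be illustrated with a minimal sketch. This is not the code from perplexity.cpp: the helper names `logits_offset` and `offset_fits_in_int` are hypothetical, and the sketch assumes logits are stored as a flat array with `n_vocab` values per token. Under that assumption, the flat offset `token_idx * n_vocab` can exceed the 32-bit `int` range once enough tokens are evaluated, so the product is formed in a 64-bit type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from perplexity.cpp): flat offset of the first
// logit belonging to token `token_idx`, assuming n_vocab logits per token.
// The first operand is widened BEFORE multiplying, so the product is
// computed in 64 bits instead of overflowing a 32-bit int.
inline std::int64_t logits_offset(int token_idx, int n_vocab) {
    return static_cast<std::int64_t>(token_idx) * n_vocab;
}

// Hypothetical check: would the same offset still fit in a plain int?
// Assumes non-negative inputs.
inline bool offset_fits_in_int(int token_idx, int n_vocab) {
    return logits_offset(token_idx, n_vocab) <= std::numeric_limits<int>::max();
}
```

With illustrative values of 70000 tokens and a vocabulary of 32000 entries, the offset is 2240000000, which is above the largest 32-bit signed value of 2147483647. The same expression written as `token_idx * n_vocab` with both operands of type `int` would overflow, which is undefined behavior in C++.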

79 KiB