ik_llama.cpp/examples/perplexity/perplexity.cpp at b9daa401d76221a568dd8a9ce2497b80e469d378

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-08 04:50:13 +00:00

Iwan Kawrakow b9daa401d7 Be able to compute for more than 65535 tokens

On CUDA, just a quick hack that allows us to concatenate tensors
with more than 65535 rows along the zeroth dimension, as needed by
FlashMLA-2. Also needed some care in the perplexity tool to
avoid int overflows when evaluating the computed logits.
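
To illustrate the kind of int overflow the commit message refers to, here is a minimal sketch, not the actual code from perplexity.cpp. It assumes the logits live in a flat buffer of `n_tokens * n_vocab` floats and that `size_t` is 64 bits wide. The helper names `logits_row_offset` and `fits_in_int32` are hypothetical. With a large vocabulary and more than 65535 tokens, the product of token index and vocabulary size can exceed what a 32-bit `int` holds, so the operands are widened before multiplying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not taken from perplexity.cpp): offset of the logits
// row for token `i_token` in a flat [n_tokens x n_vocab] buffer.
// Both operands are cast to size_t BEFORE the multiplication, so the product
// is computed in 64 bits (assuming a 64-bit size_t) instead of in int.
static size_t logits_row_offset(int i_token, int n_vocab) {
    assert(i_token >= 0 && n_vocab >= 0);
    return static_cast<size_t>(i_token) * static_cast<size_t>(n_vocab);
}

// Hypothetical check: would the same product still fit in a 32-bit signed int?
// The multiplication is done in 64 bits so the check itself cannot overflow.
static bool fits_in_int32(int i_token, int n_vocab) {
    const int64_t wide = static_cast<int64_t>(i_token) * static_cast<int64_t>(n_vocab);
    return wide <= INT32_MAX;
}
```

For example, with a vocabulary of 32000 entries the product still fits in an `int` at token 65536, but no longer does at token 70000, which is why the widening cast has to be applied to the operands rather than to the result.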

2025-03-17 12:04:52 +02:00

79 KiB
