ik_llama.cpp/examples/perplexity/perplexity.cpp at b147e31f5a2fddae9a3cd05ae449ccde8fdfd015

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-08 04:50:13 +00:00

Iwan Kawrakow b9daa401d7 Be able to compute for more than 65535 tokens

On CUDA this is just a quick hack that allows us to concatenate tensors
with more than 65535 rows along the zeroth dimension, as needed by
FlashMLA-2. The perplexity tool also needed some care to
avoid int overflows when evaluating the computed logits.
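
To illustrate the kind of int overflow the commit message refers to, here is a minimal sketch. It is not the code from this commit: `logit_offset` and `overflows_int32` are hypothetical helpers, and the flat row-major `[n_tokens x n_vocab]` logits layout is an assumption made for the example. The point is only that a token index multiplied by the vocabulary size can exceed the 32-bit range once tens of thousands of tokens are evaluated, so the product should be formed in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from perplexity.cpp): offset of the first logit
// of `token` in a flat row-major [n_tokens x n_vocab] buffer.
// Both operands are widened BEFORE the multiplication, so the product is
// computed in 64 bits rather than in int.
inline std::uint64_t logit_offset(int token, int n_vocab) {
    assert(token >= 0 && n_vocab >= 0);
    return static_cast<std::uint64_t>(token) * static_cast<std::uint64_t>(n_vocab);
}

// Hypothetical check: true when token * n_vocab does not fit in a signed
// 32-bit integer, i.e. when a plain `int` product would overflow.
inline bool overflows_int32(int token, int n_vocab) {
    const std::int64_t wide = static_cast<std::int64_t>(token) * n_vocab;
    return wide > std::numeric_limits<std::int32_t>::max();
}
```

With an illustrative vocabulary of 129280 entries, token index 70000 gives an offset of 9049600000, which is well beyond what a 32-bit `int` can hold, while the widened computation above stays exact.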

2025-03-17 12:04:52 +02:00

79 KiB
