Give the user the option to override where model weights are stored (#232)

* Give the user the option to override where model weights are stored

* Fix ggml_nbytes() for tensors with zero elements, and clean up

For a tensor with zero elements, ggml_nbytes() was returning the
maximum uint64_t value (UINT64_MAX), which caused graph allocation to fail.

* Add timing info to CUDA graph evaluation

* Add more timing info

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2025-02-25 17:55:58 +02:00
committed by GitHub
parent 547eee81d9
commit 94b659a2f1
9 changed files with 848 additions and 621 deletions
