Files
ik_llama.cpp/src
Iwan Kawrakow 8b3c66063f Apply platform specific modifications when repacking
E.g., on NEON it is useful to pre-apply q ^ 0x88 to q4_0.
This results in a ~3% performance improvement.
Hence,
* Changed the signature of the repack_X functions to take a
  bool argument indicating if the repacking is done online and,
  if so, apply modifications as appropriate while repacking.
* Added iqk_modify_tensor to apply modifications to models that
  have already been repacked while loading the model. Caveat:
  just like rtr, this needs to have mmap disabled (else one would
  need to move the data to a not mmap-ed buffer, so much more
  complicated).
2025-01-27 11:12:18 +02:00
..
2024-07-27 07:55:01 +02:00
2024-09-28 17:59:47 +03:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2025-01-23 18:24:10 +02:00
2024-07-27 07:55:01 +02:00