ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-25 07:34:10 +00:00

Files

Iwan Kawrakow 8b3c66063f Apply platform specific modifications when repacking

E.g., on NEON it is useful to pre-apply q ^ 0x88 to q4_0.
This results in a ~3% performance improvement.
Hence,
* Changed the signature of the repack_X functions to take a
  bool argument indicating if the repacking is done online and,
  if so, apply modifications as appropriate while repacking.
* Added iqk_modify_tensor to apply modifications to models that
  have already been repacked while loading the model. Caveat:
  just like rtr, this needs to have mmap disabled (else one would
  need to move the data to a not mmap-ed buffer, so much more
  complicated).

2025-01-27 11:12:18 +02:00

CMakeLists.txt

Be able to repack tensors at run time (#147 )

2024-12-17 14:16:34 +01:00

llama-grammar.cpp

Merge mainline - Aug 12 2024 (#17 )

2024-08-12 15:14:32 +02:00

llama-grammar.h

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

llama-impl.h

Time to fix replace_all (#68 )