* llama_model and llama_hparams
* llama_build_context
Surprisingly small reduction in llama.cpp compile time given
the reduction in LOCs (22k -> 14k)
* LLM_TN
llama.cpp compilation: 50 s -> 33 s
* llama_quantize
* arch names
* All graph building is now in llm-build-context.cpp
* hparams loading
llama.cpp is now just 9300 LOC, but still takes 32 seconds to compile.
* We are now at 6 seconds to build the src folder
* load -> create
We are not actually loading the tensors, but just creating them.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>