Kawrakow
fcc2df11df
Adding ministral3: this seems to work (#1030)
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-12-03 11:01:21 +01:00
Kawrakow
3008fdf0b6
Allow distinct output tensor for Gemma models (#969)
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-11-16 12:12:41 +02:00
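A minimal sketch of what #969 above enables, under assumptions: the tensor names follow common GGUF conventions, but the loader below is hypothetical, not ik_llama.cpp code. Gemma checkpoints traditionally tie the LM head to the token-embedding matrix; allowing a distinct output tensor means preferring a separate output.weight when the file ships one and falling back to the tied embeddings otherwise.

    #include <cstdio>
    #include <optional>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for a GGUF tensor table; names are illustrative only.
    struct Tensor { std::string name; std::vector<float> data; };

    static std::optional<Tensor> find_tensor(const std::vector<Tensor>& gguf,
                                             const std::string& name) {
        for (const auto& t : gguf) if (t.name == name) return t;
        return std::nullopt;
    }

    int main() {
        // A toy "model file" that ships only tied embeddings, like classic Gemma.
        std::vector<Tensor> gguf = { {"token_embd.weight", {0.1f, 0.2f}} };

        // Prefer a distinct output tensor; fall back to the tied embedding
        // matrix when the checkpoint does not provide one.
        std::optional<Tensor> out = find_tensor(gguf, "output.weight");
        Tensor lm_head = out ? *out : *find_tensor(gguf, "token_embd.weight");

        std::printf("LM head uses tensor: %s\n", lm_head.name.c_str());
        return 0;
    }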
Kawrakow
9e2b21fbc9
DeepSeek: enable option to merge Q and K tensors (#941)
...
* Merge Q and K for DeepSeek
* Formatting
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-11-14 08:23:04 +02:00
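A sketch of what merging Q and K in #941 above amounts to, with made-up toy sizes; this shows the general weight-fusion idea, not the PR's actual code. The two projection matrices are stacked row-wise at load time so a single matmul produces both activations, which are then split by row offset; one larger matmul typically beats two small ones.

    #include <cstdio>
    #include <vector>

    // Row-major matmul: out[r] = dot(W[r], x), with W of shape rows x cols.
    static std::vector<float> matmul(const std::vector<float>& W, int rows, int cols,
                                     const std::vector<float>& x) {
        std::vector<float> out(rows, 0.0f);
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c)
                out[r] += W[r * cols + c] * x[c];
        return out;
    }

    int main() {
        const int n_embd = 4, n_q = 3, n_k = 2;
        std::vector<float> wq(n_q * n_embd,  0.5f);  // toy Q projection
        std::vector<float> wk(n_k * n_embd, -0.25f); // toy K projection

        // Merge: stack Wq on top of Wk so one matmul yields [q; k].
        std::vector<float> wqk = wq;
        wqk.insert(wqk.end(), wk.begin(), wk.end());

        std::vector<float> x  = {1, 2, 3, 4};
        std::vector<float> qk = matmul(wqk, n_q + n_k, n_embd, x);

        // Split by offset: the first n_q rows are Q, the rest are K.
        std::printf("q[0] = %g, k[0] = %g\n", qk[0], qk[n_q]);
        return 0;
    }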
Kawrakow
e4145c013f
Add support for SmolLM3 (#934)
...
* Convert from HF
* Model loading and compute graph
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-11-10 15:40:12 +02:00
firecoperana
0378f38c27
model : Port Minimax M2 from mainline (#907)
...
Co-authored-by: firecoperana <firecoperana>
2025-11-06 18:09:24 +02:00
Thireus ☠
5536e99d42
Port of Qwen3-VL support from mainline (#883)
...
* Port of Qwen3-VL for latest ik_llama.cpp
- convert_hf_to_gguf.py: not touched; use llama.cpp to convert the model instead
- SYCL and Metal support for imrope not added
- Vulkan support for imrope not tested
- Code not tested
* Bugfix: n_embd was declared multiple times
https://github.com/ikawrakow/ik_llama.cpp/pull/883#issuecomment-3471179655
* Fix n_embd issue with qwen3vl
* model.output tensor not required
https://github.com/ikawrakow/ik_llama.cpp/pull/883#discussion_r2480388389
* Improved logic for qkv combined tensors
59ceaf8fcb (r2480395800, r2480398187)
* Fix n_embd for merge_qkv() + cleaner code
https://github.com/ikawrakow/ik_llama.cpp/pull/883#discussion_r2481227395
* Revert TENSOR_NOT_REQUIRED
2025-11-04 19:20:54 +02:00
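The n_embd fixes in #883 above come down to grouped-query-attention geometry; here is a hedged illustration with invented sizes (not Qwen3-VL's real dimensions). Q projects to n_head*head_dim while K and V project to the smaller n_head_kv*head_dim, so a fused QKV result must be split at those offsets rather than into three n_embd-sized chunks.

    #include <cstdio>

    int main() {
        // Toy GQA geometry: Q is wider than K and V, so a merged QKV output
        // cannot be cut into three equal n_embd-sized pieces.
        const int head_dim = 64, n_head = 8, n_head_kv = 2;
        const int q_dim  = n_head    * head_dim; // 512
        const int kv_dim = n_head_kv * head_dim; // 128

        // Correct split offsets inside the fused QKV activation.
        const int q_off = 0;
        const int k_off = q_dim;
        const int v_off = q_dim + kv_dim;
        const int total = q_dim + 2 * kv_dim;

        std::printf("q@%d k@%d v@%d, fused rows = %d (not 3 * n_embd)\n",
                    q_off, k_off, v_off, total);
        return 0;
    }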
firecoperana
c7dbe3f2c1
Disable pipeline parallelism when tensor overrides are present or allocation fails (#879)
...
* Disable pipeline parallelism when a tensor override is present
* Disable pipeline parallelism if allocation failed
---------
Co-authored-by: firecoperana <firecoperana>
2025-10-31 14:20:48 +02:00
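The shape of the fallback in #879 above, as a sketch; the struct and function names are hypothetical, not the actual ik_llama.cpp context code. Pipeline parallelism is only kept when the weights are split uniformly across devices (no tensor overrides) and every backend buffer allocated cleanly; otherwise the context drops back to plain sequential execution.

    #include <cstdio>

    // Illustrative parameters only; the real checks live in context setup.
    struct ContextState {
        bool has_tensor_overrides; // weights were placed manually (e.g. -ot)
        bool alloc_failed;         // a backend buffer could not be allocated
    };

    static bool use_pipeline_parallel(const ContextState& s, int n_devices) {
        if (n_devices < 2)          return false; // nothing to pipeline across
        if (s.has_tensor_overrides) return false; // layer split is no longer uniform
        if (s.alloc_failed)         return false; // retry without the extra buffers
        return true;
    }

    int main() {
        ContextState s = { /*has_tensor_overrides=*/true, /*alloc_failed=*/false };
        std::printf("pipeline parallel: %s\n",
                    use_pipeline_parallel(s, 2) ? "on" : "off");
        return 0;
    }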
Kawrakow
9d364b88ba
Adding Ling/Ring (a.k.a. Bailing-MoE2) support (#833)
...
* Adding Ling/Ring (a.k.a. Bailing-MoE2)
* Add expert group selection (not working, so turned off)
* BailingMoE2 conversion
* WIP
* Bits and pieces
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-10-15 14:20:40 +03:00
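For the expert-group bullet in #833 above, a sketch of DeepSeek-style grouped routing with toy sizes; this is an assumption about the intended mechanism (the PR notes it was turned off), not the PR's code. Experts are partitioned into groups, the strongest groups survive based on their best per-expert score, and the top-k experts are then drawn only from the survivors.

    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        // Toy router scores for 8 experts in 4 groups of 2.
        const int n_expert = 8, n_group = 4, top_groups = 2, top_k = 3;
        const int group_size = n_expert / n_group;
        std::vector<float> score = {0.1f, 0.9f, 0.2f, 0.3f, 0.8f, 0.05f, 0.4f, 0.7f};

        // Rank groups by their best expert score; keep the top ones.
        auto group_max = [&](int g) {
            float m = score[g * group_size];
            for (int i = 1; i < group_size; ++i)
                m = std::max(m, score[g * group_size + i]);
            return m;
        };
        std::vector<int> groups(n_group);
        std::iota(groups.begin(), groups.end(), 0);
        std::partial_sort(groups.begin(), groups.begin() + top_groups, groups.end(),
                          [&](int a, int b) { return group_max(a) > group_max(b); });

        // Gather experts from surviving groups, then take the global top-k.
        std::vector<int> cand;
        for (int gi = 0; gi < top_groups; ++gi)
            for (int i = 0; i < group_size; ++i)
                cand.push_back(groups[gi] * group_size + i);
        std::partial_sort(cand.begin(), cand.begin() + top_k, cand.end(),
                          [&](int a, int b) { return score[a] > score[b]; });

        for (int i = 0; i < top_k; ++i)
            std::printf("expert %d (score %g)\n", cand[i], score[cand[i]]);
        return 0;
    }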
Kawrakow
335a1f9b71
Refactor file llama.cpp (#823)
...
* llama_model and llama_hparams
* llama_build_context
Surprisingly small reduction in llama.cpp compile time given the reduction in LOC (22k -> 14k)
* LLM_TN
llama.cpp compilation: 50 s -> 33 s
* llama_quantize
* arch names
* All graph building is now in llm-build-context.cpp
* hparams loading
llama.cpp is now just 9300 LOC, but still takes 32 seconds to compile.
* We are now at 6 seconds to build the src folder
* load -> create
We are not actually loading the tensors, but just creating them.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-10-11 11:35:20 +03:00