Thireus ☠
d65d5fe29e
Add support for GLM-4.5 models ( #668 )
...
* GLM-4.5
* GLM-4.5
* GLM-4.5
* convert_hf_to_gguf.py compatibility bugfix with GLM-4.5
From @ubergarm - https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3145913701
* Add ubergarm comments + my own
* Revert to llama.cpp script version that produced good BF16
See: https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3147374559
* Support for jinja chat templates
See https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3148109962
* GLM-4.5 llama.cpp final port
* Handle TENSOR_SKIP
Ported the changes from:
f129567dc0
dcbbd2cb05
Except op info since ik_llama.cpp doesn't support this operation.
* Bugfix for TENSOR_SKIP
Skip loading if a tensor has the TENSOR_SKIP flag - @ubergarm via https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3155297198
* Update llama.cpp
Restore original GGML_ASSERT
* Fix chat template detection
Changes suggested by @ubergarm - https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3155927840
* Revert to original GGML_ASSERT
2025-08-07 07:55:00 +03:00
Aleksey Nikiforov
da8998c6c6
Ported kimi-k2 support from llama.cpp ( #609 )
...
Original patch by @gabriellarson:
https://github.com/ggml-org/llama.cpp/pull/14654
Co-authored-by: anikifoss <anikifoss>
2025-07-14 18:43:52 +02:00
ubergarm
db49223e8c
add hunyuan moe support for 561 ( #565 )
...
* add hunyuan moe
* Don't reshape Vcur
* Apply chat template fix from mainline PR14584
2025-07-09 10:29:40 +02:00
Fizz~
27ff5bf57e
Special handling of Seed Coder FIM tokens ( #585 )
...
* Special handling of Seed Coder FIM tokens
* vocab: Add Seed Coder pretokenizer
* Formatting fix
* Update llama.h
2025-07-06 12:13:55 +02:00
Kawrakow
c8e6d9cfe7
Add Falcon-Edge support ( #555 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-06-26 08:48:52 +02:00
firecoperana
d1f92e24d3
add dry sampler ( #513 )
...
* add dry sampler
* use vocab instead of model in dry_init function
* fix compile error for build test
---------
Co-authored-by: firecoperana <firecoperana>
2025-06-19 10:24:53 +03:00
Kawrakow
5c127b279f
LlaMA-4 support (text only) ( #321 )
...
* llama4: WIP
* llama4: this seems to be working
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-04-10 09:05:21 +02:00
saood06
5c0a01bdaf
Deepseek V3 support added ( #176 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-01-23 18:24:10 +02:00
Kawrakow
1a4cfbcc53
Merge mainline - Aug 12 2024 ( #17 )
...
* Merge mainline
* Fix after merge
* Remove CI check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-08-12 15:14:32 +02:00
Kawrakow
0ceeb11721
Merge mainline llama.cpp ( #3 )
...
* Merging mainline - WIP
* Merging mainline - WIP
AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower, as is so often
the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-07-27 07:55:01 +02:00