Commit Graph

1444 Commits

Author SHA1 Message Date
turboderp
4b5dbecdc1 Allow key prefix for lm_head (Gemma3) 2025-03-15 00:06:56 +01:00
turboderp
4844f3873c Upcast MM embeddings when residual is FP32 2025-03-15 00:06:56 +01:00
turboderp
fe51a8f4b5 Correctly include Q/K norms when compiling model 2025-03-15 00:06:56 +01:00
turboderp
38f4d7c87d Allow loading transposed unquantized linear layer 2025-03-15 00:06:56 +01:00
turboderp
9669fa33c9 Allow component models to use learned pos embeddings without regarding LLM max_seq_len 2025-03-15 00:06:56 +01:00
turboderp
7b05acd233 Allow per-layer RoPE theta 2025-03-15 00:06:56 +01:00
turboderp
23395dfa42 Fix FP32 residual for paged attn 2025-03-14 23:09:31 +01:00
Thomas
eaf8ad1041 Update chat.py, include multi-line input support and context clearing through input (#738)
* Update chat.py, include multi-line input support and context clearing

- Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows)
- Allow context clearing outside of amnesia by inputting "clear"

* Adding qwq chat mode, adding the ability to forget thinking context
2025-03-10 15:28:33 +01:00
turboderp
d8fa1a8250 Support partial_rotary_factor (Phi-4 mini) 2025-02-28 08:51:11 +01:00
turboderp
2e630aefdd Fix alt pos embeddings and block diagonal mask when flash-attn is disabled 2025-02-13 22:13:48 +01:00
turboderp
1a80d38891 Update build actions 2025-02-08 03:29:05 +01:00
turboderp
f1c4126045 Update build actions 2025-02-08 03:14:32 +01:00
turboderp
f98a7b7099 Update build actions 2025-02-08 03:08:45 +01:00
turboderp
096076b3fd Update build actions 2025-02-08 02:49:21 +01:00
turboderp
0f4a9f0042 Update build actions 2025-02-08 01:36:51 +01:00
turboderp
f3de3cbd34 Update build actions 2025-02-08 01:05:22 +01:00
turboderp
94e57904bc Update build actions (tag: v0.2.8) 2025-02-08 00:57:29 +01:00
turboderp
3a9618d471 Update build actions 2025-02-08 00:44:44 +01:00
turboderp
3486f9eb71 Merge branch 'refs/heads/dev' 2025-02-08 00:26:52 +01:00
turboderp
6e4a84a1e3 Bump to 0.2.8 2025-02-08 00:26:30 +01:00
turboderp
d05fbcc854 Fix Pixtral regression 2025-02-04 21:01:23 +01:00
turboderp
96b2f9df77 Add Qwen2.5 mode to grounding demo 2025-01-29 22:41:36 +01:00
turboderp
cce6f95cd3 Initial support for Qwen2.5-VL 2025-01-29 03:03:36 +01:00
turboderp
d0413b06f8 Check length of gpu_split in model_init 2025-01-09 11:36:25 +01:00
turboderp
c8fa853c89 Test script: Allow --eval_rows in wiki2 ppl test 2025-01-09 11:14:48 +01:00
turboderp
318435db81 Sampler: Remove superfluous pre-sort pass 2025-01-09 11:14:19 +01:00
turboderp
d302fa3d37 Optimizer: Ensure weight budget is fully used up 2025-01-09 11:14:03 +01:00
turboderp
b400394f06 Update build actions 2025-01-09 11:13:03 +01:00
turboderp
b9c025b4b1 Enable large runner 2024-12-30 05:47:11 +01:00
turboderp
c41acd5c11 Extra ROCm 6.2 actions 2024-12-30 04:31:44 +01:00
turboderp
7c08c6df71 Deactivate mamba 2024-12-30 04:07:50 +01:00
turboderp
c8075cabf4 Update conda-incubator 2024-12-30 04:00:15 +01:00
turboderp
ae241a9af5 Fix video example (tag: v0.2.7) 2024-12-30 02:24:49 +01:00
turboderp
1ef618389b Bump to v0.2.7 2024-12-30 02:19:19 +01:00
turboderp
b010cb950f Fix compilation errors on aarch64 2024-12-29 20:30:59 +01:00
turboderp
fb5000ac62 Don't compile AVX2 functions when building without AVX2 support 2024-12-29 19:05:54 +01:00
turboderp
82bb648517 Fix Granite3 logit scaling 2024-12-27 19:54:19 +01:00
turboderp
bee449d116 Support Granite 3.x arch 2024-12-27 19:11:21 +01:00
turboderp
ab4d9e15eb Chat example Granite3 template 2024-12-27 18:32:46 +01:00
turboderp
ebfefc4bed Support Cohere2 architecture 2024-12-25 20:14:45 +01:00
turboderp
d815f5f9e1 Fix RoPE alpha after refactor in #4d25874 2024-12-25 18:09:11 +01:00
nintwentydo
b2dd5a7e06 Modify handling for Pixtral Large model params (#701)
* Modify handling for Pixtral Large model params.

* Fix multimodal_projector_bias to default to True if not in model config.json
2024-12-21 19:58:41 +01:00
turboderp
cf7fcd18d2 Fix chat example system prompt 2024-12-18 07:52:09 +01:00
turboderp
f76bc8537a Read number of vision tower layers from config for Pixtral (fix Pixtral-Large) 2024-12-18 01:29:20 +01:00
turboderp
4061c24373 Qwen2-VL: Basic video support 2024-12-15 23:32:41 +01:00
turboderp
c78d9027aa Fix ChatML template in multimodal example 2024-12-15 21:38:40 +01:00
turboderp
9934f06442 Refactoring 2024-12-15 21:37:29 +01:00
turboderp
edf1a3575a Util function to get byte size of MM embeddings object 2024-12-09 23:22:42 +01:00
turboderp
254e76b178 Merge remote-tracking branch 'origin/dev' into dev 2024-12-09 20:15:23 +01:00
turboderp
8bb283d319 Cleanup build actions 2024-12-09 20:14:35 +01:00