turboderp
9244003a40
Add support for Mistral 3.1 VLM
2025-04-18 22:47:47 +02:00
turboderp
68f7461985
Optional attn bias for GLM4
2025-04-16 01:24:45 +02:00
turboderp
6a5d303355
Merge remote-tracking branch 'origin/dev' into dev
2025-04-15 18:57:47 +02:00
turboderp
de19cbcc59
Add GLM4 architecture
2025-04-15 18:57:29 +02:00
MikeRoz47
61450b4860
concatenate the sin and cos tensors ( #758 )
2025-04-11 22:11:13 +02:00
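A minimal sketch of the table layout this fix (#758) concerns: in the common "rotate-half" RoPE scheme, sin/cos tables are computed over half the head dimension and then concatenated with themselves to span the full head_dim. Function name, shapes, and the pure-Python implementation are illustrative assumptions, not the library's actual code.

```python
import math

def rope_cache(seq_len, head_dim, theta=10000.0):
    # Hypothetical illustration: frequencies cover head_dim // 2 slots;
    # each sin/cos row is concatenated with itself (row + row) so the
    # final tables have shape [seq_len, head_dim].
    half = head_dim // 2
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    sin, cos = [], []
    for pos in range(seq_len):
        angles = [pos * f for f in inv_freq]
        s = [math.sin(a) for a in angles]
        c = [math.cos(a) for a in angles]
        sin.append(s + s)  # concatenate: [half] -> [head_dim]
        cos.append(c + c)
    return sin, cos
```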
turboderp
b148bb42b8
Fix Gemma3 head norm (RMS)
2025-04-11 00:18:06 +02:00
turboderp
d471d44f01
Gemma3 local RoPE fixes
2025-04-10 22:16:08 +02:00
turboderp
a03db457ef
Fix: Prioritize default head_dim when provided by architecture (Gemma3) over computed head_dim
2025-03-15 11:52:51 +01:00
turboderp
385a5162ba
Fix: Correctly read query_pre_attn_scalar from text_config (Gemma3)
2025-03-15 11:01:33 +01:00
turboderp
17762c177f
Merge remote-tracking branch 'origin/dev' into dev
2025-03-15 01:37:43 +01:00
turboderp
6f7623ff0e
Update examples
2025-03-15 01:30:52 +01:00
turboderp
77a1e2cb0c
Warn instead of failing for unsupported vision model
2025-03-15 00:13:52 +01:00
turboderp
578fd4234f
Support Gemma3 (vision)
2025-03-15 00:13:19 +01:00
turboderp
c0267e37fe
Support Gemma3 (text)
2025-03-15 00:06:56 +01:00
turboderp
565339101b
Allow text model to use Q/K norm while vision model doesn't
2025-03-15 00:06:56 +01:00
turboderp
07afc90788
Tensor renaming kludge (Gemma3 has one _weight tensor)
2025-03-15 00:06:56 +01:00
turboderp
e2fa480595
Auto expand Q/K norm weight to match number of heads
2025-03-15 00:06:56 +01:00
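A sketch of what "auto expand" could mean here, under the assumption that some checkpoints store a single per-head-dim norm weight of shape [head_dim] while the loader expects one entry per flattened Q/K feature; tiling the weight across heads makes the shapes match. The function name and shapes are hypothetical.

```python
def expand_qk_norm_weight(weight, num_heads, head_dim):
    # Hypothetical sketch: if the stored norm weight covers only one
    # head's dimensions, tile it num_heads times so it matches the
    # flattened [num_heads * head_dim] Q/K projection; otherwise pass
    # it through unchanged.
    if len(weight) == head_dim:
        return weight * num_heads  # list tiling repeats the per-head weight
    assert len(weight) == num_heads * head_dim, "unexpected norm weight size"
    return weight
```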
turboderp
a88c18cac1
Add architecture-specific config defaults (Gemma3 config.json is incomplete)
2025-03-15 00:06:56 +01:00
turboderp
b6c1912f29
Respect norm_constant_bias in Q/K norms (Gemma3)
2025-03-15 00:06:56 +01:00
turboderp
4b5dbecdc1
Allow key prefix for lm_head (Gemma3)
2025-03-15 00:06:56 +01:00
turboderp
4844f3873c
Upcast MM embeddings when residual is FP32
2025-03-15 00:06:56 +01:00
turboderp
fe51a8f4b5
Correctly include Q/K norms when compiling model
2025-03-15 00:06:56 +01:00
turboderp
38f4d7c87d
Allow loading transposed unquantized linear layer
2025-03-15 00:06:56 +01:00
turboderp
9669fa33c9
Allow component models to use learned pos embeddings without regarding LLM max_seq_len
2025-03-15 00:06:56 +01:00
turboderp
7b05acd233
Allow per-layer RoPE theta
2025-03-15 00:06:56 +01:00
turboderp
23395dfa42
Fix FP32 residual for paged attn
2025-03-14 23:09:31 +01:00
Thomas
eaf8ad1041
Update chat.py, include multi-line input support and context clearing through input ( #738 )
* Update chat.py, include multi-line input support and context clearing
- Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows)
- Allow context clearing outside of amnesia by inputting "clear"
* Adding qwq chat mode, adding the ability to forget thinking context
2025-03-10 15:28:33 +01:00
turboderp
d8fa1a8250
Support partial_rotary_factor (Phi-4 mini)
2025-02-28 08:51:11 +01:00
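For context on partial_rotary_factor (the Phi-4-mini-style config key): RoPE is applied only to the first head_dim * partial_rotary_factor dimensions of each head, and the remaining dimensions pass through unrotated. The sketch below assumes the rotate-half pairing; it is a pure-Python illustration, not the repository's kernel code.

```python
import math

def apply_partial_rope(q, pos, partial_rotary_factor, theta=10000.0):
    # Hypothetical sketch: rotate only the first rot_dim dimensions
    # of a single head vector q at position pos; leave the tail as-is.
    head_dim = len(q)
    rot_dim = int(head_dim * partial_rotary_factor)
    half = rot_dim // 2
    out = list(q)
    for i in range(half):
        freq = theta ** (-2.0 * i / rot_dim)
        a = pos * freq
        c, s = math.cos(a), math.sin(a)
        x1, x2 = q[i], q[i + half]
        out[i] = x1 * c - x2 * s          # rotate-half pairing
        out[i + half] = x2 * c + x1 * s
    return out  # out[rot_dim:] is untouched
```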
turboderp
2e630aefdd
Fix alt pos embeddings and block diagonal mask when flash-attn is disabled
2025-02-13 22:13:48 +01:00
turboderp
6e4a84a1e3
Bump to 0.2.8
2025-02-08 00:26:30 +01:00
turboderp
d05fbcc854
Fix Pixtral regression
2025-02-04 21:01:23 +01:00
turboderp
96b2f9df77
Add Qwen2.5 mode to grounding demo
2025-01-29 22:41:36 +01:00
turboderp
cce6f95cd3
Initial support for Qwen2.5-VL
2025-01-29 03:03:36 +01:00
turboderp
d0413b06f8
Check length of gpu_split in model_init
2025-01-09 11:36:25 +01:00
turboderp
c8fa853c89
Test script: Allow --eval_rows in wiki2 ppl test
2025-01-09 11:14:48 +01:00
turboderp
318435db81
Sampler: Remove superfluous pre-sort pass
2025-01-09 11:14:19 +01:00
turboderp
d302fa3d37
Optimizer: Ensure weight budget is fully used up
2025-01-09 11:14:03 +01:00
turboderp
b400394f06
Update build actions
2025-01-09 11:13:03 +01:00
turboderp
ae241a9af5
Fix video example
v0.2.7
2024-12-30 02:24:49 +01:00
turboderp
1ef618389b
Bump to v0.2.7
2024-12-30 02:19:19 +01:00
turboderp
b010cb950f
Fix compilation errors on aarch64
2024-12-29 20:30:59 +01:00
turboderp
fb5000ac62
Don't compile AVX2 functions when building without AVX2 support
2024-12-29 19:05:54 +01:00
turboderp
82bb648517
Fix Granite3 logit scaling
2024-12-27 19:54:19 +01:00
turboderp
bee449d116
Support Granite 3.x arch
2024-12-27 19:11:21 +01:00
turboderp
ab4d9e15eb
Chat example Granite3 template
2024-12-27 18:32:46 +01:00
turboderp
ebfefc4bed
Support Cohere2 architecture
2024-12-25 20:14:45 +01:00
turboderp
d815f5f9e1
Fix RoPE alpha after refactor in #4d25874
2024-12-25 18:09:11 +01:00
nintwentydo
b2dd5a7e06
Modify handling for Pixtral Large model params ( #701 )
* Modify handling for Pixtral Large model params.
* Fix multimodal_projector_bias to default to True if not in model config.json
2024-12-21 19:58:41 +01:00
turboderp
cf7fcd18d2
Fix chat example system prompt
2024-12-18 07:52:09 +01:00
turboderp
f76bc8537a
Read number of vision tower layers from config for Pixtral (fix Pixtral-Large)
2024-12-18 01:29:20 +01:00