Commit Graph

1427 Commits

Author SHA1 Message Date
kingbri
3d9fde2dd0 Actions: Go back to VS 17.9
Increase compatibility with Windows systems.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-07 13:57:25 -04:00
Brian
af04f9a393 Merge pull request #776 from kingbri1/master
Fix the build action
2025-04-26 13:30:47 -04:00
kingbri
be765fab7a Actions: Fix CUDA build
The pytorch friendly cuda version wasn't being pushed to GH outputs.

In addition, add a simpler CUDA install method for Windows.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

Test

Actions: Add short cuda install
2025-04-26 01:23:22 -04:00
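The fix above concerns a "pytorch friendly" CUDA version string that wasn't being written to the step's GitHub outputs (i.e. appended as `name=value` to the file in `$GITHUB_OUTPUT`). As an illustrative sketch only, not the repository's actual workflow code, deriving such a tag might look like:

```python
def torch_cuda_tag(cuda_version: str) -> str:
    # Hypothetical helper: "12.8.1" -> "cu128", the style of tag PyTorch
    # wheel indexes use to identify a CUDA build.
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"
```

In a workflow step, the result would then be published for later steps with something like `echo "torch_cuda=$TAG" >> "$GITHUB_OUTPUT"`.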
turboderp
abbece178e Temporary build action for Torch 2.7 2025-04-24 18:39:46 +02:00
Brian
263c758ae5 Actions: Update install methods (#775)
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

Actions: Test
2025-04-24 18:35:36 +02:00
turboderp
0374e367aa Upgrade large runner actions to 22.04 2025-04-24 02:34:40 +02:00
turboderp
569dcb2ec6 Change wheels to CUDA 12.8.1 2025-04-24 02:00:42 +02:00
turboderp
68e2b92d79 Build on ubuntu-22.04, enable Hopper and Blackwell on Torch 2.7.0 wheels v0.2.9 2025-04-24 00:52:11 +02:00
Brian
7b2e4d8ddc Actions: Add Torch 2.7 (#773)
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-24 00:39:31 +02:00
RaahimSiddiqi
ea6cc68ac1 Added Fix for Loading unquantized Huggingface models. (#771)
* Added banned_strings parameter to the generator.

* Added Fix for loading unquantized huggingface models.

---------

Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>
2025-04-24 00:38:10 +02:00
turboderp
3a90264940 Bump to v0.2.9 2025-04-24 00:36:39 +02:00
turboderp
2c170bb6c6 Gemma3 local RoPE fixes 2025-04-24 00:31:40 +02:00
turboderp
2d48ccd23e Merge branch 'dev' 2025-04-18 23:29:41 +02:00
turboderp
9244003a40 Add support for Mistral 3.1 VLM 2025-04-18 22:47:47 +02:00
turboderp
68f7461985 Optional attn bias for GLM4 2025-04-16 01:24:45 +02:00
turboderp
6a5d303355 Merge remote-tracking branch 'origin/dev' into dev 2025-04-15 18:57:47 +02:00
turboderp
de19cbcc59 Add GLM4 architecture 2025-04-15 18:57:29 +02:00
RaahimSiddiqi
09c18e9c47 Added banned_strings parameter to the generator. (#756)
Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>
2025-04-11 22:12:17 +02:00
MikeRoz47
61450b4860 concatenate the sin and cos tensors (#758) 2025-04-11 22:11:13 +02:00
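This fix concerns the RoPE sin/cos cache. As a hedged NumPy sketch (illustrative, not the project's actual implementation), the per-position angles are typically duplicated by concatenation so the sin and cos tables span the full head dimension:

```python
import numpy as np

def rope_sin_cos(seq_len: int, head_dim: int, theta: float = 10000.0):
    # Inverse frequencies for each pair of rotary dimensions.
    inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))
    angles = np.outer(np.arange(seq_len), inv_freq)    # (seq_len, head_dim // 2)
    # Concatenate the angle table with itself so sin/cos cover head_dim.
    angles = np.concatenate([angles, angles], axis=-1)  # (seq_len, head_dim)
    return np.sin(angles), np.cos(angles)
```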
turboderp
b148bb42b8 Fix Gemma3 head norm (RMS) 2025-04-11 00:18:06 +02:00
turboderp
d471d44f01 Gemma3 local RoPE fixes 2025-04-10 22:16:08 +02:00
turboderp
a03db457ef Fix: Prioritize default head_dim when provided by architecture (Gemma3) over computed head_dim 2025-03-15 11:52:51 +01:00
turboderp
385a5162ba Fix: Correctly read query_pre_attn_scalar from text_config (Gemma3) 2025-03-15 11:01:33 +01:00
turboderp
17762c177f Merge remote-tracking branch 'origin/dev' into dev 2025-03-15 01:37:43 +01:00
turboderp
6f7623ff0e Update examples 2025-03-15 01:30:52 +01:00
turboderp
77a1e2cb0c Warn instead of failing for unsupported vision model 2025-03-15 00:13:52 +01:00
turboderp
578fd4234f Support Gemma3 (vision) 2025-03-15 00:13:19 +01:00
turboderp
c0267e37fe Support Gemma3 (text) 2025-03-15 00:06:56 +01:00
turboderp
565339101b Allow text model to use Q/K norm while vision model doesn't 2025-03-15 00:06:56 +01:00
turboderp
07afc90788 Tensor renaming kludge (Gemma3 has one _weight tensor) 2025-03-15 00:06:56 +01:00
turboderp
e2fa480595 Auto expand Q/K norm weight to match number of heads 2025-03-15 00:06:56 +01:00
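The Q/K norm expansion above can be pictured as tiling a per-head-dimension norm scale across all heads. A minimal sketch, with hypothetical names (not exllamav2's API):

```python
import numpy as np

def expand_qk_norm_weight(weight: np.ndarray, num_heads: int) -> np.ndarray:
    # weight: (head_dim,) norm scale shared by every head.
    # Returns (num_heads * head_dim,), matching a flattened Q/K projection.
    return np.tile(weight, num_heads)
```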
turboderp
a88c18cac1 Add architecture-specific config defaults (Gemma3 config.json is incomplete) 2025-03-15 00:06:56 +01:00
turboderp
b6c1912f29 Respect norm_constant_bias in Q/K norms (Gemma3) 2025-03-15 00:06:56 +01:00
turboderp
4b5dbecdc1 Allow key prefix for lm_head (Gemma3) 2025-03-15 00:06:56 +01:00
turboderp
4844f3873c Upcast MM embeddings when residual is FP32 2025-03-15 00:06:56 +01:00
turboderp
fe51a8f4b5 Correctly include Q/K norms when compiling model 2025-03-15 00:06:56 +01:00
turboderp
38f4d7c87d Allow loading transposed unquantized linear layer 2025-03-15 00:06:56 +01:00
turboderp
9669fa33c9 Allow component models to use learned pos embeddings without regarding LLM max_seq_len 2025-03-15 00:06:56 +01:00
turboderp
7b05acd233 Allow per-layer RoPE theta 2025-03-15 00:06:56 +01:00
turboderp
23395dfa42 Fix FP32 residual for paged attn 2025-03-14 23:09:31 +01:00
Thomas
eaf8ad1041 Update chat.py, include multi-line input support and context clearing through input (#738)
* Update chat.py, include multi-line input support and context clearing

- Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows)
- Allow context clearing outside of amnesia by inputting "clear"

* Adding qwq chat mode, adding the ability to forget thinking context
2025-03-10 15:28:33 +01:00
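The multi-line mode described in this commit reads input until the EOF character. A minimal sketch of that pattern (a hypothetical helper, not chat.py's actual code):

```python
import sys

def read_multiline(stream=None) -> str:
    """Collect lines until EOF (Ctrl+D on Unix, Ctrl+Z then Return on Windows)."""
    stream = sys.stdin if stream is None else stream
    lines = [line.rstrip("\n") for line in stream]
    return "\n".join(lines)
```

The context-clearing behavior would then amount to comparing the (single-line) input against a sentinel such as `"clear"` before passing it to the generator.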
turboderp
d8fa1a8250 Support partial_rotary_factor (Phi-4 mini) 2025-02-28 08:51:11 +01:00
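`partial_rotary_factor` means only a leading fraction of each head's dimensions receive rotary position embedding, with the rest passed through unrotated. A NumPy sketch of the split (illustrative, not the library's implementation):

```python
import numpy as np

def split_rotary(x: np.ndarray, partial_rotary_factor: float = 0.75):
    # x: (..., head_dim). The first head_dim * factor dims are rotated;
    # the remainder is left unchanged and re-joined after RoPE is applied.
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * partial_rotary_factor)
    return x[..., :rot_dim], x[..., rot_dim:]
```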
turboderp
2e630aefdd Fix alt pos embeddings and block diagonal mask when flash-attn is disabled 2025-02-13 22:13:48 +01:00
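The block-diagonal mask mentioned here keeps batched sequences from attending across segment boundaries when flash-attn's variable-length kernels aren't available. A hedged NumPy sketch of such a mask:

```python
import numpy as np

def block_diagonal_mask(lengths):
    # True where attention is allowed: each token attends only within
    # its own segment of the packed batch.
    total = sum(lengths)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for n in lengths:
        mask[start:start + n, start:start + n] = True
        start += n
    return mask
```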
turboderp
1a80d38891 Update build actions 2025-02-08 03:29:05 +01:00
turboderp
f1c4126045 Update build actions 2025-02-08 03:14:32 +01:00
turboderp
f98a7b7099 Update build actions 2025-02-08 03:08:45 +01:00
turboderp
096076b3fd Update build actions 2025-02-08 02:49:21 +01:00
turboderp
0f4a9f0042 Update build actions 2025-02-08 01:36:51 +01:00
turboderp
f3de3cbd34 Update build actions 2025-02-08 01:05:22 +01:00
turboderp
94e57904bc Update build actions v0.2.8 2025-02-08 00:57:29 +01:00