turboderp
97e4fd90f1
Fail if tokenizer.json not found
2025-05-27 17:10:35 +02:00
turboderp
a811641c3b
Optimize paged cache defrag
2025-05-27 01:53:37 +02:00
turboderp
1adff7d827
Remove SentencePiece support
2025-05-14 15:30:57 +02:00
turboderp
a87ea02830
Remove SentencePiece support
2025-05-14 13:41:09 +02:00
kingbri
bb4206d5bc
Ext: Fix register call for float
...
ROCm doesn't support the register keyword. This should fix compile.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 14:15:56 -04:00
kingbri
0a7733110e
Ext: Fix CUDA type cast
...
The __half_as_ushort function isn't present in cuda < 12.4
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 12:36:40 -04:00
kingbri
9d621509ab
Project: Bump version
...
v0.3.0
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
v0.3.0
2025-05-12 10:11:57 -04:00
Brian
aa2d5aa471
Merge pull request #788 from turboderp-org/dev
...
Merge Dev into Master
2025-05-12 10:10:06 -04:00
kingbri
c820539b79
Actions: Add redirects to CUDA downloads
...
Some CUDA archives might redirect, so tell CuRL that.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 14:25:10 -04:00
kingbri
b4c6b39590
Actions: Update CUDA install commands
...
Fast install for cuda 11.8, 12.1, 12.4, 12.8
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 13:35:58 -04:00
kingbri
89d17b7ba3
Actions: Migrate to temp windows build action
...
Need to rebuild windows wheels with older MSVC
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-07 13:58:52 -04:00
kingbri
3d9fde2dd0
Actions: Go back to VS 17.9
...
Increase compatibility with Windows systems.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-07 13:57:25 -04:00
turboderp
e312b74f15
Fix unload() for vision tower
2025-05-07 19:08:29 +02:00
turboderp
747fbadca9
Merge branch 'master' into dev
2025-05-03 18:29:14 +02:00
turboderp
68976a07d7
Add basic support for Qwen3MoE
2025-05-01 20:23:33 +02:00
turboderp
b422a85c47
Merge branch 'master' into dev
2025-04-29 20:44:36 +02:00
turboderp
a3440098a4
Add Qwen3ForCausalLM
2025-04-29 20:44:10 +02:00
Brian
af04f9a393
Merge pull request #776 from kingbri1/master
...
Fix the build action
2025-04-26 13:30:47 -04:00
kingbri
be765fab7a
Actions: Fix CUDA build
...
The pytorch friendly cuda version wasn't being pushed to GH outputs.
In addition, add a simpler CUDA install method for Windows.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Test
Actions: Add short cuda install
2025-04-26 01:23:22 -04:00
turboderp
abbece178e
Temporary build action for Torch 2.7
2025-04-24 18:39:46 +02:00
Brian
263c758ae5
Actions: Update install methods (#775)
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Actions: Test
2025-04-24 18:35:36 +02:00
turboderp
0374e367aa
Upgrade large runner actions to 22.04
2025-04-24 02:34:40 +02:00
turboderp
569dcb2ec6
Change wheels to CUDA 12.8.1
2025-04-24 02:00:42 +02:00
turboderp
68e2b92d79
Build on ubuntu-22.04, enable Hopper and Blackwell on Torch 2.7.0 wheels
v0.2.9
2025-04-24 00:52:11 +02:00
Brian
7b2e4d8ddc
Actions: Add Torch 2.7 (#773)
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-24 00:39:31 +02:00
RaahimSiddiqi
ea6cc68ac1
Added Fix for Loading unquantized Huggingface models. (#771)
...
* Added banned_strings parameter to the generator.
* Added Fix for loading unquantized huggingface models.
---------
Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>
2025-04-24 00:38:10 +02:00
turboderp
3a90264940
Bump to v0.2.9
2025-04-24 00:36:39 +02:00
turboderp
2c170bb6c6
Gemma3 local RoPE fixes
2025-04-24 00:31:40 +02:00
turboderp
2d48ccd23e
Merge branch 'dev'
2025-04-18 23:29:41 +02:00
turboderp
9244003a40
Add support for Mistral 3.1 VLM
2025-04-18 22:47:47 +02:00
turboderp
68f7461985
Optional attn bias for GLM4
2025-04-16 01:24:45 +02:00
turboderp
6a5d303355
Merge remote-tracking branch 'origin/dev' into dev
2025-04-15 18:57:47 +02:00
turboderp
de19cbcc59
Add GLM4 architecture
2025-04-15 18:57:29 +02:00
RaahimSiddiqi
09c18e9c47
Added banned_strings parameter to the generator. (#756)
...
Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>
2025-04-11 22:12:17 +02:00
MikeRoz47
61450b4860
concatenate the sin and cos tensors (#758)
2025-04-11 22:11:13 +02:00
turboderp
b148bb42b8
Fix Gemma3 head norm (RMS)
2025-04-11 00:18:06 +02:00
turboderp
d471d44f01
Gemma3 local RoPE fixes
2025-04-10 22:16:08 +02:00
turboderp
a03db457ef
Fix: Prioritize default head_dim when provided by architecture (Gemma3) over computed head_dim
2025-03-15 11:52:51 +01:00
turboderp
385a5162ba
Fix: Correctly read query_pre_attn_scalar from text_config (Gemma3)
2025-03-15 11:01:33 +01:00
turboderp
17762c177f
Merge remote-tracking branch 'origin/dev' into dev
2025-03-15 01:37:43 +01:00
turboderp
6f7623ff0e
Update examples
2025-03-15 01:30:52 +01:00
turboderp
77a1e2cb0c
Warn instead of failing for unsupported vision model
2025-03-15 00:13:52 +01:00
turboderp
578fd4234f
Support Gemma3 (vision)
2025-03-15 00:13:19 +01:00
turboderp
c0267e37fe
Support Gemma3 (text)
2025-03-15 00:06:56 +01:00
turboderp
565339101b
Allow text model to use Q/K norm while vision model doesn't
2025-03-15 00:06:56 +01:00
turboderp
07afc90788
Tensor renaming kludge (Gemma3 has one _weight tensor)
2025-03-15 00:06:56 +01:00
turboderp
e2fa480595
Auto expand Q/K norm weight to match number of heads
2025-03-15 00:06:56 +01:00
turboderp
a88c18cac1
Add architecture-specific config defaults (Gemma3 config.json is incomplete)
2025-03-15 00:06:56 +01:00
turboderp
b6c1912f29
Respect norm_constant_bias in Q/K norms (Gemma3)
2025-03-15 00:06:56 +01:00
turboderp
4b5dbecdc1
Allow key prefix for lm_head (Gemma3)
2025-03-15 00:06:56 +01:00