1454 Commits

Author SHA1 Message Date
kingbri
6a2d831140 Actions: Remove sentencepiece from other workflows
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
v0.3.2
2025-07-13 18:11:53 -04:00
kingbri
586bfc6744 ExllamaV2: Bump version
v0.3.2

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-13 18:06:03 -04:00
kingbri
186171e077 Merge branch 'dev' 2025-07-13 18:05:44 -04:00
kingbri
0f2eae558e Actions: Remove sentencepiece from build steps
Prevents bundling inside the wheel file which removes the dependency.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-09 13:20:21 -04:00
turboderp
d2a810f2ae Fix rounding error in Pixtral image preprocessor 2025-06-05 01:32:36 +02:00
turboderp
2ca8281c31 Merge branch 'dev' 2025-05-29 00:33:35 +02:00
turboderp
0efb999c24 Merge remote-tracking branch 'origin/dev' into dev 2025-05-29 00:33:05 +02:00
turboderp
b311d0aca4 Remove sentencepiece dep from setup.py 2025-05-29 00:32:51 +02:00
kingbri
2b20c24dcd Merge branch 'dev'
v0.3.1
2025-05-27 11:13:29 -04:00
kingbri
a08ef4f1ed ExllamaV2: Bump version
v0.3.1

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-27 11:13:06 -04:00
turboderp
97e4fd90f1 Fail if tokenizer.json not found 2025-05-27 17:10:35 +02:00
turboderp
a811641c3b Optimize paged cache defrag 2025-05-27 01:53:37 +02:00
turboderp
1adff7d827 Remove SentencePiece support 2025-05-14 15:30:57 +02:00
turboderp
a87ea02830 Remove SentencePiece support 2025-05-14 13:41:09 +02:00
kingbri
0a3d4200e1 Actions: Build rocm only
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 14:17:24 -04:00
kingbri
bb4206d5bc Ext: Fix register call for float
ROCm doesn't support the register keyword. This should fix compile.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 14:15:56 -04:00
kingbri
0a7733110e Ext: Fix CUDA type cast
The __half_as_ushort function isn't present in cuda < 12.4

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 12:36:40 -04:00
kingbri
9d621509ab Project: Bump version
v0.3.0

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
v0.3.0
2025-05-12 10:11:57 -04:00
Brian
aa2d5aa471 Merge pull request #788 from turboderp-org/dev
Merge Dev into Master
2025-05-12 10:10:06 -04:00
kingbri
c820539b79 Actions: Add redirects to CUDA downloads
Some CUDA archives might redirect, so tell curl to follow redirects.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 14:25:10 -04:00
kingbri
b4c6b39590 Actions: Update CUDA install commands
Fast install for cuda 11.8, 12.1, 12.4, 12.8

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 13:35:58 -04:00
kingbri
89d17b7ba3 Actions: Migrate to temp windows build action
Need to rebuild windows wheels with older MSVC

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-07 13:58:52 -04:00
kingbri
3d9fde2dd0 Actions: Go back to VS 17.9
Increase compatibility with Windows systems.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-07 13:57:25 -04:00
turboderp
e312b74f15 Fix unload() for vision tower 2025-05-07 19:08:29 +02:00
turboderp
747fbadca9 Merge branch 'master' into dev 2025-05-03 18:29:14 +02:00
turboderp
68976a07d7 Add basic support for Qwen3MoE 2025-05-01 20:23:33 +02:00
turboderp
b422a85c47 Merge branch 'master' into dev 2025-04-29 20:44:36 +02:00
turboderp
a3440098a4 Add Qwen3ForCausalLM 2025-04-29 20:44:10 +02:00
Brian
af04f9a393 Merge pull request #776 from kingbri1/master
Fix the build action
2025-04-26 13:30:47 -04:00
kingbri
be765fab7a Actions: Fix CUDA build
The pytorch friendly cuda version wasn't being pushed to GH outputs.

In addition, add a simpler CUDA install method for Windows.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

Test

Actions: Add short cuda install
2025-04-26 01:23:22 -04:00
turboderp
abbece178e Temporary build action for Torch 2.7 2025-04-24 18:39:46 +02:00
Brian
263c758ae5 Actions: Update install methods (#775)
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

Actions: Test
2025-04-24 18:35:36 +02:00
turboderp
0374e367aa Upgrade large runner actions to 22.04 2025-04-24 02:34:40 +02:00
turboderp
569dcb2ec6 Change wheels to CUDA 12.8.1 2025-04-24 02:00:42 +02:00
turboderp
68e2b92d79 Build on ubuntu-22.04, enable Hopper and Blackwell on Torch 2.7.0 wheels
v0.2.9
2025-04-24 00:52:11 +02:00
Brian
7b2e4d8ddc Actions: Add Torch 2.7 (#773)
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-24 00:39:31 +02:00
RaahimSiddiqi
ea6cc68ac1 Added Fix for Loading unquantized Huggingface models. (#771)
* Added banned_strings parameter to the generator.

* Added Fix for loading unquantized huggingface models.

---------

Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>
2025-04-24 00:38:10 +02:00
turboderp
3a90264940 Bump to v0.2.9 2025-04-24 00:36:39 +02:00
turboderp
2c170bb6c6 Gemma3 local RoPE fixes 2025-04-24 00:31:40 +02:00
turboderp
2d48ccd23e Merge branch 'dev' 2025-04-18 23:29:41 +02:00
turboderp
9244003a40 Add support for Mistral 3.1 VLM 2025-04-18 22:47:47 +02:00
turboderp
68f7461985 Optional attn bias for GLM4 2025-04-16 01:24:45 +02:00
turboderp
6a5d303355 Merge remote-tracking branch 'origin/dev' into dev 2025-04-15 18:57:47 +02:00
turboderp
de19cbcc59 Add GLM4 architecture 2025-04-15 18:57:29 +02:00
RaahimSiddiqi
09c18e9c47 Added banned_strings parameter to the generator. (#756)
Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>
2025-04-11 22:12:17 +02:00
MikeRoz47
61450b4860 concatenate the sin and cos tensors (#758) 2025-04-11 22:11:13 +02:00
turboderp
b148bb42b8 Fix Gemma3 head norm (RMS) 2025-04-11 00:18:06 +02:00
turboderp
d471d44f01 Gemma3 local RoPE fixes 2025-04-10 22:16:08 +02:00
turboderp
a03db457ef Fix: Prioritize default head_dim when provided by architecture (Gemma3) over computed head_dim 2025-03-15 11:52:51 +01:00
turboderp
385a5162ba Fix: Correctly read query_pre_attn_scalar from text_config (Gemma3) 2025-03-15 11:01:33 +01:00