exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-03-15 00:07:26 +00:00

Author	SHA1	Message	Date
turboderp	97e4fd90f1	Fail if tokenizer.json not found	2025-05-27 17:10:35 +02:00
turboderp	a811641c3b	Optimize paged cache defrag	2025-05-27 01:53:37 +02:00
turboderp	1adff7d827	Remove SentencePiece support	2025-05-14 15:30:57 +02:00
turboderp	a87ea02830	Remove SentencePiece support	2025-05-14 13:41:09 +02:00
kingbri	bb4206d5bc	Ext: Fix register call for float ROCm doesn't support the register keyword. This should fix compile. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 14:15:56 -04:00
kingbri	0a7733110e	Ext: Fix CUDA type cast The __half_as_ushort function isn't present in cuda < 12.4 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 12:36:40 -04:00
kingbri	9d621509ab	Project: Bump version v0.3.0 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> v0.3.0	2025-05-12 10:11:57 -04:00
Brian	aa2d5aa471	Merge pull request #788 from turboderp-org/dev Merge Dev into Master	2025-05-12 10:10:06 -04:00
kingbri	c820539b79	Actions: Add redirects to CUDA downloads Some CUDA archives might redirect, so tell CuRL that. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 14:25:10 -04:00
kingbri	b4c6b39590	Actions: Update CUDA install commands Fast install for cuda 11.8, 12.1, 12.4, 12.8 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 13:35:58 -04:00
kingbri	89d17b7ba3	Actions: Migrate to temp windows build action Need to rebuild windows wheels with older MSVC Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-07 13:58:52 -04:00
kingbri	3d9fde2dd0	Actions: Go back to VS 17.9 Increase compatibility with Windows systems. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-07 13:57:25 -04:00
turboderp	e312b74f15	Fix unload() for vision tower	2025-05-07 19:08:29 +02:00
turboderp	747fbadca9	Merge branch 'master' into dev	2025-05-03 18:29:14 +02:00
turboderp	68976a07d7	Add basic support for Qwen3MoE	2025-05-01 20:23:33 +02:00
turboderp	b422a85c47	Merge branch 'master' into dev	2025-04-29 20:44:36 +02:00
turboderp	a3440098a4	Add Qwen3ForCausalLM	2025-04-29 20:44:10 +02:00
Brian	af04f9a393	Merge pull request #776 from kingbri1/master Fix the build action	2025-04-26 13:30:47 -04:00
kingbri	be765fab7a	Actions: Fix CUDA build The pytorch friendly cuda version wasn't being pushed to GH outputs. In addition, add a simpler CUDA install method for Windows. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Test Actions: Add short cuda install	2025-04-26 01:23:22 -04:00
turboderp	abbece178e	Temporary build action for Torch 2.7	2025-04-24 18:39:46 +02:00
Brian	263c758ae5	Actions: Update install methods (#775) Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Actions: Test	2025-04-24 18:35:36 +02:00
turboderp	0374e367aa	Upgrade large runner actions to 22.04	2025-04-24 02:34:40 +02:00
turboderp	569dcb2ec6	Change wheels to CUDA 12.8.1	2025-04-24 02:00:42 +02:00
turboderp	68e2b92d79	Build on ubuntu-22.04, enable Hopper and Blackwell on Torch 2.7.0 wheels v0.2.9	2025-04-24 00:52:11 +02:00
Brian	7b2e4d8ddc	Actions: Add Torch 2.7 (#773) Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 00:39:31 +02:00
RaahimSiddiqi	ea6cc68ac1	Added Fix for Loading unquantized Huggingface models. (#771) * Added banned_strings parameter to the generator. * Added Fix for loading unquantized huggingface models. --------- Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>	2025-04-24 00:38:10 +02:00
turboderp	3a90264940	Bump to v0.2.9	2025-04-24 00:36:39 +02:00
turboderp	2c170bb6c6	Gemma3 local RoPE fixes	2025-04-24 00:31:40 +02:00
turboderp	2d48ccd23e	Merge branch 'dev'	2025-04-18 23:29:41 +02:00
turboderp	9244003a40	Add support for Mistral 3.1 VLM	2025-04-18 22:47:47 +02:00
turboderp	68f7461985	Optional attn bias for GLM4	2025-04-16 01:24:45 +02:00
turboderp	6a5d303355	Merge remote-tracking branch 'origin/dev' into dev	2025-04-15 18:57:47 +02:00
turboderp	de19cbcc59	Add GLM4 architecture	2025-04-15 18:57:29 +02:00
RaahimSiddiqi	09c18e9c47	Added banned_strings parameter to the generator. (#756) Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>	2025-04-11 22:12:17 +02:00
MikeRoz47	61450b4860	concatenate the sin and cos tensors (#758)	2025-04-11 22:11:13 +02:00
turboderp	b148bb42b8	Fix Gemma3 head norm (RMS)	2025-04-11 00:18:06 +02:00
turboderp	d471d44f01	Gemma3 local RoPE fixes	2025-04-10 22:16:08 +02:00
turboderp	a03db457ef	Fix: Prioritize default head_dim when provided by architecture (Gemma3) over computed head_dim	2025-03-15 11:52:51 +01:00
turboderp	385a5162ba	Fix: Correctly read query_pre_attn_scalar from text_config (Gemma3)	2025-03-15 11:01:33 +01:00
turboderp	17762c177f	Merge remote-tracking branch 'origin/dev' into dev	2025-03-15 01:37:43 +01:00
turboderp	6f7623ff0e	Update examples	2025-03-15 01:30:52 +01:00
turboderp	77a1e2cb0c	Warn instead of failing for unsupported vision model	2025-03-15 00:13:52 +01:00
turboderp	578fd4234f	Support Gemma3 (vision)	2025-03-15 00:13:19 +01:00
turboderp	c0267e37fe	Support Gemma3 (text)	2025-03-15 00:06:56 +01:00
turboderp	565339101b	Allow text model to use Q/K norm while vision model doesn't	2025-03-15 00:06:56 +01:00
turboderp	07afc90788	Tensor renaming kludge (Gemma3 has one _weight tensor)	2025-03-15 00:06:56 +01:00
turboderp	e2fa480595	Auto expand Q/K norm weight to match number of heads	2025-03-15 00:06:56 +01:00
turboderp	a88c18cac1	Add architecture-specific config defaults (Gemma3 config.json is incomplete)	2025-03-15 00:06:56 +01:00
turboderp	b6c1912f29	Respect norm_constant_bias in Q/K norms (Gemma3)	2025-03-15 00:06:56 +01:00
turboderp	4b5dbecdc1	Allow key prefix for lm_head (Gemma3)	2025-03-15 00:06:56 +01:00


1443 Commits