exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-03-15 00:07:26 +00:00

Author	SHA1	Message	Date
turboderp	a3440098a4	Add Qwen3ForCausalLM	2025-04-29 20:44:10 +02:00
turboderp	3a90264940	Bump to v0.2.9	2025-04-24 00:36:39 +02:00
turboderp	2c170bb6c6	Gemma3 local RoPE fixes	2025-04-24 00:31:40 +02:00
turboderp	2d48ccd23e	Merge branch 'dev'	2025-04-18 23:29:41 +02:00
turboderp	9244003a40	Add support for Mistral 3.1 VLM	2025-04-18 22:47:47 +02:00
turboderp	68f7461985	Optional attn bias for GLM4	2025-04-16 01:24:45 +02:00
turboderp	6a5d303355	Merge remote-tracking branch 'origin/dev' into dev	2025-04-15 18:57:47 +02:00
turboderp	de19cbcc59	Add GLM4 architecture	2025-04-15 18:57:29 +02:00
RaahimSiddiqi	09c18e9c47	Added banned_strings parameter to the generator. (#756) Co-authored-by: RaahimSiddiqi <Raahim.siddiqi@vidizmo.com>	2025-04-11 22:12:17 +02:00
MikeRoz47	61450b4860	concatenate the sin and cos tensors (#758)	2025-04-11 22:11:13 +02:00
turboderp	b148bb42b8	Fix Gemma3 head norm (RMS)	2025-04-11 00:18:06 +02:00
turboderp	d471d44f01	Gemma3 local RoPE fixes	2025-04-10 22:16:08 +02:00
turboderp	a03db457ef	Fix: Prioritize default head_dim when provided by architecture (Gemma3) over computed head_dim	2025-03-15 11:52:51 +01:00
turboderp	385a5162ba	Fix: Correctly read query_pre_attn_scalar from text_config (Gemma3)	2025-03-15 11:01:33 +01:00
turboderp	17762c177f	Merge remote-tracking branch 'origin/dev' into dev	2025-03-15 01:37:43 +01:00
turboderp	6f7623ff0e	Update examples	2025-03-15 01:30:52 +01:00
turboderp	77a1e2cb0c	Warn instead of failing for unsupported vision model	2025-03-15 00:13:52 +01:00
turboderp	578fd4234f	Support Gemma3 (vision)	2025-03-15 00:13:19 +01:00
turboderp	c0267e37fe	Support Gemma3 (text)	2025-03-15 00:06:56 +01:00
turboderp	565339101b	Allow text model to use Q/K norm while vision model doesn't	2025-03-15 00:06:56 +01:00
turboderp	07afc90788	Tensor renaming kludge (Gemma3 has one _weight tensor)	2025-03-15 00:06:56 +01:00
turboderp	e2fa480595	Auto expand Q/K norm weight to match number of heads	2025-03-15 00:06:56 +01:00
turboderp	a88c18cac1	Add architecture-specific config defaults (Gemma3 config.json is incomplete)	2025-03-15 00:06:56 +01:00
turboderp	b6c1912f29	Respect norm_constant_bias in Q/K norms (Gemma3)	2025-03-15 00:06:56 +01:00
turboderp	4b5dbecdc1	Allow key prefix for lm_head (Gemma3)	2025-03-15 00:06:56 +01:00
turboderp	4844f3873c	Upcast MM embeddings when residual is FP32	2025-03-15 00:06:56 +01:00
turboderp	fe51a8f4b5	Correctly include Q/K norms when compiling model	2025-03-15 00:06:56 +01:00
turboderp	38f4d7c87d	Allow loading transposed unquantized linear layer	2025-03-15 00:06:56 +01:00
turboderp	9669fa33c9	Allow component models to use learned pos embeddings without regarding LLM max_seq_len	2025-03-15 00:06:56 +01:00
turboderp	7b05acd233	Allow per-layer RoPE theta	2025-03-15 00:06:56 +01:00
turboderp	23395dfa42	Fix FP32 residual for paged attn	2025-03-14 23:09:31 +01:00
Thomas	eaf8ad1041	Update chat.py, include multi-line input support and context clearing through input (#738) * Update chat.py, include multi-line input support and context clearing - Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows) - Allow context clearing outside of amnesia by inputting "clear" * Adding qwq chat mode, adding the ability to forget thinking context	2025-03-10 15:28:33 +01:00
turboderp	d8fa1a8250	Support partial_rotary_factor (Phi-4 mini)	2025-02-28 08:51:11 +01:00
turboderp	2e630aefdd	Fix alt pos embeddings and block diagonal mask when flash-attn is disabled	2025-02-13 22:13:48 +01:00
turboderp	1a80d38891	Update build actions	2025-02-08 03:29:05 +01:00
turboderp	f1c4126045	Update build actions	2025-02-08 03:14:32 +01:00
turboderp	f98a7b7099	Update build actions	2025-02-08 03:08:45 +01:00
turboderp	096076b3fd	Update build actions	2025-02-08 02:49:21 +01:00
turboderp	0f4a9f0042	Update build actions	2025-02-08 01:36:51 +01:00
turboderp	f3de3cbd34	Update build actions	2025-02-08 01:05:22 +01:00
turboderp	94e57904bc	Update build actions v0.2.8	2025-02-08 00:57:29 +01:00
turboderp	3a9618d471	Update build actions	2025-02-08 00:44:44 +01:00
turboderp	3486f9eb71	Merge branch 'refs/heads/dev'	2025-02-08 00:26:52 +01:00
turboderp	6e4a84a1e3	Bump to 0.2.8	2025-02-08 00:26:30 +01:00
turboderp	d05fbcc854	Fix Pixtral regression	2025-02-04 21:01:23 +01:00
turboderp	96b2f9df77	Add Qwen2.5 mode to grounding demo	2025-01-29 22:41:36 +01:00
turboderp	cce6f95cd3	Initial support for Qwen2.5-VL	2025-01-29 03:03:36 +01:00
turboderp	d0413b06f8	Check length of gpu_split in model_init	2025-01-09 11:36:25 +01:00
turboderp	c8fa853c89	Test script: Allow --eval_rows in wiki2 ppl test	2025-01-09 11:14:48 +01:00
turboderp	318435db81	Sampler: Remove superfluous pre-sort pass	2025-01-09 11:14:19 +01:00


1418 Commits