exllamav3

mirror of https://github.com/turboderp-org/exllamav3.git synced 2026-05-11 16:30:12 +00:00

Author	SHA1	Message	Date
turboderp	2d58a3cf35	convert.py: Fix parallel mode regression	2026-05-10 17:38:46 +02:00
turboderp	b8f0c93c37	Actions: Fix build tag for cu132 wheels	2026-05-10 16:03:06 +02:00
turboderp	e3410fa2bc	Actions: Add cu132 wheels (Python 3.12+, Torch 2.11)	2026-05-10 15:58:33 +02:00
turboderp	112612705c	Actions: cu132 wheel experiment, attempt #10	2026-05-10 14:51:55 +02:00
turboderp	4e7b6ed240	Actions: cu132 wheel experiment, attempt #9	2026-05-10 14:39:17 +02:00
turboderp	ca822f9400	Actions: cu132 wheel experiment, attempt #8	2026-05-10 14:27:15 +02:00
turboderp	efbcda2911	Actions: cu132 wheel experiment, attempt #6	2026-05-10 14:11:33 +02:00
turboderp	005ff30323	Actions: cu132 wheel experiment, attempt #5 (add cp312)	2026-05-10 13:59:24 +02:00
turboderp	287c3ddbd8	Actions: cu132 wheel experiment, attempt #5	2026-05-10 13:55:24 +02:00
turboderp	67bdf5633b	Actions: cu132 wheel experiment, attempt #4	2026-05-10 13:36:00 +02:00
turboderp	0313939d9b	Actions: cu132 wheel experiment, attempt #3	2026-05-10 13:08:42 +02:00
turboderp	5ddc4ea8dd	Actions: cu132 wheel experiment, attempt #2	2026-05-10 12:52:28 +02:00
turboderp	0a33b2fe3c	Actions: cu132 wheel experiment	2026-05-10 12:33:50 +02:00
turboderp	b0e7dd6c8a	Unpin pydantic dependency	2026-05-09 22:36:44 +02:00
turboderp	c9184a9801	Add .modules.attention to packages	2026-05-09 22:33:32 +02:00
turboderp	fa142797ce	Bump to v0.0.34 (tag: v0.0.34)	2026-05-09 21:08:45 +02:00
turboderp	1fe53b96e6	Merge remote-tracking branch 'origin/dev' into dev	2026-05-09 21:07:38 +02:00
turboderp	fae1a7a9de	Remove redundant reconstruct during prefill (fixes regression after RYS)	2026-05-09 21:07:27 +02:00
turboderp	817b35dad9	Merge pull request #203 from so-dimm/patch-1 README.md: fix typo and update Gemma 4 support info	2026-05-09 20:39:17 +02:00
turboderp	ec7d5102ad	Bump to v0.0.33 (tag: v0.0.33)	2026-05-09 19:52:30 +02:00
turboderp	9a3041c5b0	longctx eval: Allow draft model	2026-05-09 13:30:22 +02:00
turboderp	d48d859d37	Attn: Update fallback warning to suggest installing Triton	2026-05-09 12:47:50 +02:00
turboderp	befc9b1730	Tokenizer: Also scan generation_config.json for EOS tokens	2026-05-09 02:22:25 +02:00
turboderp	e9f5a09e91	Attn: Allow TP export when use_k_as_v==True	2026-05-09 02:05:39 +02:00
turboderp	35b66649fb	Attn: Refactor, new dispatch mechanism with multiple backends, remove explicit SDPA mode (keep as fallback), new paged Triton attn kernels for head_dim=512	2026-05-09 02:03:49 +02:00
turboderp	2631448e43	Attn: Increase bighead attn kernel threshold to 16 (for Gemma4 DFlash)	2026-05-07 23:13:14 +02:00
turboderp	11a2a849cc	Eval: Add `-gp/--gen_prompt` option to eval scripts	2026-05-07 20:34:02 +02:00
turboderp	972a2c514c	Bighead attn: Correct Q shape when bsz > 1 and no cache	2026-05-07 20:09:55 +02:00
turboderp	b943f85c36	DFlash: Allow quantization	2026-05-06 22:44:42 +02:00
turboderp	3ff37127f0	convert.py: Allow quantizing without calibration (DFlash)	2026-05-06 22:44:24 +02:00
turboderp	0dc28740ad	convert.py: Clear tensor temp directory before writing first set of quantized tensors	2026-05-06 22:42:37 +02:00
turboderp	770894a9cc	Autotune: Improve candidate selection	2026-05-06 19:58:28 +02:00
Optimal	10a45faa1f	README.md: fix typo and update Gemma 4 support info. Simple typo fix in README.md; added a note that Gemma 4 E2B and E4B models are currently not supported.	2026-05-06 21:03:35 +09:00
turboderp	a960f4dbce	DFlash: Fix target layer indices	2026-05-06 09:53:45 +02:00
turboderp	72e479ddc4	Update README.md	2026-05-05 10:04:37 +02:00
turboderp	c6a54acbe1	SD benchmark: Reduce default cache size	2026-05-05 10:04:29 +02:00
turboderp	841897fe42	TP: Increase CPU reduce timeout to account for autotune	2026-05-03 18:12:48 +02:00
turboderp	e0d330d2ff	Update build actions	2026-05-03 01:51:56 +02:00
turboderp	d7d6d4240a	Autotune: Don't include full torch extension header to prevent build issues on Windows	2026-05-03 01:51:45 +02:00
turboderp	86253bdbed	Update build actions	2026-05-03 00:54:21 +02:00
turboderp	45d1c8572f	Suppress warnings in extension build	2026-05-03 00:50:37 +02:00
turboderp	b70b0476f5	Autotune: Fix typo breaking Windows build	2026-05-03 00:50:08 +02:00
turboderp	fae9d1d378	Update build actions: restore JIT wheel	2026-05-02 23:58:51 +02:00
turboderp	820e5063f3	Update build actions, try to pin xformers during wheel build to prevent Torch 2.11 dependency on all versions (tag: v0.0.32)	2026-05-02 23:37:22 +02:00
turboderp	527ccc87eb	Bump to v0.0.32	2026-05-02 23:36:26 +02:00
turboderp	7bfa8e6a72	Autotune: Adjust tuning precision thresholds	2026-05-02 14:05:19 +02:00
turboderp	fd5b65b526	Generator: Advance filter when draft tokens are accepted	2026-05-02 13:57:05 +02:00
turboderp	61b211eaac	Generator: Guard against single stop condition not given as a list	2026-05-02 12:23:08 +02:00
turboderp	ca708d9064	TP: Increase reduce timeout to 45s to allow for autotune	2026-05-02 12:09:21 +02:00
turboderp	d7eb7c3f6b	TP: Fix cache layer regression after RYS	2026-05-02 12:08:02 +02:00

973 Commits