exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-04-26 17:28:59 +00:00

Author	SHA1	Message	Date
turboderp	144c576bdb	Fix bottlenecks in quantized tensor loading	2024-09-17 00:00:27 +02:00
turboderp	10a8842b25	Fix JSON inference example	2024-09-14 21:35:02 +02:00
turboderp	b2c7cf280c	Add cp310 cu121 torch2.4 Windows wheel v0.2.2	2024-09-14 21:17:52 +02:00
turboderp	46eff43403	Merge branch 'refs/heads/dev'	2024-09-14 21:13:46 +02:00
turboderp	228ba34cec	Bump to 0.2.2	2024-09-14 21:13:22 +02:00
turboderp	a372fe1241	Skip superfluous set creation when possible, even if multiple filters used	2024-09-14 19:42:33 +02:00
turboderp	aadc454183	Fix sampling using multiple filters	2024-09-14 19:36:53 +02:00
turboderp	5ee983593b	Fix potential race condition with multithreaded sampling and lazy tokenizer initialization	2024-09-14 19:28:02 +02:00
turboderp	1df7b04821	Allow non-causal attn with SDPA	2024-09-13 02:20:24 +02:00
turboderp	1e18e803da	Merge branch 'refs/heads/dev' v0.2.1	2024-09-08 19:23:19 +02:00
turboderp	0d9adf96e8	Bump to 0.2.1	2024-09-08 19:22:43 +02:00
turboderp	f0dca9a862	Bit of cleanup	2024-09-08 16:31:47 +02:00
turboderp	f2c53efd34	Remove (experimental) Q-cache calibration feature	2024-09-08 15:13:00 +02:00
turboderp	a029bcd76e	Bit of cleanup	2024-09-08 15:11:34 +02:00
AlpinDale	361d2119b3	fix: support for Rank Stabilized LoRA (RSLoRA) (#619). If an adapter is trained with RSLoRA, the alpha value is multiplied by `sqrt(rank)`.	2024-09-08 01:00:26 +02:00
turboderp	c1fed2ed19	Add DRY range parameter	2024-09-07 16:06:46 +02:00
turboderp	3e8e181717	Add DRY (still needs testing)	2024-09-07 02:16:16 +02:00
turboderp	affdc0d16c	Ensure streams are always set during the forward pass for the active thread	2024-09-05 20:28:09 +02:00
turboderp	5c455c18a0	Filter base: Fix instance init	2024-09-05 20:25:51 +02:00
turboderp	c9ce168ce0	Merge remote-tracking branch 'origin/dev' into dev	2024-09-04 23:23:39 +02:00
turboderp	1e462f1f7f	Force loading tensors on default stream	2024-09-04 23:04:22 +02:00
Brian Dashore	c18400fa29	Issues: Add issue templates (#615). Should help encourage well-formatted issues and make development easier. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 14:58:58 +02:00
turboderp	0d5c0bcc8d	Asynchronous filter evaluation	2024-08-31 16:31:51 +02:00
turboderp	12f08dbbd8	Accept Sequence return type from ExLlamaV2Filter.next()	2024-08-31 12:06:21 +02:00
turboderp	ea27954767	TP: Add (slower) SDPA fallback mode when flash-attn is unavailable	2024-08-29 22:39:45 +02:00
turboderp	40e37f4944	Merge branch 'refs/heads/dev' v0.2.0	2024-08-28 22:56:08 +02:00
turboderp	c050aec764	Bump to 0.2.0	2024-08-28 22:56:01 +02:00
turboderp	1a82283df3	Merge branch 'refs/heads/dev'	2024-08-28 22:55:11 +02:00
turboderp	f1d8909809	Catch all exceptions for nvidia-smi and rocm-smi	2024-08-28 20:25:56 +02:00
turboderp	db14154fee	Don't use default stream for logit padding mask after all	2024-08-27 22:06:19 +02:00
turboderp	8d3d4c227e	Ensure logit padding happens on default stream	2024-08-27 21:48:30 +02:00
turboderp	d9f0ecc12c	TP: Fix vocab split for models with odd vocab sizes	2024-08-27 21:47:49 +02:00
turboderp	7319b6ea31	Fix graph update for MLP with post layernorm	2024-08-27 20:07:21 +02:00
turboderp	4230dab3c1	Fix model_init feedback	2024-08-27 19:16:23 +02:00
turboderp	69291d1333	Fix another possible sync issue with fasttensors (for Windows)	2024-08-25 22:01:53 +02:00
turboderp	e539f7cc28	Fix another possible sync issue with fasttensors	2024-08-25 21:56:02 +02:00
turboderp	d3fe9f25d2	Unmap tensors on CPU to reduce temp VRAM overhead while loading	2024-08-25 21:15:28 +02:00
turboderp	7e15947aa1	Fix possible sync issues with fasttensors	2024-08-25 14:51:46 +02:00
turboderp	57ee846672	Fix ROCm compile	2024-08-22 14:04:20 +02:00
turboderp	b926ed4c7e	Merge remote-tracking branch 'origin/master' v0.1.9	2024-08-22 13:50:47 +02:00
turboderp	42e1baeef0	Bump to 0.1.9	2024-08-22 13:48:36 +02:00
turboderp	d58d5ab723	Add some docstrings	2024-08-22 13:43:19 +02:00
turboderp	555c360798	Update TP example	2024-08-22 13:43:19 +02:00
turboderp	4117daa546	Cleanup	2024-08-22 12:49:37 +02:00
turboderp	9917403229	Bulk example: Compute immediate output tokens/second	2024-08-22 12:46:58 +02:00
turboderp	547135cc43	Skip sampling if only one token allowed by filters	2024-08-22 12:33:19 +02:00
turboderp	0978ba5f86	Optimize logit filtering in sampler	2024-08-22 11:17:45 +02:00
Michael Panchenko	f1d79c96d2	Fix in ext.py (index error on system without GPU) (#594). Changes: Update README.md; Fix in ext.py (index error on system without GPU). Co-authored-by: turboderp <11859846+turboderp@users.noreply.github.com>	2024-08-20 17:18:50 +02:00
turboderp	e7053198ba	Enable defrag for paged TP cache	2024-08-20 17:10:51 +02:00
turboderp	e89dc5b762	Fix for older Torch versions	2024-08-20 14:08:35 +02:00

1444 Commits
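
Note on commit 361d2119b3 (RSLoRA): the message says the adapter's alpha is multiplied by `sqrt(rank)`. A minimal sketch of why that is sufficient, assuming the loader otherwise applies the conventional LoRA scale `alpha / rank`. The function names below are hypothetical and are not taken from the exllamav2 source.

```python
import math

def lora_scale(alpha: float, rank: int) -> float:
    # Conventional LoRA scaling factor (assumed to be what the loader applies).
    return alpha / rank

def effective_alpha(alpha: float, rank: int, use_rslora: bool) -> float:
    # Hypothetical helper: for RSLoRA adapters, pre-multiply alpha by sqrt(rank)
    # so the unchanged alpha / rank formula yields alpha / sqrt(rank).
    return alpha * math.sqrt(rank) if use_rslora else alpha

alpha, rank = 16.0, 64
plain = lora_scale(effective_alpha(alpha, rank, use_rslora=False), rank)
rslora = lora_scale(effective_alpha(alpha, rank, use_rslora=True), rank)
print(plain, rslora)
```

With the adjustment applied, the resulting scale equals `alpha / sqrt(rank)`, which is the rank-stabilized factor, so no other part of the scaling path needs to change.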