Commit Graph

1459 Commits

Author  SHA1  Message  Date

Llama Enjoyer  e960dfd68d  Fix the temperature argument to accept values passed via the cmd line.  2024-09-24 17:18:08 +02:00
Sinan  7c7b1993b4  Added draft token count as parameter to chat.py (#635)  2024-09-24 11:16:30 +02:00
turboderp  8361f3f4a0  Add missing cp310+cu118 torch 2.4 windows wheel  2024-09-23 17:38:53 +02:00
turboderp  15e54046ba  More stream gymnastics  2024-09-23 17:28:55 +02:00
turboderp  a5132d072e  Add XTC sampler  2024-09-22 23:09:19 +02:00
turboderp  6d7b2e8e7a  Revert snapshot interval  2024-09-22 19:11:32 +02:00
turboderp  43a0be35df  Make measurement less sensitive to very sparse inf values in reference fwd pass  2024-09-22 19:01:29 +02:00
turboderp  a17f6665cb  Fix streams in quantizer  2024-09-22 18:15:05 +02:00
turboderp  9946f45f1c  Force tensor loading onto priority stream  2024-09-20 22:05:20 +02:00
turboderp  e155e0a5b0  Fix loading in new thread  2024-09-18 19:46:56 +02:00
turboderp  c4a03e09f5  Tokenizer: Give priority to tokenizer.json instead of tokenizer.model  2024-09-18 00:41:43 +02:00
turboderp  12bceb9f4b  Cleanup  2024-09-18 00:32:00 +02:00
turboderp  0695f3a854  Fix potential bug in filter evaluation  2024-09-17 00:34:28 +02:00
turboderp  8a25e0f2b3  Merge branch 'refs/heads/master' into dev  2024-09-17 00:33:36 +02:00
turboderp  b25210778c  Remove fasttensors, add platform-agnostic multithreaded ST loader  2024-09-17 00:33:16 +02:00
turboderp  144c576bdb  Fix bottlenecks in quantized tensor loading  2024-09-17 00:00:27 +02:00
turboderp  10a8842b25  Fix JSON inference example  2024-09-14 21:35:02 +02:00
turboderp  b2c7cf280c  Add cp310 cu121 torch2.4 Windows wheel  v0.2.2  2024-09-14 21:17:52 +02:00
turboderp  46eff43403  Merge branch 'refs/heads/dev'  2024-09-14 21:13:46 +02:00
turboderp  228ba34cec  Bump to 0.2.2  2024-09-14 21:13:22 +02:00
turboderp  a372fe1241  Skip superfluous set creation when possible, even if multiple filters used  2024-09-14 19:42:33 +02:00
turboderp  aadc454183  Fix sampling using multiple filters  2024-09-14 19:36:53 +02:00
turboderp  5ee983593b  Fix potential race condition with multithreaded sampling and lazy tokenizer initialization  2024-09-14 19:28:02 +02:00
turboderp  1df7b04821  Allow non-causal attn with SDPA  2024-09-13 02:20:24 +02:00
turboderp  1e18e803da  Merge branch 'refs/heads/dev'  v0.2.1  2024-09-08 19:23:19 +02:00
turboderp  0d9adf96e8  Bump to 0.2.1  2024-09-08 19:22:43 +02:00
turboderp  f0dca9a862  Bit of cleanup  2024-09-08 16:31:47 +02:00
turboderp  f2c53efd34  Remove (experimental) Q-cache calibration feature  2024-09-08 15:13:00 +02:00
turboderp  a029bcd76e  Bit of cleanup  2024-09-08 15:11:34 +02:00
AlpinDale  361d2119b3  fix: support for Rank Stabilized LoRA (RSLoRA) (#619)  2024-09-08 01:00:26 +02:00
    If an adapter is trained with RSLoRA, the alpha value is multiplied by `sqrt(rank)`.
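The RSLoRA fix above (#619) amounts to a one-line change in how the adapter's scaling factor is computed. A minimal sketch of the idea, with a hypothetical `lora_scaling` helper (not the repository's actual function):

```python
import math

def lora_scaling(alpha: float, rank: int, rslora: bool = False) -> float:
    """Effective scaling applied to a LoRA adapter's delta weights.

    Standard LoRA scales by alpha / rank. For an adapter trained with
    Rank-Stabilized LoRA (RSLoRA), alpha is first multiplied by
    sqrt(rank), giving an effective scale of alpha / sqrt(rank).
    """
    if rslora:
        alpha *= math.sqrt(rank)
    return alpha / rank

# With alpha=16, rank=64: plain LoRA scales by 0.25, RSLoRA by 2.0.
```

Loading an RSLoRA-trained adapter with the plain formula silently shrinks its effect by a factor of sqrt(rank), which is why detecting the flag matters.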
turboderp  c1fed2ed19  Add DRY range parameter  2024-09-07 16:06:46 +02:00
turboderp  3e8e181717  Add DRY (still needs testing)  2024-09-07 02:16:16 +02:00
turboderp  affdc0d16c  Ensure streams are always set during the forward pass for the active thread  2024-09-05 20:28:09 +02:00
turboderp  5c455c18a0  Filter base: Fix instance init  2024-09-05 20:25:51 +02:00
turboderp  c9ce168ce0  Merge remote-tracking branch 'origin/dev' into dev  2024-09-04 23:23:39 +02:00
turboderp  1e462f1f7f  Force loading tensors on default stream  2024-09-04 23:04:22 +02:00
Brian Dashore  c18400fa29  Issues: Add issue templates (#615)  2024-09-04 14:58:58 +02:00
    Should help encourage well-formatted issues and make development easier.
    Signed-off-by: kingbri <bdashore3@proton.me>
turboderp  0d5c0bcc8d  Asynchronous filter evaluation  2024-08-31 16:31:51 +02:00
turboderp  12f08dbbd8  Accept Sequence return type from ExLlamaV2Filter.next()  2024-08-31 12:06:21 +02:00
turboderp  ea27954767  TP: Add (slower) SDPA fallback mode when flash-attn is unavailable  2024-08-29 22:39:45 +02:00
turboderp  40e37f4944  Merge branch 'refs/heads/dev'  v0.2.0  2024-08-28 22:56:08 +02:00
turboderp  c050aec764  Bump to 0.2.0  2024-08-28 22:56:01 +02:00
turboderp  1a82283df3  Merge branch 'refs/heads/dev'  2024-08-28 22:55:11 +02:00
turboderp  f1d8909809  Catch all exceptions for nvidia-smi and rocm-smi  2024-08-28 20:25:56 +02:00
turboderp  db14154fee  Don't use default stream for logit padding mask after all  2024-08-27 22:06:19 +02:00
turboderp  8d3d4c227e  Ensure logit padding happens on default stream  2024-08-27 21:48:30 +02:00
turboderp  d9f0ecc12c  TP: Fix vocab split for models with odd vocab sizes  2024-08-27 21:47:49 +02:00
turboderp  7319b6ea31  Fix graph update for MLP with post layernorm  2024-08-27 20:07:21 +02:00
turboderp  4230dab3c1  Fix model_init feedback  2024-08-27 19:16:23 +02:00
turboderp  69291d1333  Fix another possible sync issue with fasttensors (for Windows)  2024-08-25 22:01:53 +02:00