exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-03-15 00:07:26 +00:00

Author	SHA1	Message	Date
turboderp	c294f3436f	Fix model_diff script	2024-07-06 11:55:37 +02:00
turboderp	adefba1973	Optionally clamp hidden states (for Gemma2)	2024-07-06 11:55:23 +02:00
turboderp	8f5680dfca	Add measurement sanity check	2024-07-06 08:34:21 +02:00
turboderp	21f2a28b0a	Merge remote-tracking branch 'origin/dev' into dev	2024-07-06 08:06:25 +02:00
turboderp	adebcdbd9d	Use gelu_pytorch_tanh() instead of gelu()	2024-07-06 08:06:14 +02:00
turboderp	0963870252	Insist on eager attn for Gemma2 (until flash-attn gets support)	2024-07-06 07:15:29 +02:00
turboderp	bfc3cd9cf3	Support Gemma2	2024-07-06 07:14:47 +02:00
turboderp	01ce7bbb6e	Attn logit softcapping (for eager attn)	2024-07-06 07:13:58 +02:00
Ahmad Fahadh Ilyas	83f0d19cbd	make target_modules in lora usable (#534)	2024-07-06 03:57:28 +02:00
turboderp	6095c0eb6e	Merge remote-tracking branch 'origin/dev' into dev	2024-07-05 23:57:40 +02:00
Brian Dashore	60eb8347b8	dynamic_async: Properly close the iterator loop on exit (#538) When the close method is called, the generator's iterator loop never actually exited. This is because the condition is not notified meaning the task is still running even though it's signalled to cancel. Therefore, add an extra pass if the task is cancelled and unlock the loop by forcing a notify on close. From there, normal cancellation handling will work. There might be a better way to do this, but this way minimizes the amount of added code and makes the most(?) sense. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-05 23:56:38 +02:00
turboderp	66c4a9c849	Support (alternating) SWA	2024-07-04 05:36:39 +02:00
turboderp	84d00cbbc0	Support pre_attn_scalar config entry	2024-07-04 05:36:39 +02:00
turboderp	c1a572bf89	Support pre and post layernorms	2024-07-04 05:31:43 +02:00
turboderp	c8e5cedfb3	Example Gemma template	2024-07-04 05:25:12 +02:00
turboderp	d2b17db5de	Read special tokens list from tokenizer_config.json	2024-07-04 05:25:12 +02:00
turboderp	a1aceaae20	Add final logit softcapping	2024-07-04 05:25:12 +02:00
turboderp	f1179ff200	Add ppl-over-seqlen test	2024-07-03 22:37:51 +02:00
turboderp	38f44096ba	Support InternLM2	2024-07-03 22:37:18 +02:00
turboderp	e56cfe2219	Chatbot: fix chatml template	2024-07-03 22:34:46 +02:00
turboderp	95e093a2b2	Chatbot: Ignore undefined special tokens	2024-07-03 22:34:34 +02:00
turboderp	8c2132453c	More debug output	2024-07-03 22:04:22 +02:00
turboderp	e737e23e30	Util function to sync only active devices	2024-07-01 02:15:34 +02:00
turboderp	198bbdb117	Fix type hint	2024-06-28 00:22:59 +02:00
turboderp	c387587e04	Don't cache encoding in lefttrim_token	2024-06-24 03:14:54 +02:00
turboderp	ef455a7bb9	Respect special tokens in WS server lefttrim_token()	2024-06-24 02:59:49 +02:00
turboderp	6a8172cfce	Bump to v0.1.6 (tag: v0.1.6)	2024-06-24 02:33:12 +02:00
turboderp	0697196357	itertools.pairwise substitute for Python<3.10	2024-06-24 02:29:15 +02:00
turboderp	1552b06a7a	Add env variable override for no_sdpa	2024-06-24 02:18:48 +02:00
turboderp	2455d1de9b	Add option to disable SDPA	2024-06-24 02:14:17 +02:00
turboderp	6feebfb56e	Fix layernorm kernels for wave64 GPUs	2024-06-24 02:09:46 +02:00
turboderp	05b1f2194e	Fix imports in test_inference	2024-06-24 00:56:29 +02:00
turboderp	9b725dd5cc	Add rich dependency to setup.py	2024-06-24 00:39:57 +02:00
turboderp	547cc96db9	Try not to crash when bos_token_id is not configured	2024-06-23 23:45:10 +02:00
turboderp	bdfe1bd160	Merge branch 'refs/heads/master' into dev	2024-06-23 23:34:21 +02:00
turboderp	a0ea2b0db7	Move conversion script into exllamav2 package	2024-06-21 23:58:39 +02:00
turboderp	6509e90842	Fix xformers import	2024-06-21 23:30:05 +02:00
turboderp	1c1fd2d247	Slightly more correct metadata for quantized config.json	2024-06-21 23:23:56 +02:00
turboderp	847a4f7709	Fix broken --low_mem (#511)	2024-06-18 01:12:05 +02:00
turboderp	81931e38c2	Remove PAD_TOKEN_ID warning	2024-06-17 01:11:11 +02:00
turboderp	f01e0d0736	Update example	2024-06-17 01:10:50 +02:00
turboderp	c2aac982e4	Globally set Torch number of threads to 1	2024-06-17 00:39:16 +02:00
turboderp	5b1b8d4169	Q GEMM: Initialize with bias when possible	2024-06-17 00:37:36 +02:00
turboderp	a2b2684e9a	Paged attn: Skip some flash-attn wrapper code	2024-06-17 00:34:52 +02:00
turboderp	843cec5206	Non-blocking host-device copies in forward pass	2024-06-16 19:18:01 +02:00
turboderp	522cab53fa	QMLP: Skip .view	2024-06-16 19:14:47 +02:00
turboderp	22d6823f98	Only convert blocked_tokens set to list once	2024-06-16 16:41:17 +02:00
turboderp	ec804a0291	Don't apply temperature in AVX2 softmax when temperature == 1	2024-06-16 16:14:58 +02:00
turboderp	67c270c724	Improve AVX2 softmax approximation	2024-06-16 16:13:42 +02:00
turboderp	3f805f511a	Unpin logit/ID buffers (pinning doesn't improve performance and is potentially problematic)	2024-06-16 14:56:52 +02:00
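
Commit 60eb8347b8 above describes a close() call that signals cancellation without notifying the condition the iterator loop is waiting on, so the loop never wakes up to see the signal. The pattern and the fix can be sketched as follows, assuming a hypothetical Job class built on asyncio.Condition. The names Job, put, stream and close are illustrative only and are not exllamav2's actual dynamic_async API.

```python
import asyncio


class Job:
    """Hypothetical stand-in for an async streaming job (not exllamav2's class)."""

    def __init__(self):
        self.condition = asyncio.Condition()
        self.queue = []
        self.cancelled = False

    async def put(self, item):
        # Producer side: enqueue a result and wake any waiting consumer.
        async with self.condition:
            self.queue.append(item)
            self.condition.notify_all()

    async def close(self):
        # Setting the flag alone is not enough: a consumer parked in
        # condition.wait() only rechecks the flag after being notified.
        async with self.condition:
            self.cancelled = True
            self.condition.notify_all()  # force a wake-up so the loop can exit

    async def stream(self):
        # Consumer side: yield items until the job is closed.
        while True:
            async with self.condition:
                while not self.queue and not self.cancelled:
                    await self.condition.wait()
                if self.cancelled:
                    return  # extra pass: leave the loop once cancelled
                item = self.queue.pop(0)
            yield item


async def demo():
    job = Job()
    received = []

    async def consume():
        async for item in job.stream():
            received.append(item)

    task = asyncio.create_task(consume())
    await job.put("a")
    await job.put("b")
    # Give the consumer a chance to drain the queue before closing.
    while len(received) < 2:
        await asyncio.sleep(0)
    await job.close()
    # Without the notify in close(), this would time out.
    await asyncio.wait_for(task, timeout=1.0)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Removing the notify_all() call from close() in this sketch leaves the consumer blocked in wait(), and the wait_for() in demo() then raises a timeout error. That is the hang the commit message describes.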


1072 Commits