Commit Graph

1072 Commits

Author SHA1 Message Date
turboderp
c294f3436f Fix model_diff script 2024-07-06 11:55:37 +02:00
turboderp
adefba1973 Optionally clamp hidden states (for Gemma2) 2024-07-06 11:55:23 +02:00
turboderp
8f5680dfca Add measurement sanity check 2024-07-06 08:34:21 +02:00
turboderp
21f2a28b0a Merge remote-tracking branch 'origin/dev' into dev 2024-07-06 08:06:25 +02:00
turboderp
adebcdbd9d Use gelu_pytorch_tanh() instead of gelu() 2024-07-06 08:06:14 +02:00
turboderp
0963870252 Insist on eager attn for Gemma2 (until flash-attn gets support) 2024-07-06 07:15:29 +02:00
turboderp
bfc3cd9cf3 Support Gemma2 2024-07-06 07:14:47 +02:00
turboderp
01ce7bbb6e Attn logit softcapping (for eager attn) 2024-07-06 07:13:58 +02:00
Ahmad Fahadh Ilyas
83f0d19cbd make target_modules in lora usable (#534) 2024-07-06 03:57:28 +02:00
turboderp
6095c0eb6e Merge remote-tracking branch 'origin/dev' into dev 2024-07-05 23:57:40 +02:00
Brian Dashore
60eb8347b8 dynamic_async: Properly close the iterator loop on exit (#538)
When the close method is called, the generator's iterator loop never
actually exited. This is because the condition is not notified meaning
the task is still running even though it's signalled to cancel.

Therefore, add an extra pass if the task is cancelled and unlock the
loop by forcing a notify on close. From there, normal cancellation
handling will work.

There might be a better way to do this, but this way minimizes the amount
of added code and makes the most(?) sense.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-05 23:56:38 +02:00
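The failure mode described in the commit message above can be illustrated with a small asyncio sketch. This is not the code from the commit: `LoopRunner`, `submit`, `close`, and the `closed` flag are hypothetical names chosen for illustration, and cancellation is modelled with a plain flag rather than the library's actual job handling. It only shows why a loop parked on an `asyncio.Condition` needs a notify on close before it can observe that it should stop.

```python
import asyncio


class LoopRunner:
    """Hypothetical stand-in for a generator wrapper; not the library's class."""

    def __init__(self):
        # Must be constructed while an event loop is running.
        self.condition = asyncio.Condition()
        self.pending = []
        self.closed = False
        self.iterations = 0
        self.task = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            async with self.condition:
                # Park here until there is work or close() was requested.
                await self.condition.wait_for(
                    lambda: bool(self.pending) or self.closed
                )
            if self.closed:
                # Extra pass: leave the loop once close was signalled.
                break
            self.pending.pop(0)
            self.iterations += 1

    async def submit(self, item):
        async with self.condition:
            self.pending.append(item)
            self.condition.notify_all()

    async def close(self):
        self.closed = True
        async with self.condition:
            # Without this notify, the loop stays parked in wait_for()
            # and never sees the closed flag.
            self.condition.notify_all()
        await self.task


async def demo():
    runner = LoopRunner()
    await runner.submit("job")
    await asyncio.sleep(0)  # give the loop a chance to process the job
    await runner.close()
    return runner.task.done()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the `notify_all()` call in `close()` is removed, `await self.task` never returns in this sketch, which mirrors the hang the commit message describes.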
turboderp
66c4a9c849 Support (alternating) SWA 2024-07-04 05:36:39 +02:00
turboderp
84d00cbbc0 Support pre_attn_scalar config entry 2024-07-04 05:36:39 +02:00
turboderp
c1a572bf89 Support pre and post layernorms 2024-07-04 05:31:43 +02:00
turboderp
c8e5cedfb3 Example Gemma template 2024-07-04 05:25:12 +02:00
turboderp
d2b17db5de Read special tokens list from tokenizer_config.json 2024-07-04 05:25:12 +02:00
turboderp
a1aceaae20 Add final logit softcapping 2024-07-04 05:25:12 +02:00
turboderp
f1179ff200 Add ppl-over-seqlen test 2024-07-03 22:37:51 +02:00
turboderp
38f44096ba Support InternLM2 2024-07-03 22:37:18 +02:00
turboderp
e56cfe2219 Chatbot: fix chatml template 2024-07-03 22:34:46 +02:00
turboderp
95e093a2b2 Chatbot: Ignore undefined special tokens 2024-07-03 22:34:34 +02:00
turboderp
8c2132453c More debug output 2024-07-03 22:04:22 +02:00
turboderp
e737e23e30 Util function to sync only active devices 2024-07-01 02:15:34 +02:00
turboderp
198bbdb117 Fix type hint 2024-06-28 00:22:59 +02:00
turboderp
c387587e04 Don't cache encoding in lefttrim_token 2024-06-24 03:14:54 +02:00
turboderp
ef455a7bb9 Respect special tokens in WS server lefttrim_token() 2024-06-24 02:59:49 +02:00
turboderp
6a8172cfce Bump to v0.1.6 (tag: v0.1.6) 2024-06-24 02:33:12 +02:00
turboderp
0697196357 itertools.pairwise substitute for Python<3.10 2024-06-24 02:29:15 +02:00
turboderp
1552b06a7a Add env variable override for no_sdpa 2024-06-24 02:18:48 +02:00
turboderp
2455d1de9b Add option to disable SDPA 2024-06-24 02:14:17 +02:00
turboderp
6feebfb56e Fix layernorm kernels for wave64 GPUs 2024-06-24 02:09:46 +02:00
turboderp
05b1f2194e Fix imports in test_inference 2024-06-24 00:56:29 +02:00
turboderp
9b725dd5cc Add rich dependency to setup.py 2024-06-24 00:39:57 +02:00
turboderp
547cc96db9 Try not to crash when bos_token_id is not configured 2024-06-23 23:45:10 +02:00
turboderp
bdfe1bd160 Merge branch 'refs/heads/master' into dev 2024-06-23 23:34:21 +02:00
turboderp
a0ea2b0db7 Move conversion script into exllamav2 package 2024-06-21 23:58:39 +02:00
turboderp
6509e90842 Fix xformers import 2024-06-21 23:30:05 +02:00
turboderp
1c1fd2d247 Slightly more correct metadata for quantized config.json 2024-06-21 23:23:56 +02:00
turboderp
847a4f7709 Fix broken --low_mem (#511) 2024-06-18 01:12:05 +02:00
turboderp
81931e38c2 Remove PAD_TOKEN_ID warning 2024-06-17 01:11:11 +02:00
turboderp
f01e0d0736 Update example 2024-06-17 01:10:50 +02:00
turboderp
c2aac982e4 Globally set Torch number of threads to 1 2024-06-17 00:39:16 +02:00
turboderp
5b1b8d4169 Q GEMM: Initialize with bias when possible 2024-06-17 00:37:36 +02:00
turboderp
a2b2684e9a Paged attn: Skip some flash-attn wrapper code 2024-06-17 00:34:52 +02:00
turboderp
843cec5206 Non-blocking host-device copies in forward pass 2024-06-16 19:18:01 +02:00
turboderp
522cab53fa QMLP: Skip .view 2024-06-16 19:14:47 +02:00
turboderp
22d6823f98 Only convert blocked_tokens set to list once 2024-06-16 16:41:17 +02:00
turboderp
ec804a0291 Don't apply temperature in AVX2 softmax when temperature == 1 2024-06-16 16:14:58 +02:00
turboderp
67c270c724 Improve AVX2 softmax approximation 2024-06-16 16:13:42 +02:00
turboderp
3f805f511a Unpin logit/ID buffers (pinning doesn't improve performance and is potentially problematic) 2024-06-16 14:56:52 +02:00