turboderp
c294f3436f
Fix model_diff script
2024-07-06 11:55:37 +02:00
turboderp
adefba1973
Optionally clamp hidden states (for Gemma2)
2024-07-06 11:55:23 +02:00
turboderp
8f5680dfca
Add measurement sanity check
2024-07-06 08:34:21 +02:00
turboderp
21f2a28b0a
Merge remote-tracking branch 'origin/dev' into dev
2024-07-06 08:06:25 +02:00
turboderp
adebcdbd9d
Use gelu_pytorch_tanh() instead of gelu()
2024-07-06 08:06:14 +02:00
turboderp
0963870252
Insist on eager attn for Gemma2 (until flash-attn gets support)
2024-07-06 07:15:29 +02:00
turboderp
bfc3cd9cf3
Support Gemma2
2024-07-06 07:14:47 +02:00
turboderp
01ce7bbb6e
Attn logit softcapping (for eager attn)
2024-07-06 07:13:58 +02:00
Ahmad Fahadh Ilyas
83f0d19cbd
make target_modules in lora usable (#534)
2024-07-06 03:57:28 +02:00
turboderp
6095c0eb6e
Merge remote-tracking branch 'origin/dev' into dev
2024-07-05 23:57:40 +02:00
Brian Dashore
60eb8347b8
dynamic_async: Properly close the iterator loop on exit (#538)
When the close method is called, the generator's iterator loop never
actually exited. This is because the condition is not notified, meaning
the task is still running even though it has been signalled to cancel.
Therefore, add an extra pass if the task is cancelled and unlock the
loop by forcing a notify on close. From there, normal cancellation
handling will work.
There might be a better way to do this, but this way minimizes the amount
of added code and makes the most(?) sense.
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-05 23:56:38 +02:00
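The fix described in the commit above can be illustrated with a minimal asyncio sketch. It is not the exllamav2 code: the `Stream` class, its `put`, `close`, and `items` methods, and the `cancelled` flag are all hypothetical names chosen for this example. It only shows the general pattern the message describes, where `close` forces a notify on the condition so that a waiting iterator loop wakes up, sees the cancellation, and exits.

```python
import asyncio


class Stream:
    """Hypothetical stand-in for an async generator wrapper (not the exllamav2 API)."""

    def __init__(self):
        self.queue = []
        self.condition = asyncio.Condition()
        self.cancelled = False

    async def put(self, item):
        # Producer side: add an item and wake any waiting iterator
        async with self.condition:
            self.queue.append(item)
            self.condition.notify_all()

    async def close(self):
        # Setting the flag alone is not enough: an iterator blocked on the
        # condition would never re-check it. Force a notify to unlock the loop.
        async with self.condition:
            self.cancelled = True
            self.condition.notify_all()

    async def items(self):
        while True:
            async with self.condition:
                # Wake up either for new items or for cancellation
                await self.condition.wait_for(lambda: self.queue or self.cancelled)
                if self.cancelled:
                    return
                batch, self.queue = self.queue, []
            # Yield outside the lock so the producer is never blocked by the consumer
            for item in batch:
                yield item


async def demo():
    stream = Stream()
    received = []

    async def consumer():
        async for item in stream.items():
            received.append(item)

    task = asyncio.create_task(consumer())
    await stream.put("a")
    await stream.put("b")
    await asyncio.sleep(0)  # let the consumer drain the queue and block again
    await stream.close()
    # Without the notify in close(), this wait would time out
    await asyncio.wait_for(task, timeout=1.0)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the `notify_all()` call is removed from `close`, the consumer stays blocked in `wait_for` and the final `asyncio.wait_for` raises a timeout, which mirrors the hang the commit message reports.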
turboderp
66c4a9c849
Support (alternating) SWA
2024-07-04 05:36:39 +02:00
turboderp
84d00cbbc0
Support pre_attn_scalar config entry
2024-07-04 05:36:39 +02:00
turboderp
c1a572bf89
Support pre and post layernorms
2024-07-04 05:31:43 +02:00
turboderp
c8e5cedfb3
Example Gemma template
2024-07-04 05:25:12 +02:00
turboderp
d2b17db5de
Read special tokens list from tokenizer_config.json
2024-07-04 05:25:12 +02:00
turboderp
a1aceaae20
Add final logit softcapping
2024-07-04 05:25:12 +02:00
turboderp
f1179ff200
Add ppl-over-seqlen test
2024-07-03 22:37:51 +02:00
turboderp
38f44096ba
Support InternLM2
2024-07-03 22:37:18 +02:00
turboderp
e56cfe2219
Chatbot: fix chatml template
2024-07-03 22:34:46 +02:00
turboderp
95e093a2b2
Chatbot: Ignore undefined special tokens
2024-07-03 22:34:34 +02:00
turboderp
8c2132453c
More debug output
2024-07-03 22:04:22 +02:00
turboderp
e737e23e30
Util function to sync only active devices
2024-07-01 02:15:34 +02:00
turboderp
198bbdb117
Fix type hint
2024-06-28 00:22:59 +02:00
turboderp
c387587e04
Don't cache encoding in lefttrim_token
2024-06-24 03:14:54 +02:00
turboderp
ef455a7bb9
Respect special tokens in WS server lefttrim_token()
2024-06-24 02:59:49 +02:00
turboderp
6a8172cfce
Bump to v0.1.6
v0.1.6
2024-06-24 02:33:12 +02:00
turboderp
0697196357
itertools.pairwise substitute for Python<3.10
2024-06-24 02:29:15 +02:00
turboderp
1552b06a7a
Add env variable override for no_sdpa
2024-06-24 02:18:48 +02:00
turboderp
2455d1de9b
Add option to disable SDPA
2024-06-24 02:14:17 +02:00
turboderp
6feebfb56e
Fix layernorm kernels for wave64 GPUs
2024-06-24 02:09:46 +02:00
turboderp
05b1f2194e
Fix imports in test_inference
2024-06-24 00:56:29 +02:00
turboderp
9b725dd5cc
Add rich dependency to setup.py
2024-06-24 00:39:57 +02:00
turboderp
547cc96db9
Try not to crash when bos_token_id is not configured
2024-06-23 23:45:10 +02:00
turboderp
bdfe1bd160
Merge branch 'refs/heads/master' into dev
2024-06-23 23:34:21 +02:00
turboderp
a0ea2b0db7
Move conversion script into exllamav2 package
2024-06-21 23:58:39 +02:00
turboderp
6509e90842
Fix xformers import
2024-06-21 23:30:05 +02:00
turboderp
1c1fd2d247
Slightly more correct metadata for quantized config.json
2024-06-21 23:23:56 +02:00
turboderp
847a4f7709
Fix broken --low_mem (#511)
2024-06-18 01:12:05 +02:00
turboderp
81931e38c2
Remove PAD_TOKEN_ID warning
2024-06-17 01:11:11 +02:00
turboderp
f01e0d0736
Update example
2024-06-17 01:10:50 +02:00
turboderp
c2aac982e4
Globally set Torch number of threads to 1
2024-06-17 00:39:16 +02:00
turboderp
5b1b8d4169
Q GEMM: Initialize with bias when possible
2024-06-17 00:37:36 +02:00
turboderp
a2b2684e9a
Paged attn: Skip some flash-attn wrapper code
2024-06-17 00:34:52 +02:00
turboderp
843cec5206
Non-blocking host-device copies in forward pass
2024-06-16 19:18:01 +02:00
turboderp
522cab53fa
QMLP: Skip .view
2024-06-16 19:14:47 +02:00
turboderp
22d6823f98
Only convert blocked_tokens set to list once
2024-06-16 16:41:17 +02:00
turboderp
ec804a0291
Don't apply temperature in AVX2 softmax when temperature == 1
2024-06-16 16:14:58 +02:00
turboderp
67c270c724
Improve AVX2 softmax approximation
2024-06-16 16:13:42 +02:00
turboderp
3f805f511a
Unpin logit/ID buffers (pinning doesn't improve performance and is potentially problematic)
2024-06-16 14:56:52 +02:00