turboderp
|
4c93ce852f
|
Fix remaining time estimate
|
2024-02-01 04:56:00 +01:00 |
|
turboderp
|
735807e800
|
Use os.replace to swap checkpoint states in measure.py as well
|
2024-02-01 04:39:34 +01:00 |
|
turboderp
|
a1c8b790f1
|
Merge branch 'aiconvert'
|
2024-02-01 04:25:12 +01:00 |
|
turboderp
|
1e70113de3
|
Don't print avg accuracy, clarify "completed" -> "measured"
|
2024-02-01 04:24:10 +01:00 |
|
turboderp
|
a37bd8141f
|
Merge pull request #313 from bgorlick/safetensorai
Resolved compiler warnings for typecasting for proper byte reading and comparison
|
2024-02-01 04:15:15 +01:00 |
|
Ben Gorlick
|
1e1be8b842
|
Added typecasting to ensure proper byte reading and comparison
|
2024-01-31 06:35:56 -08:00 |
|
Ben Gorlick
|
6c49870ec0
|
Micro-optimization in file handling when saving checkpoints in quantize.py by using os.replace for atomic operations
|
2024-01-31 03:22:08 -08:00 |
|
Ben Gorlick
|
56a0d6d995
|
Adding graceful exit signal handling and status box for estimating time remaining in quantization process
|
2024-01-30 17:33:54 -08:00 |
|
turboderp
|
9c3fd9df3a
|
Make quantizer sanity check slightly more forgiving
|
2024-01-30 20:24:40 +01:00 |
|
turboderp
|
305982de43
|
Expand range for quantized parameter search
|
2024-01-30 20:22:44 +01:00 |
|
turboderp
|
2bd5ef758f
|
Merge branch 'pr-263'
|
2024-01-30 17:54:50 +01:00 |
|
turboderp
|
8c9a3ecb49
|
Add dyn temp options to chat example
|
2024-01-30 17:51:59 +01:00 |
|
turboderp
|
5da0488e97
|
Formatting
|
2024-01-30 17:51:00 +01:00 |
|
turboderp
|
0a2c97a149
|
Fix variable name
|
2024-01-30 16:54:43 +01:00 |
|
turboderp
|
30fe6e7a7c
|
Restore kernel instances for m = 1..4
|
2024-01-30 08:10:44 +01:00 |
|
turboderp
|
2cc9710273
|
Fix total bits calculation
|
2024-01-30 08:00:02 +01:00 |
|
awtrisk
|
79a731f78e
|
Update sampler.py
|
2024-01-28 18:51:33 +05:30 |
|
awtrisk
|
7929ad343d
|
Update ext.cpp
|
2024-01-28 18:38:38 +05:30 |
|
awtrisk
|
d06d36c883
|
Merge branch 'turboderp:master' into dynatemp-test
|
2024-01-28 18:22:09 +05:30 |
|
turboderp
|
8be8867548
|
Fix build workflow
|
2024-01-22 21:42:19 +01:00 |
|
turboderp
|
f94efb3a0f
|
Bump to 0.0.12
|
2024-01-22 17:34:51 +01:00 |
|
turboderp
|
2707e28165
|
Skip .bin files when compiling full model
|
2024-01-22 17:34:24 +01:00 |
|
turboderp
|
7a9d12ae4c
|
Add non-RMS layernorm, support for Orion
|
2024-01-22 17:21:01 +01:00 |
|
turboderp
|
9373d0cda0
|
Fix for .bin files with shared weights
|
2024-01-22 16:21:34 +01:00 |
|
awtrisk
|
74e4d33298
|
Update sampling.h
|
2024-01-22 10:29:07 +05:30 |
|
awtrisk
|
beb026927f
|
Merge dynatemp function to post_softmax_temperature
|
2024-01-22 10:27:20 +05:30 |
|
awtrisk
|
36efc7fad8
|
Merge branch 'turboderp:master' into dynatemp-test
|
2024-01-22 10:23:12 +05:30 |
|
turboderp
|
ec75362ee8
|
Fix stop token in streaming gen
|
2024-01-21 10:27:57 +01:00 |
|
turboderp
|
0f83192963
|
Merge branch 'pr-238'
Change probs return type to tensor
|
2024-01-20 13:14:21 +01:00 |
|
turboderp
|
31c93a6c1e
|
Omit CUDART_CB macro
|
2024-01-20 12:48:21 +01:00 |
|
turboderp
|
61864553be
|
Merge remote-tracking branch 'origin/master'
|
2024-01-20 12:17:33 +01:00 |
|
turboderp
|
59849ab464
|
Merge pull request #288 from TMK04/master
Improve StreamingGenerator stop conditions efficiency
|
2024-01-20 12:17:20 +01:00 |
|
TMK04
|
d4f1d8a6c7
|
Revert 4b3357005a
|
2024-01-20 18:38:46 +08:00 |
|
TMK04
|
3655c5c50e
|
Merge branch 'master' of https://github.com/TMK04/exllamav2
|
2024-01-20 18:35:58 +08:00 |
|
turboderp
|
99b19ec5f1
|
Cleanup examples a bit
|
2024-01-20 10:57:16 +01:00 |
|
turboderp
|
376218f70b
|
Add batch inference example
|
2024-01-20 10:51:52 +01:00 |
|
turboderp
|
7cc9e0bd31
|
Fix case for Torch attn when batch_size < cache batch_size
|
2024-01-20 10:51:41 +01:00 |
|
turboderp
|
7a226d039e
|
Cache safetensors context managers
|
2024-01-20 09:36:49 +01:00 |
|
turboderp
|
1f71d17b89
|
Use .union() for Python 3.8 compatibility
|
2024-01-20 06:22:14 +01:00 |
|
turboderp
|
46cca34fca
|
Add async request handling
|
2024-01-19 18:03:35 +01:00 |
|
turboderp
|
2f140f9027
|
Add stop action to WS server
|
2024-01-19 17:05:57 +01:00 |
|
awtrisk
|
2b2c094426
|
Update sampling.h
|
2024-01-19 21:23:57 +05:30 |
|
awtrisk
|
2fe47c4771
|
Proper implementation, with a function instead.
|
2024-01-19 21:17:18 +05:30 |
|
turboderp
|
2ad2a65d3a
|
Add nous prompt format
|
2024-01-19 16:45:37 +01:00 |
|
turboderp
|
43aa168e19
|
Fix --fast_safetensors on Windows in model_init.py
|
2024-01-19 12:07:51 +01:00 |
|
turboderp
|
3fcaa6fca7
|
Restrict fasttensors to Linux
|
2024-01-19 07:35:20 +01:00 |
|
TMK04
|
4b3357005a
|
refactor(StreamingGenerator.set_stop_conditions): add stop string as token if it encodes to a single token
|
2024-01-19 11:59:11 +08:00 |
|
TMK04
|
44625ec985
|
change StreamingGenerator stop_strings & stop_tokens to sets
No duplicates, and O(1) to check if a token is in the set
|
2024-01-19 11:40:55 +08:00 |
|
turboderp
|
2bd6118a08
|
Fix cleanup after fast_safetensors load
|
2024-01-18 20:21:55 +01:00 |
|
turboderp
|
23fc4737ae
|
Fast safetensors mode with direct IO and pinned buffer
|
2024-01-18 20:11:53 +01:00 |
|