519 Commits

Author SHA1 Message Date
turboderp
4c93ce852f Fix remaining time estimate 2024-02-01 04:56:00 +01:00
turboderp
735807e800 Use os.replace to swap checkpoint states in measure.py as well 2024-02-01 04:39:34 +01:00
turboderp
a1c8b790f1 Merge branch 'aiconvert' 2024-02-01 04:25:12 +01:00
turboderp
1e70113de3 Don't print avg accuracy, clarify "completed" -> "measured" 2024-02-01 04:24:10 +01:00
turboderp
a37bd8141f Merge pull request #313 from bgorlick/safetensorai
Resolved compiler Warnings for typecasting for proper byte reading and comparison
2024-02-01 04:15:15 +01:00
Ben Gorlick
1e1be8b842 Added typecasting to ensure proper byte reading and comparison 2024-01-31 06:35:56 -08:00
Ben Gorlick
6c49870ec0 Micro-optimization in file handling when saving checkpoints in quantize.py by using os.replace for atomic operations 2024-01-31 03:22:08 -08:00
Ben Gorlick
56a0d6d995 Adding graceful exit signal handling and status box for estimating time remaining in quantization process 2024-01-30 17:33:54 -08:00
turboderp
9c3fd9df3a Make quantizer sanity check slightly more forgiving 2024-01-30 20:24:40 +01:00
turboderp
305982de43 Expand range for quantized parameter search 2024-01-30 20:22:44 +01:00
turboderp
2bd5ef758f Merge branch 'pr-263' 2024-01-30 17:54:50 +01:00
turboderp
8c9a3ecb49 Add dyn temp options to chat example 2024-01-30 17:51:59 +01:00
turboderp
5da0488e97 Formatting 2024-01-30 17:51:00 +01:00
turboderp
0a2c97a149 Fix variable name 2024-01-30 16:54:43 +01:00
turboderp
30fe6e7a7c Restore kernel instances for m = 1..4 2024-01-30 08:10:44 +01:00
turboderp
2cc9710273 Fix total bits calculation 2024-01-30 08:00:02 +01:00
awtrisk
79a731f78e Update sampler.py 2024-01-28 18:51:33 +05:30
awtrisk
7929ad343d Update ext.cpp 2024-01-28 18:38:38 +05:30
awtrisk
d06d36c883 Merge branch 'turboderp:master' into dynatemp-test 2024-01-28 18:22:09 +05:30
turboderp
8be8867548 Fix build workflow
2024-01-22 21:42:19 +01:00
turboderp
f94efb3a0f Bump to 0.0.12 2024-01-22 17:34:51 +01:00
turboderp
2707e28165 Skip .bin files when compiling full model 2024-01-22 17:34:24 +01:00
turboderp
7a9d12ae4c Add non-RMS layernorm, support for Orion 2024-01-22 17:21:01 +01:00
turboderp
9373d0cda0 Fix for .bin files with shared weights 2024-01-22 16:21:34 +01:00
awtrisk
74e4d33298 Update sampling.h 2024-01-22 10:29:07 +05:30
awtrisk
beb026927f Merge dynatemp function to post_softmax_temperature 2024-01-22 10:27:20 +05:30
awtrisk
36efc7fad8 Merge branch 'turboderp:master' into dynatemp-test 2024-01-22 10:23:12 +05:30
turboderp
ec75362ee8 Fix stop token in streaming gen 2024-01-21 10:27:57 +01:00
turboderp
0f83192963 Merge branch 'pr-238'
Change probs return type to tensor
2024-01-20 13:14:21 +01:00
turboderp
31c93a6c1e Omit CUDART_CB macro 2024-01-20 12:48:21 +01:00
turboderp
61864553be Merge remote-tracking branch 'origin/master' 2024-01-20 12:17:33 +01:00
turboderp
59849ab464 Merge pull request #288 from TMK04/master
Improve StreamingGenerator stop conditions efficiency
2024-01-20 12:17:20 +01:00
TMK04
d4f1d8a6c7 Revert 4b3357005a 2024-01-20 18:38:46 +08:00
TMK04
3655c5c50e Merge branch 'master' of https://github.com/TMK04/exllamav2 2024-01-20 18:35:58 +08:00
turboderp
99b19ec5f1 Cleanup examples a bit 2024-01-20 10:57:16 +01:00
turboderp
376218f70b Add batch inference example 2024-01-20 10:51:52 +01:00
turboderp
7cc9e0bd31 Fix case for Torch attn when batch_size < cache batch_size 2024-01-20 10:51:41 +01:00
turboderp
7a226d039e Cache safetensors context managers 2024-01-20 09:36:49 +01:00
turboderp
1f71d17b89 Use .union() for Python 3.8 compatibility 2024-01-20 06:22:14 +01:00
turboderp
46cca34fca Add async request handling 2024-01-19 18:03:35 +01:00
turboderp
2f140f9027 Add stop action to WS server 2024-01-19 17:05:57 +01:00
awtrisk
2b2c094426 Update sampling.h 2024-01-19 21:23:57 +05:30
awtrisk
2fe47c4771 Proper implementation, with a function instead. 2024-01-19 21:17:18 +05:30
turboderp
2ad2a65d3a Add nous prompt format 2024-01-19 16:45:37 +01:00
turboderp
43aa168e19 Fix --fast_safetensors on Windows in model_init.py 2024-01-19 12:07:51 +01:00
turboderp
3fcaa6fca7 Restrict fasttensors to Linux 2024-01-19 07:35:20 +01:00
TMK04
4b3357005a refactor(StreamingGenerator.set_stop_conditions): add stop string as token if it encodes to a single token 2024-01-19 11:59:11 +08:00
TMK04
44625ec985 change StreamingGenerator stop_strings & stop_tokens to sets
No duplicates, and O(1) to check if a token is in the set
2024-01-19 11:40:55 +08:00
turboderp
2bd6118a08 Fix cleanup after fast_safetensors load 2024-01-18 20:21:55 +01:00
turboderp
23fc4737ae Fast safetensors mode with direct IO and pinned buffer 2024-01-18 20:11:53 +01:00