Commit Graph

570 Commits

Author SHA1 Message Date
Colin
008a0bb777 Fix converting files with docker command 2024-02-20 19:41:57 -05:00
turboderp
c8e2bf4594 Fix small mistake in example 2024-02-19 14:20:17 +01:00
turboderp
229019d86e Add lm-format-enforcer JSON example 2024-02-19 00:56:06 +01:00
turboderp
f194d9d7b0 Add filter_prefer_eos option 2024-02-19 00:14:14 +01:00
turboderp
daf7844d18 Add prefix filter 2024-02-18 23:58:25 +01:00
turboderp
26f4bf8997 Make sure first_token is always set when beginning stream (bugfix) 2024-02-18 22:16:01 +01:00
turboderp
8c3b30dc4b Fix tokenizer decoding for Qwen 2024-02-16 22:53:24 +01:00
turboderp
7af6494afa Drop device tensors for head layer during conversion 2024-02-16 17:31:19 +01:00
turboderp
5967a29eb4 Fix architecture detection 2024-02-16 01:52:26 +01:00
turboderp
cedeb616ce Support Qwen2 2024-02-15 20:50:24 +01:00
turboderp
1bc7c85a27 Disambiguate sampling params 2024-02-15 20:04:47 +01:00
turboderp
702dd9740a VRAM optimizations during quant 2024-02-15 20:03:47 +01:00
turboderp
75f969a6d3 Disable cudaMallocAsync for post2 release (tag: 0.0.13.post2) 2024-02-15 00:07:47 +01:00
turboderp
0535783ad3 Bump to 0.0.13.post2 2024-02-14 23:54:59 +01:00
turboderp
3424e70cae Only change allocator if Torch is not already imported 2024-02-14 23:54:48 +01:00
turboderp
c29f42626e Use cudaMallocAsync allocator by default 2024-02-14 23:18:07 +01:00
turboderp
69bfbea7b1 Allow autosplit to work with cudaMallocAsync backend 2024-02-14 20:19:28 +01:00
turboderp
9c37d64d74 Remove TODO items 2024-02-14 20:00:10 +01:00
turboderp
b0dc588d9b Remove return values from load_gen 2024-02-14 19:41:59 +01:00
turboderp
1c67f97f3d New API for streaming generator 2024-02-11 20:31:58 +01:00
turboderp
944e523109 Merge pull request #324 from flying-x/master
2 minor changes
2024-02-11 10:09:55 +01:00
turboderp
d7eddbaee0 Optimize typical sampling 2024-02-10 20:13:09 +01:00
turboderp
8639c554ab Option to return presampling probabilities 2024-02-10 16:18:24 +01:00
turboderp
80ed7a5222 Refactor extension 2024-02-10 12:05:34 +01:00
Min Xu
79402c5c7f Merge remote-tracking branch 'upstream/master' 2024-02-05 13:56:58 -08:00
Min Xu
9b964c4b28 better gitignore 2024-02-05 13:55:23 -08:00
turboderp
825929af7d Bump to 0.0.13.post1 (tag: 0.0.13.post1) 2024-02-04 21:46:43 +01:00
turboderp
312f400723 Fix vocab padding in generator 2024-02-04 21:30:37 +01:00
Min Xu
8e13598868 minor changes
1. added .so file to the ignored list
2. removed 2 unused imports from test_inference.py, which also
   avoided a warning for me that was produced by importing
   pandas
2024-02-02 20:19:11 -08:00
turboderp
7feed80cb7 fix typo (tag: v0.0.13) 2024-02-02 19:04:14 +01:00
turboderp
efacbc53ca Use Torch 2.0.1 for cu117 2024-02-02 19:02:01 +01:00
turboderp
f40e8374e9 Bump version to 0.0.13 2024-02-02 18:45:34 +01:00
turboderp
3cd61203c5 Bump Torch to 2.2.0 2024-02-02 18:43:25 +01:00
turboderp
c0ddebaaaf Update install instructions, remove V1 benchmark 2024-02-02 16:24:19 +01:00
turboderp
480e706342 Merge quad sampling into softmax
Combine temperature and smoothing factor (still separate args to sample_basic)
Allow arbitrary exponent
2024-02-02 15:45:11 +01:00
turboderp
700f5a8921 Merge PR #317 2024-02-02 15:40:09 +01:00
turboderp
b60c34770e Merge quad sampling into softmax
Combine temperature and smoothing factor (still separate args to sample_basic)
Allow arbitrary exponent
2024-02-02 15:07:43 +01:00
turboderp
9f8951e63b More typeable arg shortcut 2024-02-02 15:03:16 +01:00
Alexander Abushady
7c7312d116 Fixed softmax error on quad sampling 2024-02-01 21:35:33 -05:00
Alexander Abushady
3ea67828ea Quadratic Sampling optimizations 2024-02-01 15:01:12 -05:00
turboderp
0e9d9c1010 Prevent tensors passed to save_file from sharing memory 2024-02-01 10:14:36 +01:00
turboderp
439854c1cf Remove unused members, default value for embedding 2024-02-01 07:06:09 +01:00
Alexander Abushady
8461e6fa76 Kalomaze's Quadratic Sampling
Quadratic Sampling
2024-02-01 00:11:44 -05:00
turboderp
b77028a2eb Merge PR #296 2024-02-01 06:04:25 +01:00
turboderp
9437f3b3e0 Fix indentation, refactor a bit 2024-02-01 06:02:18 +01:00
turboderp
887719f0fd Merge remote-tracking branch 'silphendio/silphendio-patch-1' 2024-02-01 05:16:21 +01:00
turboderp
8a0cb9e01d Add last saved checkpoint to status box 2024-02-01 04:56:33 +01:00
turboderp
4c93ce852f Fix remaining time estimate 2024-02-01 04:56:00 +01:00
turboderp
735807e800 Use os.replace to swap checkpoint states in measure.py as well 2024-02-01 04:39:34 +01:00
turboderp
a1c8b790f1 Merge branch 'aiconvert' 2024-02-01 04:25:12 +01:00