76 Commits

Author SHA1 Message Date
turboderp
0b05686e76 Refactor, clean up and consolidate architecture logic 2024-03-06 02:46:47 +01:00
turboderp
dce84866e1 Support for StarCoder2, initial 2024-03-05 21:20:29 +01:00
turboderp
2044f8a31c Set inference_mode when compiling model 2024-02-22 10:48:44 +01:00
turboderp
7af6494afa Drop device tensors for head layer during conversion 2024-02-16 17:31:19 +01:00
turboderp
cedeb616ce Support Qwen2 2024-02-15 20:50:24 +01:00
turboderp
702dd9740a VRAM optimizations during quant 2024-02-15 20:03:47 +01:00
turboderp
0e9d9c1010 Prevent tensors passed to save_file from sharing memory 2024-02-01 10:14:36 +01:00
turboderp
8a0cb9e01d Add last saved checkpoint to status box 2024-02-01 04:56:33 +01:00
turboderp
4c93ce852f Fix remaining time estimate 2024-02-01 04:56:00 +01:00
turboderp
735807e800 Use os.replace to swap checkpoint states in measure.py as well 2024-02-01 04:39:34 +01:00
turboderp
1e70113de3 Don't print avg accuracy, clarify "completed" -> "measured" 2024-02-01 04:24:10 +01:00
Ben Gorlick
6c49870ec0 Micro-optimization in file handling when saving checkpoints in quantize.py by using os.replace for atomic operations 2024-01-31 03:22:08 -08:00
Ben Gorlick
56a0d6d995 Adding graceful exit signal handling and status box for estimating time remaining in quantization process 2024-01-30 17:33:54 -08:00
turboderp
9c3fd9df3a Make quantizer sanity check slightly more forgiving 2024-01-30 20:24:40 +01:00
turboderp
305982de43 Expand range for quantized parameter search 2024-01-30 20:22:44 +01:00
turboderp
2cc9710273 Fix total bits calculation 2024-01-30 08:00:02 +01:00
turboderp
2707e28165 Skip .bin files when compiling full model 2024-01-22 17:34:24 +01:00
turboderp
7a9d12ae4c Add non-RMS layernorm, support for Orion 2024-01-22 17:21:01 +01:00
turboderp
1f71d17b89 Use .union() for Python 3.8 compatibility 2024-01-20 06:22:14 +01:00
turboderp
48b3211d9c Fix for #281 2024-01-17 06:38:52 +01:00
turboderp
7d37b50d90 Fix typos 2024-01-09 07:12:38 +01:00
turboderp
e089313afd Reset norm 2024-01-09 05:30:15 +01:00
turboderp
6e214f59c7 Optimize conversion kernels 2024-01-08 03:40:40 +01:00
turboderp
41b15dd1c3 Refactor to consolidate attn params 2024-01-04 04:52:49 +01:00
turboderp
d36077cf92 Fix converter 2023-12-28 10:11:45 +01:00
turboderp
f4fe920a50 Reset snapshot interval 2023-12-27 17:23:58 +01:00
turboderp
02ce583318 Optimize VRAM usage a bit for quantizer 2023-12-26 00:00:37 +01:00
turboderp
b121ee418f Fix typo 2023-12-17 10:40:11 +01:00
turboderp
8a19badb01 Fix bug in standard cal dataset 2023-12-16 20:30:40 +01:00
turboderp
37a1322096 Fix mistake in MLP measure 2023-12-16 20:30:25 +01:00
turboderp
d2753a29b8 Mixtral EXL2 support, initial 2023-12-16 16:50:50 +01:00
turboderp
104c367451 Quantizer experiments 2023-12-14 01:29:12 +01:00
turboderp
39fd07083a Add error norm 2023-12-13 02:20:25 +01:00
turboderp
0d63d6479c Rework quantization and optimization 2023-12-13 01:00:11 +01:00
turboderp
c1dbe4221f Adjust qparams 2023-12-11 00:31:09 +01:00
turboderp
303d90b65e Fix regression 2023-12-10 19:51:59 +01:00
turboderp
3c43bad57f Revert some changes, calibrate to q state again (fixes 70B low bitrate) 2023-12-10 17:34:18 +01:00
turboderp
a89d85a803 Faster and more stable quant 2023-12-10 03:29:06 +01:00
turboderp
46f6f96662 Include calibration data 2023-12-08 20:20:43 +01:00
turboderp
2e91239571 New quant optimization procedure 2023-12-08 20:19:57 +01:00
turboderp
644805adba Reduce VRAM usage when quantizing 2023-12-02 17:18:53 +01:00
turboderp
f0c01a328b Skip stats for head layer to save system RAM 2023-11-30 19:40:45 +01:00
kingbri
6bfcefe940 Tree: Force utf8 when opening files
The default encoding on linux is utf8, but Windows uses cp1252 which isn't compatible with some unicode characters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 19:21:29 -05:00
turboderp
24f00214c9 Revert last commit 2023-11-27 05:29:04 +01:00
turboderp
4ff25ec6ef Ensure inference_mode while quanting 2023-11-26 19:22:40 +01:00
turboderp
714a19ca8f Allow irregular group sizes 2023-11-26 16:53:29 +01:00
turboderp
02b4e65ba1 Cleanup TODO items 2023-10-22 17:57:37 +02:00
turboderp
c7d1bc7ef0 TODO items 2023-10-11 23:44:04 +02:00
turboderp
2b0da96de7 Fix edge case if last layer doesn't fit in last shard 2023-09-23 21:23:23 +02:00
turboderp
385965ed61 Fix padding for extended vocab models 2023-09-22 00:21:49 +02:00