27 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Karl-Johan Alm | 0ece2f3006 | add layer GPU offloading for hidden/target states | 2024-05-22 15:09:57 +09:00 |
| Karl-Johan Alm | b428e239ed | optimization: put rfn_sum on cuda and do .item() call out of for loop | 2024-05-22 12:43:21 +09:00 |
| turboderp | 83baa98ed9 | Add machine-parseable output to convert script | 2024-05-20 01:49:34 +02:00 |
| turboderp | 750c85e2c7 | Fixes to allow quantizing Granite | 2024-05-09 02:31:21 +02:00 |
| turboderp | b68c0bd89b | Fix checkpoint interval | 2024-04-18 23:18:58 +02:00 |
| turboderp | 672c7355a3 | Quant: Change snapshot to time instead of layer interval | 2024-04-05 21:49:10 +02:00 |
| turboderp | 5d9732165e | Quant: Swap some state to CPU and attempt to keep more VRAM available in places | 2024-04-05 21:48:08 +02:00 |
| turboderp | 4845b1e89d | Quant: Save some memory when preparing quantizers for experts | 2024-03-29 18:14:48 +01:00 |
| turboderp | 7baf3d4198 | Adjust warning threshold for uncalibrated experts | 2024-03-29 18:12:34 +01:00 |
| turboderp | f3ed1dfed4 | Quantize: Memory optimizations | 2024-03-19 18:22:23 +01:00 |
| turboderp | 9c47269913 | Add parallel decoder block | 2024-03-19 18:20:44 +01:00 |
| turboderp | 0b05686e76 | Refactor, clean up and consolidate architecture logic | 2024-03-06 02:46:47 +01:00 |
| turboderp | dce84866e1 | Support for StarCoder2, initial | 2024-03-05 21:20:29 +01:00 |
| turboderp | cedeb616ce | Support Qwen2 | 2024-02-15 20:50:24 +01:00 |
| turboderp | 0e9d9c1010 | Prevent tensors passed to save_file from sharing memory | 2024-02-01 10:14:36 +01:00 |
| turboderp | 8a0cb9e01d | Add last saved checkpoint to status box | 2024-02-01 04:56:33 +01:00 |
| turboderp | 4c93ce852f | Fix remaining time estimate | 2024-02-01 04:56:00 +01:00 |
| turboderp | 735807e800 | Use os.replace to swap checkpoint states in measure.py as well | 2024-02-01 04:39:34 +01:00 |
| turboderp | 1e70113de3 | Don't print avg accuracy, clarify "completed" -> "measured" | 2024-02-01 04:24:10 +01:00 |
| Ben Gorlick | 56a0d6d995 | Adding graceful exit signal handling and status box for estimating time remaining in quantization process | 2024-01-30 17:33:54 -08:00 |
| turboderp | 7a9d12ae4c | Add non-RMS layernorm, support for Orion | 2024-01-22 17:21:01 +01:00 |
| turboderp | 48b3211d9c | Fix for #281 | 2024-01-17 06:38:52 +01:00 |
| turboderp | 6e214f59c7 | Optimize conversion kernels | 2024-01-08 03:40:40 +01:00 |
| turboderp | 41b15dd1c3 | Refactor to consolidate attn params | 2024-01-04 04:52:49 +01:00 |
| turboderp | 37a1322096 | Fix mistake in MLP measure | 2023-12-16 20:30:25 +01:00 |
| turboderp | d2753a29b8 | Mixtral EXL2 support, initial | 2023-12-16 16:50:50 +01:00 |
| turboderp | 0d63d6479c | Rework quantization and optimization | 2023-12-13 01:00:11 +01:00 |