Commit Graph

20 Commits

Author     SHA1        Date                        Message
turboderp  60eedf4622  2024-06-13 20:43:49 +02:00  Add exit status code for quant error
turboderp  83baa98ed9  2024-05-20 01:49:34 +02:00  Add machine-parseable output to convert script
turboderp  750c85e2c7  2024-05-09 02:31:21 +02:00  Fixes to allow quantizing Granite
turboderp  52bc008df9  2024-04-06 17:28:57 +02:00  Don't add metadata when -cf not specified
turboderp  97e8123c71  2024-04-05 21:35:23 +02:00  Enable head (qk) norms for quantized models
turboderp  9c47269913  2024-03-19 18:20:44 +01:00  Add parallel decoder block
turboderp  5fb2c679cb  2024-03-12 09:09:30 +01:00  Add quantization_config to config.json when compiling
turboderp  0b05686e76  2024-03-06 02:46:47 +01:00  Refactor, clean up and consolidate architecture logic
turboderp  dce84866e1  2024-03-05 21:20:29 +01:00  Support for StarCoder2, initial
turboderp  2044f8a31c  2024-02-22 10:48:44 +01:00  Set inference_mode when compiling model
turboderp  0e9d9c1010  2024-02-01 10:14:36 +01:00  Prevent tensors passed to save_file from sharing memory
turboderp  2707e28165  2024-01-22 17:34:24 +01:00  Skip .bin files when compiling full model
turboderp  7a9d12ae4c  2024-01-22 17:21:01 +01:00  Add non-RMS layernorm, support for Orion
turboderp  1f71d17b89  2024-01-20 06:22:14 +01:00  Use .union() for Python 3.8 compatibility
turboderp  d2753a29b8  2023-12-16 16:50:50 +01:00  Mixtral EXL2 support, initial
turboderp  2b0da96de7  2023-09-23 21:23:23 +02:00  Fix edge case if last layer doesn't fit in last shard
turboderp  2a3ff14af2  2023-09-20 09:54:43 +02:00  Remove repeated console output
turboderp  6fd006b9d0  2023-09-18 18:25:30 +02:00  More options for converter to facilitate scripting
turboderp  af1398ff16  2023-09-16 11:44:07 +02:00  Conversion: ability to save sharded models (addresses OoM when compiling output file)
turboderp  bb83469574  2023-08-30 11:05:23 +02:00  Initial commit