exllamav2

Mirror of https://github.com/turboderp-org/exllamav2.git (synced 2026-05-04 21:21:25 +00:00)

Author	SHA1	Message	Date
turboderp	60eedf4622	Add exit status code for quant error	2024-06-13 20:43:49 +02:00
turboderp	83baa98ed9	Add machine-parseable output to convert script	2024-05-20 01:49:34 +02:00
turboderp	750c85e2c7	Fixes to allow quantizing Granite	2024-05-09 02:31:21 +02:00
turboderp	52bc008df9	Don't add metadata when -cf not specified	2024-04-06 17:28:57 +02:00
turboderp	97e8123c71	Enable head (qk) norms for quantized models	2024-04-05 21:35:23 +02:00
turboderp	9c47269913	Add parallel decoder block	2024-03-19 18:20:44 +01:00
turboderp	5fb2c679cb	Add quantization_config to config.json when compiling	2024-03-12 09:09:30 +01:00
turboderp	0b05686e76	Refactor, clean up and consolidate architecture logic	2024-03-06 02:46:47 +01:00
turboderp	dce84866e1	Support for StarCoder2, initial	2024-03-05 21:20:29 +01:00
turboderp	2044f8a31c	Set inference_mode when compiling model	2024-02-22 10:48:44 +01:00
turboderp	0e9d9c1010	Prevent tensors passed to save_file from sharing memory	2024-02-01 10:14:36 +01:00
turboderp	2707e28165	Skip .bin files when compiling full model	2024-01-22 17:34:24 +01:00
turboderp	7a9d12ae4c	Add non-RMS layernorm, support for Orion	2024-01-22 17:21:01 +01:00
turboderp	1f71d17b89	Use .union() for Python 3.8 compatibility	2024-01-20 06:22:14 +01:00
turboderp	d2753a29b8	Mixtral EXL2 support, initial	2023-12-16 16:50:50 +01:00
turboderp	2b0da96de7	Fix edge case if last layer doesn't fit in last shard	2023-09-23 21:23:23 +02:00
turboderp	2a3ff14af2	Remove repeated console output	2023-09-20 09:54:43 +02:00
turboderp	6fd006b9d0	More options for converter to facilitate scripting	2023-09-18 18:25:30 +02:00
turboderp	af1398ff16	Conversion: ability to save sharded models (addresses OoM when compiling output file)	2023-09-16 11:44:07 +02:00
turboderp	bb83469574	Initial commit	2023-08-30 11:05:23 +02:00

20 Commits