35 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| turboderp | b25210778c | Remove fasttensors, add platform-agnostic multithreaded ST loader | 2024-09-17 00:33:16 +02:00 |
| turboderp | 823bf11c68 | Update MMLU test to use dynamic batching | 2024-06-01 03:31:40 +02:00 |
| turboderp | e7cbb300ff | Update HumanEval test to dynamic generator | 2024-05-31 22:24:16 +02:00 |
| turboderp | d778ebc489 | Add Granite formatting to HumanEval test | 2024-05-09 12:14:57 +02:00 |
| turboderp | c055457577 | Add more datasets | 2024-04-30 20:38:26 +02:00 |
| turboderp | 655c57f4de | Perplexity comparison for AutoAWQ, AutoGPTQ | 2024-04-30 16:16:16 +02:00 |
| turboderp | 864abeb137 | Update MMLU test a bit | 2024-04-26 23:26:30 +02:00 |
| turboderp | 84b5d2b90f | Cache prompts for MMLU test | 2024-04-21 15:14:09 +02:00 |
| turboderp | 02edfb9f2f | Fix HumanEval test for Gemma models | 2024-03-09 05:59:00 +01:00 |
| turboderp | 222c8465bc | Rework HumanEval test | 2024-03-07 17:54:07 +01:00 |
| turboderp | 23fc4737ae | Fast safetensors mode with direct IO and pinned buffer | 2024-01-18 20:11:53 +01:00 |
| turboderp | 7262fb8f9d | Batch latency test script | 2023-12-23 22:04:40 +01:00 |
| turboderp | bf2710f008 | Optimizer batched sampling | 2023-12-23 22:04:10 +01:00 |
| turboderp | c284648dbe | Add script to compare quantized and unquantized model | 2023-12-23 02:57:13 +01:00 |
| AlpinDale | 5131099b5f | add top_a in a few more places | 2023-12-21 15:28:34 +00:00 |
| turboderp | c4ae226df5 | HumanEval test | 2023-12-21 12:29:09 +01:00 |
| David Toth | ed7a104e71 | Fix encoder in MMLU | 2023-12-17 21:08:27 +00:00 |
| turboderp | d8b4efa8d4 | Instrumentation etc. | 2023-12-10 17:36:40 +01:00 |
| turboderp | 38d393718d | Fix some tokenization edge cases | 2023-12-03 22:03:23 +01:00 |
| turboderp | 020fa4fcae | Temporary workaround for tokenizers with undefined padding token | 2023-11-30 09:01:57 +01:00 |
| turboderp | c165d7b73a | Torture the tokenizer | 2023-11-29 08:18:02 +01:00 |
| turboderp | 42bcacdb84 | Tests for half GEMM kernels | 2023-11-25 11:54:53 +01:00 |
| turboderp | b302e310c8 | More output in SD example | 2023-11-10 20:16:08 +01:00 |
| turboderp | 4afe616aee | Fix unhandled OoM condition when loading GPTQ model with auto split; Free minimum reserved VRAM on previous device when moving to next device | 2023-10-28 20:08:39 +02:00 |
| turboderp | 093b89d38c | Add generator versions of model.load() and model.load_autosplit() | 2023-10-23 01:17:10 +02:00 |
| turboderp | 5834f3a968 | Make sure all inference is done in torch.inference_mode() | 2023-10-22 20:23:42 +02:00 |
| turboderp | eb2cae6c52 | Add auto GPU split feature | 2023-10-22 18:48:35 +02:00 |
| turboderp | 1d151e73a1 | Test script | 2023-10-15 22:58:19 +02:00 |
| turboderp | ba5f6191c8 | Add typical setting to chat example. | 2023-09-26 19:50:44 +02:00 |
| turboderp | f2c773592c | Add MMLU test | 2023-09-23 21:25:52 +02:00 |
| turboderp | 918368b295 | 34B testing | 2023-09-10 06:15:33 +02:00 |
| turboderp | f79e16c5d0 | Optimization, wider loads in EXL2 kernel (int4) | 2023-09-07 10:56:43 +02:00 |
| turboderp | c2f62e1f1f | Optimization, wider loads in GPTQ kernel (int2) working | 2023-09-07 04:07:13 +02:00 |
| turboderp | 3c80d41234 | Add 4-bit GPTQ support | 2023-09-05 14:03:51 +02:00 |
| turboderp | a386102ac6 | Improve prediction of VRAM usage when loading model | 2023-09-01 10:47:29 +02:00 |