exllamav3

Mirror of https://github.com/turboderp-org/exllamav3.git (synced 2026-03-15 00:07:24 +00:00)

Author	SHA1	Message	Date
turboderp	85cb54c6f3	perf.py: Make sure test context is nontrivial to force more expert diversity	2026-03-07 01:18:27 +01:00
turboderp	67785fc286	compare_q.py: Paper over some dependency problems	2026-03-02 18:47:39 +01:00
turboderp	b2b6f37e12	perf.py: Error out if test length > cache size	2026-02-17 20:04:13 +01:00
MikeRoz47	52c2f5794d	Add optional arg to compare_q to allow it to save plots rather than show them	2026-02-15 16:41:18 +00:00
turboderp	428a082276	Add performance test	2026-01-22 23:28:53 +01:00
turboderp	0d09af403a	Diversity test: use greedy sampling for extraction	2026-01-14 21:40:31 +01:00
turboderp	e839152802	Add diversity test	2026-01-11 19:12:04 +01:00
turboderp	6b31fc00f5	Add HF tokenizer helper, refactor example	2026-01-11 12:49:12 +01:00
turboderp	0a629cf70a	HumanEval: Add max batch size arg	2025-12-05 13:21:07 +01:00
turboderp	ef8fd43d1c	Cleanup unused imports	2025-11-16 14:25:46 +01:00
turboderp	38ddd8b9c5	MMLU: Fix prompt	2025-11-09 22:25:53 +01:00
turboderp	3562dbe7b0	compare_q.py: Work around AutoAWQ being broken in later versions of Transformers	2025-11-09 13:33:40 +01:00
turboderp	d33aba1845	HumanEval: Add MiniMax prompt format	2025-10-31 11:50:14 +01:00
turboderp	3634436641	model_diff.py: Limit batch size (prevent OoM on output layer)	2025-10-29 21:15:35 +01:00
turboderp	aa9a315fd9	compare_q.py: Explicit GC between runs	2025-09-19 19:15:29 +02:00
turboderp	3845775650	ppl_transformers.py: Explicitly make bfloat16 the default dtype	2025-09-18 22:11:19 +02:00
turboderp	d8203063dc	PPL eval: Transformers FP32 mode	2025-09-04 00:39:09 +02:00
turboderp	ca806c3386	ppl_transformers.py: Fix input IDs device	2025-08-24 21:39:33 +02:00
turboderp	8377460ac6	prequant_test.py: Disable torch.compile (conflicts with cudaMallocAsync)	2025-08-23 14:55:43 +02:00
turboderp	1f7f3e94c0	compare_q.py: Fix/ignore anyprecision imports (Transformers version mismatch)	2025-07-16 09:49:18 +02:00
turboderp	5cb70f591b	model_diff.py: Add option to save IDs and logits	2025-07-15 20:34:10 +02:00
turboderp	4265c9e193	Add Transformers ppl test (equivalent to eval/ppl.py)	2025-07-15 20:33:42 +02:00
turboderp	c09f809876	ppl.py: Add length argument	2025-07-15 20:32:47 +02:00
turboderp	a6d79e5d0d	MMLU: Random sample option	2025-07-12 21:14:56 +02:00
turboderp	415a55cc2d	MMLU eval: More feedback during eval	2025-07-12 18:31:32 +02:00
turboderp	997ca85bcc	Add MMLU eval	2025-07-11 13:55:07 +02:00
turboderp	6341b119ef	Loader: Add tensor override script	2025-07-08 18:58:43 +02:00
turboderp	fce1b96e3f	prequant_test.py: Add some more options	2025-06-14 15:00:09 +02:00
turboderp	463ebe1841	compare_q.py: Add dark mode	2025-06-12 05:54:57 +02:00
turboderp	32d98c24c1	compare_q.py: Add QTIP wrapper	2025-06-08 15:41:30 +02:00
turboderp	f02c9afd6a	compare_q_logits.py: Fix bug	2025-06-08 15:37:41 +02:00
turboderp	db65151b07	compare_q.py: Allow script to run without all backends installed	2025-06-05 02:26:00 +02:00
turboderp	162f99ab8b	compare_q.py: Add AnyPrecision models	2025-06-05 02:26:00 +02:00
turboderp	ab875ba730	compare_q.py: Fix GGUF VRAM computation when output.weight precedes token_embd.weight	2025-06-04 23:34:42 +02:00
turboderp	8ff65b8742	compare_q.py: Option to capture logits in streaming mode (for large unquantized models)	2025-05-31 01:11:56 +02:00
turboderp	2cc8f718da	Add cosine_error and SQNR measures	2025-05-30 19:43:20 +02:00
turboderp	34d2f1f5fa	Add prequant_test script	2025-05-30 19:42:49 +02:00
turboderp	f8dc9975fe	model_diff.py: Add device argument	2025-05-30 19:42:49 +02:00
turboderp	c0a2028fb5	compare_q.py: Fix some logic for KLD test	2025-05-18 21:55:26 +02:00
turboderp	e1d2fa11d6	compare_q.py: Add -mask arg	2025-05-18 10:58:14 +02:00
turboderp	07ffea7f89	compare_q.py: Fix llama.cpp bpw measurement for MoE models	2025-05-18 00:19:59 +02:00
turboderp	475dfcca47	compare_q.py: Add more GPTQ layer types	2025-05-18 00:19:19 +02:00
turboderp	0488385eb0	Add simple long-context evaluation script	2025-05-17 16:58:12 +02:00
turboderp	3873d40ae2	compare_q.py: Add KLD test and some other tweaks	2025-05-16 16:13:26 +02:00
turboderp	a19538cf1e	compare_q.py: Some fixes	2025-05-16 00:33:48 +02:00
turboderp	7f3096ffd7	compare_q.py: Account for unquantized weights in blocksparse EXL2 layers	2025-05-14 23:55:25 +02:00
turboderp	cb7c70cde0	compare_q.py: Add a little versatility to plot	2025-05-14 17:52:21 +02:00
turboderp	5c3ff204c4	model_diff.py: Use deferred load and close file handles between modules	2025-05-12 21:23:48 +02:00
turboderp	1e1754787e	HumanEval: Move BOS token to individual prompt template, don't prepend by default when tokenizing	2025-05-11 23:02:07 +02:00
turboderp	81a0a7d240	Merge pull request #35 from gakada/humaneval (humaneval.py: fix top_k type, remove rep_p, add qwen3)	2025-05-11 20:47:03 +02:00

59 Commits