Commit Graph

72 Commits

Author  SHA1  Message  Date
turboderp  5416d5acc9  spec_decode eval: Allow overriding default num_draft_tokens  2026-04-25 04:00:18 +02:00
turboderp  67ddb6b4ec  Add spec_decode eval  2026-04-25 03:20:48 +02:00
turboderp  cb50b9fa6a  afmoe: Assert topk_groups=1 and use dots router  2026-04-23 21:05:21 +02:00
turboderp  7f6459f259  MMLU eval: Add redux option  2026-04-18 18:30:25 +02:00
turboderp  0d3893face  Loader: Allow specifying max bsz for autosplit to better estimate recurrent state VRAM overhead  2026-04-18 14:12:29 +02:00
turboderp  e90fe55e89  compare_q.py: Add option to format test dataset with chat template (stable measurements for Gemma4)  2026-04-10 20:59:43 +02:00
turboderp  7046a5c739  perf.py: Fix cache overflow  2026-04-06 01:58:27 +02:00
turboderp  514389a2b5  ppl.py: No default datatype for HF mode  2026-04-05 19:43:46 +02:00
turboderp  476ad297ec  BBEH eval: Fix results display  2026-04-04 23:29:19 +02:00
turboderp  5bb4e0d32b  MMLU eval: Fix confidence interval  2026-04-04 22:49:51 +02:00
turboderp  76acd9c140  Eval: Add BigBench Extra Hard  2026-04-04 22:31:26 +02:00
turboderp  9bd2b5ea4d  ppl eval: Combine HF and EXL3 evals into single module, add mode that attempts to replicate default llama.cpp eval tokenization and scoring  2026-04-03 23:37:00 +02:00
turboderp  7d80c39a45  Add IFBench eval  2026-03-25 01:58:45 +01:00
turboderp  85cb54c6f3  perf.py: Make sure test context is nontrivial to force more expert diversity  2026-03-07 01:18:27 +01:00
turboderp  67785fc286  compare_q.py: Paper over some dependency problems  2026-03-02 18:47:39 +01:00
turboderp  b2b6f37e12  perf.py: Error out if test length > cache size  2026-02-17 20:04:13 +01:00
MikeRoz47  52c2f5794d  Add optional arg to compare_q to allow it to save plots rather than show them  2026-02-15 16:41:18 +00:00
turboderp  428a082276  Add performance test  2026-01-22 23:28:53 +01:00
turboderp  0d09af403a  Diversity test: use greedy sampling for extraction  2026-01-14 21:40:31 +01:00
turboderp  e839152802  Add diversity test  2026-01-11 19:12:04 +01:00
turboderp  6b31fc00f5  Add HF tokenizer helper, refactor example  2026-01-11 12:49:12 +01:00
turboderp  0a629cf70a  HumanEval: Add max batch size arg  2025-12-05 13:21:07 +01:00
turboderp  ef8fd43d1c  Cleanup unused imports  2025-11-16 14:25:46 +01:00
turboderp  38ddd8b9c5  MMLU: Fix prompt  2025-11-09 22:25:53 +01:00
turboderp  3562dbe7b0  compare_q.py: Work around AutoAWQ being broken in later versions of Transformers  2025-11-09 13:33:40 +01:00
turboderp  d33aba1845  HumanEval: Add MiniMax prompt format  2025-10-31 11:50:14 +01:00
turboderp  3634436641  model_diff.py: Limit batch size (prevent OoM on output layer)  2025-10-29 21:15:35 +01:00
turboderp  aa9a315fd9  compare_q.py: Explicit GC between runs  2025-09-19 19:15:29 +02:00
turboderp  3845775650  ppl_transformers.py: Explicitly make bfloat16 the default dtype  2025-09-18 22:11:19 +02:00
turboderp  d8203063dc  PPL eval: Transformers FP32 mode  2025-09-04 00:39:09 +02:00
turboderp  ca806c3386  ppl_transformers.py: Fix input IDs device  2025-08-24 21:39:33 +02:00
turboderp  8377460ac6  prequant_test.py: Disable torch.compile (conflicts with cudaMallocAsync)  2025-08-23 14:55:43 +02:00
turboderp  1f7f3e94c0  compare_q.py: Fix/ignore anyprecision imports (Transformers version mismatch)  2025-07-16 09:49:18 +02:00
turboderp  5cb70f591b  model_diff.py: Add option to save IDs and logits  2025-07-15 20:34:10 +02:00
turboderp  4265c9e193  Add Transformers ppl test (equivalent to eval/ppl.py)  2025-07-15 20:33:42 +02:00
turboderp  c09f809876  ppl.py: Add length argument  2025-07-15 20:32:47 +02:00
turboderp  a6d79e5d0d  MMLU: Random sample option  2025-07-12 21:14:56 +02:00
turboderp  415a55cc2d  MMLU eval: More feedback during eval  2025-07-12 18:31:32 +02:00
turboderp  997ca85bcc  Add MMLU eval  2025-07-11 13:55:07 +02:00
turboderp  6341b119ef  Loader: Add tensor override script  2025-07-08 18:58:43 +02:00
turboderp  fce1b96e3f  prequant_test.py: Add some more options  2025-06-14 15:00:09 +02:00
turboderp  463ebe1841  compare_q.py: Add dark mode  2025-06-12 05:54:57 +02:00
turboderp  32d98c24c1  compare_q.py: Add QTIP wrapper  2025-06-08 15:41:30 +02:00
turboderp  f02c9afd6a  compare_q_logits.py: Fix bug  2025-06-08 15:37:41 +02:00
turboderp  db65151b07  compare_q.py: Allow script to run without all backends installed  2025-06-05 02:26:00 +02:00
turboderp  162f99ab8b  compare_q.py: Add AnyPrecision models  2025-06-05 02:26:00 +02:00
turboderp  ab875ba730  compare_q.py: Fix GGUF VRAM computation when output.weight precedes token_embd.weight  2025-06-04 23:34:42 +02:00
turboderp  8ff65b8742  compare_q.py: Option to capture logits in streaming mode (for large unquantized models)  2025-05-31 01:11:56 +02:00
turboderp  2cc8f718da  Add cosine_error and SQNR measures  2025-05-30 19:43:20 +02:00
turboderp  34d2f1f5fa  Add prequant_test script  2025-05-30 19:42:49 +02:00