35 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| turboderp | b25210778c | Remove fasttensors, add platform-agnostic multithreaded ST loader | 2024-09-17 00:33:16 +02:00 |
| turboderp | 823bf11c68 | Update MMLU test to use dynamic batching | 2024-06-01 03:31:40 +02:00 |
| turboderp | e7cbb300ff | Update HumanEval test to dynamic generator | 2024-05-31 22:24:16 +02:00 |
| turboderp | d778ebc489 | Add Granite formatting to HumanEval test | 2024-05-09 12:14:57 +02:00 |
| turboderp | c055457577 | Add more datasets | 2024-04-30 20:38:26 +02:00 |
| turboderp | 655c57f4de | Perplexity comparison for AutoAWQ, AutoGPTQ | 2024-04-30 16:16:16 +02:00 |
| turboderp | 864abeb137 | Update MMLU test a bit | 2024-04-26 23:26:30 +02:00 |
| turboderp | 84b5d2b90f | Cache prompts for MMLU test | 2024-04-21 15:14:09 +02:00 |
| turboderp | 02edfb9f2f | Fix HumanEval test for Gemma models | 2024-03-09 05:59:00 +01:00 |
| turboderp | 222c8465bc | Rework HumanEval test | 2024-03-07 17:54:07 +01:00 |
| turboderp | 23fc4737ae | Fast safetensors mode with direct IO and pinned buffer | 2024-01-18 20:11:53 +01:00 |
| turboderp | 7262fb8f9d | Batch latency test script | 2023-12-23 22:04:40 +01:00 |
| turboderp | bf2710f008 | Optimizer batched sampling | 2023-12-23 22:04:10 +01:00 |
| turboderp | c284648dbe | Add script to compare quantized and unquantized model | 2023-12-23 02:57:13 +01:00 |
| AlpinDale | 5131099b5f | add top_a in a few more places | 2023-12-21 15:28:34 +00:00 |
| turboderp | c4ae226df5 | HumanEval test | 2023-12-21 12:29:09 +01:00 |
| David Toth | ed7a104e71 | Fix encoder in MMLU | 2023-12-17 21:08:27 +00:00 |
| turboderp | d8b4efa8d4 | Instrumentation etc. | 2023-12-10 17:36:40 +01:00 |
| turboderp | 38d393718d | Fix some tokenization edge cases | 2023-12-03 22:03:23 +01:00 |
| turboderp | 020fa4fcae | Temporary workaround for tokenizers with undefined padding token | 2023-11-30 09:01:57 +01:00 |
| turboderp | c165d7b73a | Torture the tokenizer | 2023-11-29 08:18:02 +01:00 |
| turboderp | 42bcacdb84 | Tests for half GEMM kernels | 2023-11-25 11:54:53 +01:00 |
| turboderp | b302e310c8 | More output in SD example | 2023-11-10 20:16:08 +01:00 |
| turboderp | 4afe616aee | Fix unhandled OoM condition when loading GPTQ model with auto split; Free minimum reserved VRAM on previous device when moving to next device | 2023-10-28 20:08:39 +02:00 |
| turboderp | 093b89d38c | Add generator versions of model.load() and model.load_autosplit() | 2023-10-23 01:17:10 +02:00 |
| turboderp | 5834f3a968 | Make sure all inference is done in torch.inference_mode() | 2023-10-22 20:23:42 +02:00 |
| turboderp | eb2cae6c52 | Add auto GPU split feature | 2023-10-22 18:48:35 +02:00 |
| turboderp | 1d151e73a1 | Test script | 2023-10-15 22:58:19 +02:00 |
| turboderp | ba5f6191c8 | Add typical setting to chat example. | 2023-09-26 19:50:44 +02:00 |
| turboderp | f2c773592c | Add MMLU test | 2023-09-23 21:25:52 +02:00 |
| turboderp | 918368b295 | 34B testing | 2023-09-10 06:15:33 +02:00 |
| turboderp | f79e16c5d0 | Optimization, wider loads in EXL2 kernel (int4) | 2023-09-07 10:56:43 +02:00 |
| turboderp | c2f62e1f1f | Optimization, wider loads in GPTQ kernel (int2) working | 2023-09-07 04:07:13 +02:00 |
| turboderp | 3c80d41234 | Add 4-bit GPTQ support | 2023-09-05 14:03:51 +02:00 |
| turboderp | a386102ac6 | Improve prediction of VRAM usage when loading model | 2023-09-01 10:47:29 +02:00 |