Author | Commit | Message | Date
turboderp | b25210778c | Remove fasttensors, add platform-agnostic multithreaded ST loader | 2024-09-17 00:33:16 +02:00
turboderp | 823bf11c68 | Update MMLU test to use dynamic batching | 2024-06-01 03:31:40 +02:00
turboderp | e7cbb300ff | Update HumanEval test to dynamic generator | 2024-05-31 22:24:16 +02:00
turboderp | d778ebc489 | Add Granite formatting to HumanEval test | 2024-05-09 12:14:57 +02:00
turboderp | c055457577 | Add more datasets | 2024-04-30 20:38:26 +02:00
turboderp | 655c57f4de | Perplexity comparison for AutoAWQ, AutoGPTQ | 2024-04-30 16:16:16 +02:00
turboderp | 864abeb137 | Update MMLU test a bit | 2024-04-26 23:26:30 +02:00
turboderp | 84b5d2b90f | Cache prompts for MMLU test | 2024-04-21 15:14:09 +02:00
turboderp | 02edfb9f2f | Fix HumanEval test for Gemma models | 2024-03-09 05:59:00 +01:00
turboderp | 222c8465bc | Rework HumanEval test | 2024-03-07 17:54:07 +01:00
turboderp | 23fc4737ae | Fast safetensors mode with direct IO and pinned buffer | 2024-01-18 20:11:53 +01:00
turboderp | 7262fb8f9d | Batch latency test script | 2023-12-23 22:04:40 +01:00
turboderp | bf2710f008 | Optimizer batched sampling | 2023-12-23 22:04:10 +01:00
turboderp | c284648dbe | Add script to compare quantized and unquantized model | 2023-12-23 02:57:13 +01:00
AlpinDale | 5131099b5f | add top_a in a few more places | 2023-12-21 15:28:34 +00:00
turboderp | c4ae226df5 | HumanEval test | 2023-12-21 12:29:09 +01:00
David Toth | ed7a104e71 | Fix encoder in MMLU | 2023-12-17 21:08:27 +00:00
turboderp | d8b4efa8d4 | Instrumentation etc. | 2023-12-10 17:36:40 +01:00
turboderp | 38d393718d | Fix some tokenization edge cases | 2023-12-03 22:03:23 +01:00
turboderp | 020fa4fcae | Temporary workaround for tokenizers with undefined padding token | 2023-11-30 09:01:57 +01:00
turboderp | c165d7b73a | Torture the tokenizer | 2023-11-29 08:18:02 +01:00
turboderp | 42bcacdb84 | Tests for half GEMM kernels | 2023-11-25 11:54:53 +01:00
turboderp | b302e310c8 | More output in SD example | 2023-11-10 20:16:08 +01:00
turboderp | 4afe616aee | Fix unhandled OoM condition when loading GPTQ model with auto split; free minimum reserved VRAM on previous device when moving to next device | 2023-10-28 20:08:39 +02:00
turboderp | 093b89d38c | Add generator versions of model.load() and model.load_autosplit() | 2023-10-23 01:17:10 +02:00
turboderp | 5834f3a968 | Make sure all inference is done in torch.inference_mode() | 2023-10-22 20:23:42 +02:00
turboderp | eb2cae6c52 | Add auto GPU split feature | 2023-10-22 18:48:35 +02:00
turboderp | 1d151e73a1 | Test script | 2023-10-15 22:58:19 +02:00
turboderp | ba5f6191c8 | Add typical setting to chat example. | 2023-09-26 19:50:44 +02:00
turboderp | f2c773592c | Add MMLU test | 2023-09-23 21:25:52 +02:00
turboderp | 918368b295 | 34B testing | 2023-09-10 06:15:33 +02:00
turboderp | f79e16c5d0 | Optimization, wider loads in EXL2 kernel (int4) | 2023-09-07 10:56:43 +02:00
turboderp | c2f62e1f1f | Optimization, wider loads in GPTQ kernel (int2) working | 2023-09-07 04:07:13 +02:00
turboderp | 3c80d41234 | Add 4-bit GPTQ support | 2023-09-05 14:03:51 +02:00
turboderp | a386102ac6 | Improve prediction of VRAM usage when loading model | 2023-09-01 10:47:29 +02:00