52 Commits

Author SHA1 Message Date
turboderp
c8fa853c89 Test script: Allow --eval_rows in wiki2 ppl test 2025-01-09 11:14:48 +01:00
turboderp
6cdfa5e52f Refactor architecture logic for code reuse between LLM/VLM 2024-11-03 22:34:25 +01:00
turboderp
b30f796690 TP mode for attn layer, non-paged 2024-08-14 23:41:10 +02:00
turboderp
036506f273 Use high priority stream for forward pass 2024-07-27 16:05:12 +02:00
turboderp
1179b8a5e5 Fix ppl test for long seq lengths 2024-07-10 08:05:57 +02:00
turboderp
f1179ff200 Add ppl-over-seqlen test 2024-07-03 22:37:51 +02:00
turboderp
05b1f2194e Fix imports in test_inference 2024-06-24 00:56:29 +02:00
turboderp
f3596fc0d9 Add Q6 cache mode 2024-06-09 01:23:50 +02:00
turboderp
fb61a817ec Add Q8 cache mode 2024-06-08 15:33:19 +02:00
turboderp
7502eef349 Better sampling settings for test gen 2024-05-09 02:31:50 +02:00
turboderp
dc1dfc4dd5 Update prompt speed test 2024-04-18 09:19:25 +02:00
turboderp
3be55a97af Test inference script: add max_output_len option 2024-04-05 21:35:52 +02:00
turboderp
d5e4f66a05 Fix standard ppl test 2024-04-03 17:10:19 +02:00
turboderp
ad67790e73 softmax+topk kernel for 16 experts 2024-03-30 09:31:58 +01:00
turboderp
37c3b69958 Merge pull request #374 from Lyrcaxis/patch-1
Fix installation step (install requirements) & Add multi-GPU explanation
2024-03-20 04:57:50 +01:00
turboderp
9c47269913 Add parallel decoder block 2024-03-19 18:20:44 +01:00
turboderp
21772adaf9 Add logit scale 2024-03-19 18:20:44 +01:00
turboderp
efd20eec03 Make BOS preference arch dependent 2024-03-19 18:20:44 +01:00
Thanasis Galianis
8e8a711687 Update test_inference.py 2024-03-18 22:39:27 +02:00
Thanasis Galianis
2ee57974dc Added --gpu_split explanation on test_inference.py 2024-03-18 21:36:22 +02:00
turboderp
bafe539728 Add Q4 cache mode 2024-03-03 23:34:11 +01:00
turboderp
a19a2eccb4 Add option to force BOS for ppl test 2024-02-22 14:44:27 +01:00
turboderp
983a229913 Add BOS token by default in test_inference.py, option to override 2024-02-22 09:42:43 +01:00
Min Xu
8e13598868 minor changes
1. added .so file to the ignored list
2. removed 2 unused imports from test_inference.py, which also avoided a warning for me that was produced by importing pandas
2024-02-02 20:19:11 -08:00
turboderp
23fc4737ae Fast safetensors mode with direct IO and pinned buffer 2024-01-18 20:11:53 +01:00
turboderp
024080186f Util functions for rank-reduce experiment 2024-01-06 09:52:59 +01:00
turboderp
e1010218a7 Reduce chunk size to reduce likelihood of OoM during ppl test 2024-01-06 07:48:16 +01:00
turboderp
41b15dd1c3 Refactor to consolidate attn params 2024-01-04 04:52:49 +01:00
turboderp
5ddf57f945 Fix regular ppl test 2023-12-30 22:14:43 +01:00
turboderp
4d5ef3b53d Attempt to add standard ppl test (experimental) 2023-12-30 01:39:20 +01:00
turboderp
a52d410d4a Attempt to add standard ppl test (experimental) 2023-12-30 01:39:03 +01:00
turboderp
5eef9beef1 Fix OoM when running multiple tests with -gs auto 2023-12-15 19:11:31 +01:00
turboderp
95ddeb7588 Layer-streaming perplexity test, fix for no_flash_attn 2023-12-15 15:40:38 +01:00
turboderp
0fd1241a54 Layer-streaming perplexity test 2023-12-14 19:18:57 +01:00
turboderp
7bdde5a8e3 no_warmup option in test script 2023-12-14 01:30:12 +01:00
turboderp
9f01116fb4 Fix OoM when testing PPL with large vocab 2023-12-12 13:02:42 +01:00
turboderp
0597668673 Safer sampler settings for prompt completion in test script 2023-12-11 00:25:32 +01:00
turboderp
d8b4efa8d4 Instrumentation etc. 2023-12-10 17:36:40 +01:00
turboderp
f475b442b8 Add layer replace option to test_inference.py 2023-12-09 21:47:19 +01:00
turboderp
0aeca11fa6 Add token ppl and 8-bit cache test to test_inference script 2023-12-03 22:04:01 +01:00
turboderp
7f35594a54 Add auto split to model_init and test_inference 2023-10-22 19:11:29 +02:00
turboderp
7bd131e738 Use sampling and healing for prompt completion in test script 2023-09-29 22:49:01 +02:00
turboderp
73a133405f Change test_inference.py ppl calculation to exactly match logic in convert.py 2023-09-20 09:56:25 +02:00
turboderp
d1d97742bc Allow reduced max_input_len when measuring ppl 2023-09-17 17:45:55 +02:00
turboderp
2f72437fcb Add more quant options 2023-09-16 15:04:12 +02:00
turboderp
c0ade31bfe Fix typo 2023-09-10 10:00:52 +02:00
turboderp
f79e16c5d0 Optimization, wider loads in EXL2 kernel (int4) 2023-09-07 10:56:43 +02:00
turboderp
1075b7514f Optimization, wider loads in GPTQ kernel (int4) 2023-09-07 04:26:45 +02:00
turboderp
2a2cc16119 More kernel optimizing 2023-09-02 13:29:43 +02:00
turboderp
92ce76dec1 Kernel optimizations WIP 2023-09-02 05:37:00 +02:00