turboderp
c8fa853c89
Test script: Allow --eval_rows in wiki2 ppl test
2025-01-09 11:14:48 +01:00
turboderp
6cdfa5e52f
Refactor architecture logic for code reuse between LLM/VLM
2024-11-03 22:34:25 +01:00
turboderp
b30f796690
TP mode for attn layer, non-paged
2024-08-14 23:41:10 +02:00
turboderp
036506f273
Use high priority stream for forward pass
2024-07-27 16:05:12 +02:00
turboderp
1179b8a5e5
Fix ppl test for long seq lengths
2024-07-10 08:05:57 +02:00
turboderp
f1179ff200
Add ppl-over-seqlen test
2024-07-03 22:37:51 +02:00
turboderp
05b1f2194e
Fix imports in test_inference
2024-06-24 00:56:29 +02:00
turboderp
f3596fc0d9
Add Q6 cache mode
2024-06-09 01:23:50 +02:00
turboderp
fb61a817ec
Add Q8 cache mode
2024-06-08 15:33:19 +02:00
turboderp
7502eef349
Better sampling settings for test gen
2024-05-09 02:31:50 +02:00
turboderp
dc1dfc4dd5
Update prompt speed test
2024-04-18 09:19:25 +02:00
turboderp
3be55a97af
Test inference script: add max_output_len option
2024-04-05 21:35:52 +02:00
turboderp
d5e4f66a05
Fix standard ppl test
2024-04-03 17:10:19 +02:00
turboderp
ad67790e73
softmax+topk kernel for 16 experts
2024-03-30 09:31:58 +01:00
turboderp
37c3b69958
Merge pull request #374 from Lyrcaxis/patch-1
Fix installation step (install requirements) & Add multi-GPU explanation
2024-03-20 04:57:50 +01:00
turboderp
9c47269913
Add parallel decoder block
2024-03-19 18:20:44 +01:00
turboderp
21772adaf9
Add logit scale
2024-03-19 18:20:44 +01:00
turboderp
efd20eec03
Make BOS preference arch dependent
2024-03-19 18:20:44 +01:00
Thanasis Galianis
8e8a711687
Update test_inference.py
2024-03-18 22:39:27 +02:00
Thanasis Galianis
2ee57974dc
Added --gpu_split explanation on test_inference.py
2024-03-18 21:36:22 +02:00
turboderp
bafe539728
Add Q4 cache mode
2024-03-03 23:34:11 +01:00
turboderp
a19a2eccb4
Add option to force BOS for ppl test
2024-02-22 14:44:27 +01:00
turboderp
983a229913
Add BOS token by default in test_inference.py, option to override
2024-02-22 09:42:43 +01:00
Min Xu
8e13598868
minor changes
1. added .so file to the ignored list
2. removed 2 unused imports from test_inference.py, which also
avoided a warning for me that was produced by importing
pandas
2024-02-02 20:19:11 -08:00
turboderp
23fc4737ae
Fast safetensors mode with direct IO and pinned buffer
2024-01-18 20:11:53 +01:00
turboderp
024080186f
Util functions for rank-reduce experiment
2024-01-06 09:52:59 +01:00
turboderp
e1010218a7
Reduce chunk size to reduce likelihood of OoM during ppl test
2024-01-06 07:48:16 +01:00
turboderp
41b15dd1c3
Refactor to consolidate attn params
2024-01-04 04:52:49 +01:00
turboderp
5ddf57f945
Fix regular ppl test
2023-12-30 22:14:43 +01:00
turboderp
4d5ef3b53d
Attempt to add standard ppl test (experimental)
2023-12-30 01:39:20 +01:00
turboderp
a52d410d4a
Attempt to add standard ppl test (experimental)
2023-12-30 01:39:03 +01:00
turboderp
5eef9beef1
Fix OoM when running multiple tests with -gs auto
2023-12-15 19:11:31 +01:00
turboderp
95ddeb7588
Layer-streaming perplexity test, fix for no_flash_attn
2023-12-15 15:40:38 +01:00
turboderp
0fd1241a54
Layer-streaming perplexity test
2023-12-14 19:18:57 +01:00
turboderp
7bdde5a8e3
no_warmup option in test script
2023-12-14 01:30:12 +01:00
turboderp
9f01116fb4
Fix OoM when testing PPL with large vocab
2023-12-12 13:02:42 +01:00
turboderp
0597668673
Safer sampler settings for prompt completion in test script
2023-12-11 00:25:32 +01:00
turboderp
d8b4efa8d4
Instrumentation etc.
2023-12-10 17:36:40 +01:00
turboderp
f475b442b8
Add layer replace option to test_inference.py
2023-12-09 21:47:19 +01:00
turboderp
0aeca11fa6
Add token ppl and 8-bit cache test to test_inference script
2023-12-03 22:04:01 +01:00
turboderp
7f35594a54
Add auto split to model_init and test_inference
2023-10-22 19:11:29 +02:00
turboderp
7bd131e738
Use sampling and healing for prompt completion in test script
2023-09-29 22:49:01 +02:00
turboderp
73a133405f
Change test_inference.py ppl calculation to exactly match logic in convert.py
2023-09-20 09:56:25 +02:00
turboderp
d1d97742bc
Allow reduced max_input_len when measuring ppl
2023-09-17 17:45:55 +02:00
turboderp
2f72437fcb
Add more quant options
2023-09-16 15:04:12 +02:00
turboderp
c0ade31bfe
Fix typo
2023-09-10 10:00:52 +02:00
turboderp
f79e16c5d0
Optimization, wider loads in EXL2 kernel (int4)
2023-09-07 10:56:43 +02:00
turboderp
1075b7514f
Optimization, wider loads in GPTQ kernel (int4)
2023-09-07 04:26:45 +02:00
turboderp
2a2cc16119
More kernel optimizin
2023-09-02 13:29:43 +02:00
turboderp
92ce76dec1
Kernel optimizations WIP
2023-09-02 05:37:00 +02:00