52 Commits

Author SHA1 Message Date
turboderp
c8fa853c89 Test script: Allow --eval_rows in wiki2 ppl test 2025-01-09 11:14:48 +01:00
turboderp
6cdfa5e52f Refactor architecture logic for code reuse between LLM/VLM 2024-11-03 22:34:25 +01:00
turboderp
b30f796690 TP mode for attn layer, non-paged 2024-08-14 23:41:10 +02:00
turboderp
036506f273 Use high priority stream for forward pass 2024-07-27 16:05:12 +02:00
turboderp
1179b8a5e5 Fix ppl test for long seq lengths 2024-07-10 08:05:57 +02:00
turboderp
f1179ff200 Add ppl-over-seqlen test 2024-07-03 22:37:51 +02:00
turboderp
05b1f2194e Fix imports in test_inference 2024-06-24 00:56:29 +02:00
turboderp
f3596fc0d9 Add Q6 cache mode 2024-06-09 01:23:50 +02:00
turboderp
fb61a817ec Add Q8 cache mode 2024-06-08 15:33:19 +02:00
turboderp
7502eef349 Better sampling settings for test gen 2024-05-09 02:31:50 +02:00
turboderp
dc1dfc4dd5 Update prompt speed test 2024-04-18 09:19:25 +02:00
turboderp
3be55a97af Test inference script: add max_output_len option 2024-04-05 21:35:52 +02:00
turboderp
d5e4f66a05 Fix standard ppl test 2024-04-03 17:10:19 +02:00
turboderp
ad67790e73 softmax+topk kernel for 16 experts 2024-03-30 09:31:58 +01:00
turboderp
37c3b69958 Merge pull request #374 from Lyrcaxis/patch-1
Fix installation step (install requirements) & Add multi-GPU explanation
2024-03-20 04:57:50 +01:00
turboderp
9c47269913 Add parallel decoder block 2024-03-19 18:20:44 +01:00
turboderp
21772adaf9 Add logit scale 2024-03-19 18:20:44 +01:00
turboderp
efd20eec03 Make BOS preference arch dependent 2024-03-19 18:20:44 +01:00
Thanasis Galianis
8e8a711687 Update test_inference.py 2024-03-18 22:39:27 +02:00
Thanasis Galianis
2ee57974dc Added --gpu_split explanation on test_inference.py 2024-03-18 21:36:22 +02:00
turboderp
bafe539728 Add Q4 cache mode 2024-03-03 23:34:11 +01:00
turboderp
a19a2eccb4 Add option to force BOS for ppl test 2024-02-22 14:44:27 +01:00
turboderp
983a229913 Add BOS token by default in test_inference.py, option to override 2024-02-22 09:42:43 +01:00
Min Xu
8e13598868 minor changes
1. added .so file to the ignored list
2. removed 2 unused imports from test_inference.py, which also avoided a warning for me that was produced by importing pandas
2024-02-02 20:19:11 -08:00
turboderp
23fc4737ae Fast safetensors mode with direct IO and pinned buffer 2024-01-18 20:11:53 +01:00
turboderp
024080186f Util functions for rank-reduce experiment 2024-01-06 09:52:59 +01:00
turboderp
e1010218a7 Reduce chunk size to reduce likelihood of OoM during ppl test 2024-01-06 07:48:16 +01:00
turboderp
41b15dd1c3 Refactor to consolidate attn params 2024-01-04 04:52:49 +01:00
turboderp
5ddf57f945 Fix regular ppl test 2023-12-30 22:14:43 +01:00
turboderp
4d5ef3b53d Attempt to add standard ppl test (experimental) 2023-12-30 01:39:20 +01:00
turboderp
a52d410d4a Attempt to add standard ppl test (experimental) 2023-12-30 01:39:03 +01:00
turboderp
5eef9beef1 Fix OoM when running multiple tests with -gs auto 2023-12-15 19:11:31 +01:00
turboderp
95ddeb7588 Layer-streaming perplexity test, fix for no_flash_attn 2023-12-15 15:40:38 +01:00
turboderp
0fd1241a54 Layer-streaming perplexity test 2023-12-14 19:18:57 +01:00
turboderp
7bdde5a8e3 no_warmup option in test script 2023-12-14 01:30:12 +01:00
turboderp
9f01116fb4 Fix OoM when testing PPL with large vocab 2023-12-12 13:02:42 +01:00
turboderp
0597668673 Safer sampler settings for prompt completion in test script 2023-12-11 00:25:32 +01:00
turboderp
d8b4efa8d4 Instrumentation etc. 2023-12-10 17:36:40 +01:00
turboderp
f475b442b8 Add layer replace option to test_inference.py 2023-12-09 21:47:19 +01:00
turboderp
0aeca11fa6 Add token ppl and 8-bit cache test to test_inference script 2023-12-03 22:04:01 +01:00
turboderp
7f35594a54 Add auto split to model_init and test_inference 2023-10-22 19:11:29 +02:00
turboderp
7bd131e738 Use sampling and healing for prompt completion in test script 2023-09-29 22:49:01 +02:00
turboderp
73a133405f Change test_inference.py ppl calculation to exactly match logic in convert.py 2023-09-20 09:56:25 +02:00
turboderp
d1d97742bc Allow reduced max_input_len when measuring ppl 2023-09-17 17:45:55 +02:00
turboderp
2f72437fcb Add more quant options 2023-09-16 15:04:12 +02:00
turboderp
c0ade31bfe Fix typo 2023-09-10 10:00:52 +02:00
turboderp
f79e16c5d0 Optimization, wider loads in EXL2 kernel (int4) 2023-09-07 10:56:43 +02:00
turboderp
1075b7514f Optimization, wider loads in GPTQ kernel (int4) 2023-09-07 04:26:45 +02:00
turboderp
2a2cc16119 More kernel optimizing 2023-09-02 13:29:43 +02:00
turboderp
92ce76dec1 Kernel optimizations WIP 2023-09-02 05:37:00 +02:00