exllamav3

mirror of https://github.com/turboderp-org/exllamav3.git synced 2026-04-29 18:51:34 +00:00

Author	SHA1	Message	Date
turboderp	8b1fccf7a6	Fix compiler warnings	2025-05-27 11:27:37 +02:00
turboderp	cf22084579	Generator: Add probs and top tokens/probs	2025-05-25 20:02:59 +02:00
turboderp	33466d7855	Generator: Add probs and top tokens/probs	2025-05-25 19:59:49 +02:00
turboderp	460e201cc3	Add batched translation example	2025-05-25 15:03:19 +02:00
turboderp	c94905bf79	RMSNorm: Reduce CPU overhead	2025-05-25 13:53:01 +02:00
turboderp	d54648bfab	LinearFP16: Use empty output tensor	2025-05-25 13:33:39 +02:00
turboderp	c2ec220c83	LinearEXL3: Reduce CPU overhead	2025-05-25 13:33:39 +02:00
turboderp	ec839e16b8	Mixtral/Qwen3 MoE: Skip redundant downcast after attn	2025-05-25 13:33:39 +02:00
turboderp	1284d43c76	GEMM kernel tweaks and tuning	2025-05-25 13:33:39 +02:00
turboderp	c82af98d57	New RoPE kernel with fused head norm	2025-05-25 13:33:39 +02:00
turboderp	8357593c39	BlockSparseMLP: Move functionality to extension, reduce CPU overhead	2025-05-25 02:20:06 +02:00
turboderp	6693b50105	Add GPTJ to reference RoPE implementation	2025-05-25 02:13:11 +02:00
turboderp	f1cfd3fb4e	Attn: Parallelize k_proj and v_proj to use more SMs on models with small num_kv_heads	2025-05-25 02:13:11 +02:00
turboderp	afac0a4320	Keep k_proj and v_proj bitrate equal	2025-05-24 13:30:23 +02:00
turboderp	bc3d38f04d	Some profiling util stuff	2025-05-24 13:30:23 +02:00
turboderp	30c7386b7c	BlockSparseMLP: Small optimization	2025-05-24 02:08:29 +02:00
turboderp	02982fcc9f	Sampler: Skip some asserts	2025-05-23 23:33:20 +02:00
turboderp	8b0df69103	Sampler: Don't set torch/random seed unless it's needed	2025-05-23 23:33:20 +02:00
turboderp	d359bcc0d3	Add MCG 3INST and MCG 1MAD (MUL1) experimental quant modes	2025-05-21 19:15:13 +02:00
turboderp	c0a2028fb5	compare_q.py: Fix some logic for KLD test	2025-05-18 21:55:26 +02:00
turboderp	d860f8e1e1	Linear: Load scaled FP8 weights	2025-05-18 16:02:48 +02:00
turboderp	e1d2fa11d6	compare_q.py: Add -mask arg	2025-05-18 10:58:14 +02:00
turboderp	07ffea7f89	compare_q.py: Fix llama.cpp bpw measurement for MoE models	2025-05-18 00:19:59 +02:00
turboderp	475dfcca47	compare_q.py: Add more GPTQ layer types	2025-05-18 00:19:19 +02:00
turboderp	2432c64e68	model_init: Add override for default cache size	2025-05-17 16:58:32 +02:00
turboderp	0488385eb0	Add simple long-context evaluation script	2025-05-17 16:58:12 +02:00
turboderp	b5fb1827da	Fix head BPW estimate for component model	2025-05-17 16:41:39 +02:00
turboderp	769ddb34b0	chat.py: Add some more functionality	2025-05-17 12:33:22 +02:00
turboderp	08858bc8e3	Fix regression	2025-05-16 22:25:14 +02:00
turboderp	3873d40ae2	compare_q.py: Add KLD test and some other tweaks	2025-05-16 16:13:26 +02:00
turboderp	966762a32d	Add Gemma3 architecture (text)	2025-05-16 12:14:47 +02:00
turboderp	830b6a0180	Preparation for multimodal models	2025-05-16 00:35:44 +02:00
turboderp	a19538cf1e	compare_q.py: Some fixes	2025-05-16 00:33:48 +02:00
turboderp	48747ba09d	Fix: Don't (try to) apply full-width padding when loading partial tensors	2025-05-15 01:28:24 +02:00
turboderp	7f3096ffd7	compare_q.py: Account for unquantized weights in blocksparse EXL2 layers	2025-05-14 23:55:25 +02:00
turboderp	d1e3b2b20e	Update README	2025-05-14 17:53:44 +02:00
turboderp	9665ba9998	Add Mixtral architecture	2025-05-14 17:53:33 +02:00
turboderp	b728058b20	Conversion: Close files between layers to avoid overusing handles for extremely large models	2025-05-14 17:53:08 +02:00
turboderp	cb7c70cde0	compare_q.py: Add a little versatility to plot	2025-05-14 17:52:21 +02:00
turboderp	ce58d99c71	BlockSparseMLP: Disable padding for routing gate	2025-05-14 14:46:33 +02:00
turboderp	5c3ff204c4	model_diff.py: Use deferred load and close file handles between modules	2025-05-12 21:23:48 +02:00
Brian	a905cffb1a	Merge pull request #37 from turboderp-org/dev Merge Dev to master v0.0.2	2025-05-12 12:38:08 -04:00
kingbri	70056fef5f	Project: Bump version v0.0.2 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:35:06 -04:00
turboderp	95cfa726b6	top_k sampler: Fix int check for older Pythons	2025-05-12 02:34:32 +02:00
turboderp	1e1754787e	HumanEval: Move BOS token to individual prompt template, don't prepend by default when tokenizing	2025-05-11 23:02:07 +02:00
turboderp	f5127e87f8	Merge branch 'master' into dev	2025-05-11 20:48:19 +02:00
turboderp	81a0a7d240	Merge pull request #35 from gakada/humaneval humaneval.py: fix top_k type, remove rep_p, add qwen3	2025-05-11 20:47:03 +02:00
turboderp	9c31971b84	Merge pull request #36 from tokoba/master added max_total_tokens variable to class Generator, fixed type assert…	2025-05-11 20:09:14 +02:00
turboderp	10222646d0	Merge branch 'master' into dev	2025-05-11 18:40:53 +02:00
turboderp	43383ebdbc	Fix potential NaN condition when applying repetition penalty	2025-05-11 17:27:10 +02:00


894 Commits