894 Commits

Author SHA1 Message Date
turboderp
8b1fccf7a6 Fix compiler warnings 2025-05-27 11:27:37 +02:00
turboderp
cf22084579 Generator: Add probs and top tokens/probs 2025-05-25 20:02:59 +02:00
turboderp
33466d7855 Generator: Add probs and top tokens/probs 2025-05-25 19:59:49 +02:00
turboderp
460e201cc3 Add batched translation example 2025-05-25 15:03:19 +02:00
turboderp
c94905bf79 RMSNorm: Reduce CPU overhead 2025-05-25 13:53:01 +02:00
turboderp
d54648bfab LinearFP16: Use empty output tensor 2025-05-25 13:33:39 +02:00
turboderp
c2ec220c83 LinearEXL3: Reduce CPU overhead 2025-05-25 13:33:39 +02:00
turboderp
ec839e16b8 Mixtral/Qwen3 MoE: Skip redundant downcast after attn 2025-05-25 13:33:39 +02:00
turboderp
1284d43c76 GEMM kernel tweaks and tuning 2025-05-25 13:33:39 +02:00
turboderp
c82af98d57 New RoPE kernel with fused head norm 2025-05-25 13:33:39 +02:00
turboderp
8357593c39 BlockSparseMLP: Move functionality to extension, reduce CPU overhead 2025-05-25 02:20:06 +02:00
turboderp
6693b50105 Add GPTJ to reference RoPE implementation 2025-05-25 02:13:11 +02:00
turboderp
f1cfd3fb4e Attn: Parallelize k_proj and v_proj to use more SMs on models with small num_kv_heads 2025-05-25 02:13:11 +02:00
turboderp
afac0a4320 Keep k_proj and v_proj bitrate equal 2025-05-24 13:30:23 +02:00
turboderp
bc3d38f04d Some profiling util stuff 2025-05-24 13:30:23 +02:00
turboderp
30c7386b7c BlockSparseMLP: Small optimization 2025-05-24 02:08:29 +02:00
turboderp
02982fcc9f Sampler: Skip some asserts 2025-05-23 23:33:20 +02:00
turboderp
8b0df69103 Sampler: Don't set torch/random seed unless it's needed 2025-05-23 23:33:20 +02:00
turboderp
d359bcc0d3 Add MCG 3INST and MCG 1MAD (MUL1) experimental quant modes 2025-05-21 19:15:13 +02:00
turboderp
c0a2028fb5 compare_q.py: Fix some logic for KLD test 2025-05-18 21:55:26 +02:00
turboderp
d860f8e1e1 Linear: Load scaled FP8 weights 2025-05-18 16:02:48 +02:00
turboderp
e1d2fa11d6 compare_q.py: Add -mask arg 2025-05-18 10:58:14 +02:00
turboderp
07ffea7f89 compare_q.py: Fix llama.cpp bpw measurement for MoE models 2025-05-18 00:19:59 +02:00
turboderp
475dfcca47 compare_q.py: Add more GPTQ layer types 2025-05-18 00:19:19 +02:00
turboderp
2432c64e68 model_init: Add override for default cache size 2025-05-17 16:58:32 +02:00
turboderp
0488385eb0 Add simple long-context evaluation script 2025-05-17 16:58:12 +02:00
turboderp
b5fb1827da Fix head BPW estimate for component model 2025-05-17 16:41:39 +02:00
turboderp
769ddb34b0 chat.py: Add some more functionality 2025-05-17 12:33:22 +02:00
turboderp
08858bc8e3 Fix regression 2025-05-16 22:25:14 +02:00
turboderp
3873d40ae2 compare_q.py: Add KLD test and some other tweaks 2025-05-16 16:13:26 +02:00
turboderp
966762a32d Add Gemma3 architecture (text) 2025-05-16 12:14:47 +02:00
turboderp
830b6a0180 Preparation for multimodal models 2025-05-16 00:35:44 +02:00
turboderp
a19538cf1e compare_q.py: Some fixes 2025-05-16 00:33:48 +02:00
turboderp
48747ba09d Fix: Don't (try to) apply full-width padding when loading partial tensors 2025-05-15 01:28:24 +02:00
turboderp
7f3096ffd7 compare_q.py: Account for unquantized weights in blocksparse EXL2 layers 2025-05-14 23:55:25 +02:00
turboderp
d1e3b2b20e Update README 2025-05-14 17:53:44 +02:00
turboderp
9665ba9998 Add Mixtral architecture 2025-05-14 17:53:33 +02:00
turboderp
b728058b20 Conversion: Close files between layers to avoid overusing handles for extremely large models 2025-05-14 17:53:08 +02:00
turboderp
cb7c70cde0 compare_q.py: Add a little versatility to plot 2025-05-14 17:52:21 +02:00
turboderp
ce58d99c71 BlockSparseMLP: Disable padding for routing gate 2025-05-14 14:46:33 +02:00
turboderp
5c3ff204c4 model_diff.py: Use deferred load and close file handles between modules 2025-05-12 21:23:48 +02:00
Brian
a905cffb1a Merge pull request #37 from turboderp-org/dev (Merge Dev to master) v0.0.2 2025-05-12 12:38:08 -04:00
kingbri
70056fef5f Project: Bump version (v0.0.2; Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>) 2025-05-12 11:35:06 -04:00
turboderp
95cfa726b6 top_k sampler: Fix int check for older Pythons 2025-05-12 02:34:32 +02:00
turboderp
1e1754787e HumanEval: Move BOS token to individual prompt template, don't prepend by default when tokenizing 2025-05-11 23:02:07 +02:00
turboderp
f5127e87f8 Merge branch 'master' into dev 2025-05-11 20:48:19 +02:00
turboderp
81a0a7d240 Merge pull request #35 from gakada/humaneval (humaneval.py: fix top_k type, remove rep_p, add qwen3) 2025-05-11 20:47:03 +02:00
turboderp
9c31971b84 Merge pull request #36 from tokoba/master (added max_total_tokens variable to class Generator, fixed type assert…) 2025-05-11 20:09:14 +02:00
turboderp
10222646d0 Merge branch 'master' into dev 2025-05-11 18:40:53 +02:00
turboderp
43383ebdbc Fix potential NaN condition when applying repetition penalty 2025-05-11 17:27:10 +02:00