Author | Commit | Message | Date
turboderp | 8b1fccf7a6 | Fix compiler warnings | 2025-05-27 11:27:37 +02:00
turboderp | cf22084579 | Generator: Add probs and top tokens/probs | 2025-05-25 20:02:59 +02:00
turboderp | 33466d7855 | Generator: Add probs and top tokens/probs | 2025-05-25 19:59:49 +02:00
turboderp | 460e201cc3 | Add batched translation example | 2025-05-25 15:03:19 +02:00
turboderp | c94905bf79 | RMSNorm: Reduce CPU overhead | 2025-05-25 13:53:01 +02:00
turboderp | d54648bfab | LinearFP16: Use empty output tensor | 2025-05-25 13:33:39 +02:00
turboderp | c2ec220c83 | LinearEXL3: Reduce CPU overhead | 2025-05-25 13:33:39 +02:00
turboderp | ec839e16b8 | Mixtral/Qwen3 MoE: Skip redundant downcast after attn | 2025-05-25 13:33:39 +02:00
turboderp | 1284d43c76 | GEMM kernel tweaks and tuning | 2025-05-25 13:33:39 +02:00
turboderp | c82af98d57 | New RoPE kernel with fused head norm | 2025-05-25 13:33:39 +02:00
turboderp | 8357593c39 | BlockSparseMLP: Move functionality to extension, reduce CPU overhead | 2025-05-25 02:20:06 +02:00
turboderp | 6693b50105 | Add GPTJ to reference RoPE implementation | 2025-05-25 02:13:11 +02:00
turboderp | f1cfd3fb4e | Attn: Parallelize k_proj and v_proj to use more SMs on models with small num_kv_heads | 2025-05-25 02:13:11 +02:00
turboderp | afac0a4320 | Keep k_proj and v_proj bitrate equal | 2025-05-24 13:30:23 +02:00
turboderp | bc3d38f04d | Some profiling util stuff | 2025-05-24 13:30:23 +02:00
turboderp | 30c7386b7c | BlockSparseMLP: Small optimization | 2025-05-24 02:08:29 +02:00
turboderp | 02982fcc9f | Sampler: Skip some asserts | 2025-05-23 23:33:20 +02:00
turboderp | 8b0df69103 | Sampler: Don't set torch/random seed unless it's needed | 2025-05-23 23:33:20 +02:00
turboderp | d359bcc0d3 | Add MCG 3INST and MCG 1MAD (MUL1) experimental quant modes | 2025-05-21 19:15:13 +02:00
turboderp | c0a2028fb5 | compare_q.py: Fix some logic for KLD test | 2025-05-18 21:55:26 +02:00
turboderp | d860f8e1e1 | Linear: Load scaled FP8 weights | 2025-05-18 16:02:48 +02:00
turboderp | e1d2fa11d6 | compare_q.py: Add -mask arg | 2025-05-18 10:58:14 +02:00
turboderp | 07ffea7f89 | compare_q.py: Fix llama.cpp bpw measurement for MoE models | 2025-05-18 00:19:59 +02:00
turboderp | 475dfcca47 | compare_q.py: Add more GPTQ layer types | 2025-05-18 00:19:19 +02:00
turboderp | 2432c64e68 | model_init: Add override for default cache size | 2025-05-17 16:58:32 +02:00
turboderp | 0488385eb0 | Add simple long-context evaluation script | 2025-05-17 16:58:12 +02:00
turboderp | b5fb1827da | Fix head BPW estimate for component model | 2025-05-17 16:41:39 +02:00
turboderp | 769ddb34b0 | chat.py: Add some more functionality | 2025-05-17 12:33:22 +02:00
turboderp | 08858bc8e3 | Fix regression | 2025-05-16 22:25:14 +02:00
turboderp | 3873d40ae2 | compare_q.py: Add KLD test and some other tweaks | 2025-05-16 16:13:26 +02:00
turboderp | 966762a32d | Add Gemma3 architecture (text) | 2025-05-16 12:14:47 +02:00
turboderp | 830b6a0180 | Preparation for multimodal models | 2025-05-16 00:35:44 +02:00
turboderp | a19538cf1e | compare_q.py: Some fixes | 2025-05-16 00:33:48 +02:00
turboderp | 48747ba09d | Fix: Don't (try to) apply full-width padding when loading partial tensors | 2025-05-15 01:28:24 +02:00
turboderp | 7f3096ffd7 | compare_q.py: Account for unquantized weights in blocksparse EXL2 layers | 2025-05-14 23:55:25 +02:00
turboderp | d1e3b2b20e | Update README | 2025-05-14 17:53:44 +02:00
turboderp | 9665ba9998 | Add Mixtral architecture | 2025-05-14 17:53:33 +02:00
turboderp | b728058b20 | Conversion: Close files between layers to avoid overusing handles for extremely large models | 2025-05-14 17:53:08 +02:00
turboderp | cb7c70cde0 | compare_q.py: Add a little versatility to plot | 2025-05-14 17:52:21 +02:00
turboderp | ce58d99c71 | BlockSparseMLP: Disable padding for routing gate | 2025-05-14 14:46:33 +02:00
turboderp | 5c3ff204c4 | model_diff.py: Use deferred load and close file handles between modules | 2025-05-12 21:23:48 +02:00
Brian | a905cffb1a | Merge pull request #37 from turboderp-org/dev; Merge Dev to master; v0.0.2 | 2025-05-12 12:38:08 -04:00
kingbri | 70056fef5f | Project: Bump version; v0.0.2; Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> | 2025-05-12 11:35:06 -04:00
turboderp | 95cfa726b6 | top_k sampler: Fix int check for older Pythons | 2025-05-12 02:34:32 +02:00
turboderp | 1e1754787e | HumanEval: Move BOS token to individual prompt template, don't prepend by default when tokenizing | 2025-05-11 23:02:07 +02:00
turboderp | f5127e87f8 | Merge branch 'master' into dev | 2025-05-11 20:48:19 +02:00
turboderp | 81a0a7d240 | Merge pull request #35 from gakada/humaneval; humaneval.py: fix top_k type, remove rep_p, add qwen3 | 2025-05-11 20:47:03 +02:00
turboderp | 9c31971b84 | Merge pull request #36 from tokoba/master; added max_total_tokens variable to class Generator, fixed type assert… | 2025-05-11 20:09:14 +02:00
turboderp | 10222646d0 | Merge branch 'master' into dev | 2025-05-11 18:40:53 +02:00
turboderp | 43383ebdbc | Fix potential NaN condition when applying repetition penalty | 2025-05-11 17:27:10 +02:00