ea87af6ea8 | 2026-03-30 22:03:01 +02:00 | turboderp | Bump to v0.0.28 (tag: v0.0.28)
8b15b32af6 | 2026-03-30 21:58:23 +02:00 | turboderp | MoE kernel: Include instances for dims not divisible by 256, addresses
db082c5d32 | 2026-03-30 21:17:19 +02:00 | turboderp | Cleanup
d5ad174d8f | 2026-03-30 21:17:04 +02:00 | turboderp | Quantize: Retry cholesky if H not positive-definite
3423357bc6 | 2026-03-30 21:16:12 +02:00 | turboderp | Sampling: Fix argmax for sorted logits, remove redundant norm before log_gumbel
863f96bcae | 2026-03-26 02:15:00 +01:00 | turboderp | Bump to v0.0.27 (tag: v0.0.27)
4b58e05fdc | 2026-03-25 22:47:16 +01:00 | turboderp | Merge branch 'refs/heads/dev' into fork/Katehuuh/nanochat-ve-scalars (conflicts: exllamav3/modules/transformer.py)
2a99bbe35f | 2026-03-25 22:43:53 +01:00 | turboderp | Nanochat: Fix quantization
317086503d | 2026-03-25 21:09:36 +01:00 | turboderp | Norm: Fix unweighted norm if input dtype != at::kHalf
8daabfc207 | 2026-03-25 21:09:02 +01:00 | turboderp | Nanochat: Rework/refactor new features implementation
b381e15ccb | 2026-03-25 02:05:59 +01:00 | turboderp | Generator: Give requeued jobs priority on the pending list
7d80c39a45 | 2026-03-25 01:58:45 +01:00 | turboderp | Add IFBench eval
8aca86c4a3 | 2026-03-24 14:20:00 +01:00 | Katehuuh | nanochat: VE, residual scalars, backout; auto-detect key format
03d9aaf3f8 | 2026-03-23 21:49:11 +01:00 | turboderp | Generator: Ensure recurrent checkpoint after every prefill chunk, even if chunks aren't aligned with checkpoint intervals
97b0bbc5c0 | 2026-03-22 23:18:13 +01:00 | turboderp | Docs: Remove outdated quant duration estimates
77a42495a5 | 2026-03-22 23:17:49 +01:00 | turboderp | Conversion: Rework allocation strategy for noninteger bitrates, add --hq mode
936483ece2 | 2026-03-22 18:16:58 +01:00 | turboderp | Generator: Decrease defrag frequency
1592d04ffd | 2026-03-22 18:16:33 +01:00 | turboderp | Tests: Fix up generator stress test
0b898f2cc0 | 2026-03-22 18:02:48 +01:00 | turboderp | Generator: Fix last recurrent checkpoints not hitting page boundary
3e18c72d9e | 2026-03-22 18:02:48 +01:00 | turboderp | Generator: Enforce recurrent_checkpoint_interval <= max_chunk_size
15647d98d7 | 2026-03-22 18:02:48 +01:00 | turboderp | Sampling: Fix possible divide-by-zero in rep.penalty kernels
d706467d85 | 2026-03-22 02:57:54 +01:00 | turboderp | recompile.py: Allow overriding tensors defined by the model/architecture but missing from an incomplete input model's SafetensorsCollection
8a36ee8b9a | 2026-03-21 14:16:09 +01:00 | turboderp | chat.py: Add Qwen3.5-specific ChatML template
4fa57eaaeb | 2026-03-17 02:36:16 +01:00 | turboderp | Tokenizer: Fix regression in HF template helper
ba1ad9ac66 | 2026-03-16 19:55:04 +01:00 | turboderp | Bump to v0.0.26 (tag: v0.0.26)
a31d2187fc | 2026-03-16 02:30:14 +01:00 | turboderp | chat.py: Add probs option
2cac1d612d | 2026-03-16 02:28:57 +01:00 | turboderp | Sampler: Make sure probs are normalized before log gumbel
517c2db5a0 | 2026-03-15 20:01:44 +01:00 | turboderp | BlockSparseMLP: Work around NVCC constexpr quirk
e54c1b8b7a | 2026-03-15 17:27:59 +01:00 | turboderp | BlockSparseMLP: Tune kernel size
05e2541bb8 | 2026-03-15 01:46:27 +01:00 | turboderp | BlockSparseMLP: Allow fused path for module with mixed bitrates
48de29c05b | 2026-03-15 01:39:30 +01:00 | turboderp | BlockSparseMLP: Fix regression when loading to single device
5f54aa5f57 | 2026-03-15 00:29:37 +01:00 | turboderp | convert.py: Fix overflow when mixing bitrates for expert-heavy models
cd94bf8f8f | 2026-03-14 23:14:27 +01:00 | turboderp | Step3.5: Fix negative activation limit
0e61e43e0f | 2026-03-14 23:14:27 +01:00 | turboderp | BlockSparseMLP: Add fused MoE kernel
fff187224b | 2026-03-14 22:31:36 +01:00 | turboderp | Model: Drop all refs to shared tensors after model load
3f3c0bc325 | 2026-03-14 22:29:11 +01:00 | turboderp | Mixtral: Fix MoE out_dtype
ebd2efb6bd | 2026-03-13 04:31:19 +01:00 | turboderp | chat.py: Random benchmark question feature
aaf6337f12 | 2026-03-13 00:59:10 +01:00 | turboderp | Add OlmoHybridForCausalLM
f674142ff3 | 2026-03-13 00:46:27 +01:00 | turboderp | Qwen3.5: Don't set mrope flag if no vision tower
bdfb7929f4 | 2026-03-13 00:45:59 +01:00 | turboderp | GatedDeltaNet: Support split qkv and conv1d weights
f4c56f8c6d | 2026-03-13 00:44:37 +01:00 | turboderp | GatedDeltaNet: Handle head sizes up to 256, divisible by down to 32, support beta scale (linear_allow_neg_eigval)
f83a9ae242 | 2026-03-13 00:40:05 +01:00 | turboderp | GatedRMSNorm: Use single warp for head size up to 256
42d0854c39 | 2026-03-13 00:37:39 +01:00 | turboderp | convert.py: Compactify display of module tree
1404f7aa48 | 2026-03-11 23:48:47 +01:00 | turboderp | Bump to v0.0.25 (tag: v0.0.25)
9db029ded5 | 2026-03-11 21:43:45 +01:00 | turboderp | Separate transpose options for fused expert weights (account for differences between Qwen3Moe and Qwen3_5Moe)
e05f4636ee | 2026-03-11 21:00:23 +01:00 | turboderp | Add Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM
1b9e58c9b5 | 2026-03-11 20:25:56 +01:00 | turboderp | BlockSparseMLP: Skip redundant gather
d52c49c17f | 2026-03-11 20:24:04 +01:00 | turboderp | GatedDeltaNet: Allow bfloat16 a_log
ad546f7937 | 2026-03-08 20:39:35 +01:00 | turboderp | Bump to v0.0.24 (tag: v0.0.24)
63ba4d005c | 2026-03-07 23:32:42 +01:00 | turboderp | Generator: If model is recurrent, run last page of prompt in a separate forward pass to create checkpoint (ensures at most 255 tokens have to be reingested per request)