Author | Commit | Message | Date
turboderp | 4356527867 | Pin pydantic to 2.11.0 | 2025-10-09 11:00:25 +02:00
turboderp | 001e2ab125 | Addition kernel | 2025-10-09 10:59:23 +02:00
turboderp | 78391e76c2 | Config: Read and expose max_position_embeddings | 2025-10-06 20:01:41 +02:00
turboderp | 402ab56b3e | Generator: Avoid recursive functions in defrag, should fix #78 | 2025-10-05 14:40:11 +02:00
turboderp | cb2a467c93 | Fix cmdline argument collision | 2025-10-05 01:30:20 +02:00
turboderp | 4829ea43d9 | Rework GEMM kernel tuning | 2025-10-05 01:30:20 +02:00
turboderp | c3cae873c4 | Fix regression (banned strings) | 2025-10-04 22:54:17 +02:00
turboderp | 4c23cb9d03 | Bump to v0.0.7 (tag: v0.0.7) | 2025-09-28 17:33:33 +02:00
turboderp | 6cea948c3d | More C++ modules | 2025-09-28 16:55:17 +02:00
turboderp | 13dd99cafe | GatedDeltaNet: Move import | 2025-09-28 03:32:12 +02:00
turboderp | a8a3da598b | Add C++ modules | 2025-09-28 00:08:51 +02:00
turboderp | 8c4a542751 | Generator: Fix max_rq_tokens edge case | 2025-09-27 23:49:33 +02:00
turboderp | bbfd4e7069 | Cleanup | 2025-09-27 23:48:40 +02:00
turboderp | 58f36bf14e | Quantizer optimizations | 2025-09-26 23:05:54 +02:00
turboderp | 4fb2e37d22 | Fix typo | 2025-09-25 05:28:51 +02:00
turboderp | 700b394126 | TP: Fix _InterlockedExchange MSVC compilation issue | 2025-09-25 05:07:19 +02:00
turboderp | 3ab408b45e | GatedDeltaNet: fsqrt -> sqrtf | 2025-09-25 04:02:26 +02:00
turboderp | 9933736be6 | TP: Split AVX2 code from .cu objects | 2025-09-25 01:47:49 +02:00
turboderp | 9871564982 | GatedDeltaNet: Skip some redundant reshaping | 2025-09-25 00:46:09 +02:00
turboderp | f1d9c1816b | GatedDeltaNet: Add gated delta rule kernel | 2025-09-25 00:46:09 +02:00
turboderp | 7cf5ac1fb3 | GatedMLP: Fix tensor shape and re-enable fused gate/up projection | 2025-09-25 00:19:16 +02:00
turboderp | 1df9d371b3 | Generator: Fix regression | 2025-09-22 22:54:59 +02:00
turboderp | a602d2d409 | GatedDeltaNet: Skip some boilerplate around fused_recurrent_gated_delta_rule to reduce overhead | 2025-09-22 22:08:55 +02:00
turboderp | 53f92693b2 | GatedDeltaNet: Call causal_conv1d ext function directly | 2025-09-22 22:07:14 +02:00
turboderp | 2547c63b28 | GatedRMSNorm: Custom kernel | 2025-09-22 22:04:32 +02:00
turboderp | 12eadfe114 | Generator: Add requeue option | 2025-09-22 03:34:31 +02:00
turboderp | 1ff09ee3c4 | Generator: Fix batching with recurrent states | 2025-09-22 03:30:37 +02:00
turboderp | b2d4dcab73 | BlockSparseMLP: Use MGEMM for intermediate batch sizes | 2025-09-21 06:24:14 +02:00
turboderp | c207f98e63 | Generator: Fix recurrent state batching | 2025-09-21 05:24:06 +02:00
turboderp | 8c71b0aa57 | GatedDeltaNet: Fused kernel for splitting inputs, casting, applying sigmoid etc. | 2025-09-21 05:03:43 +02:00
turboderp | 1f6b3b5c0a | Cleanup | 2025-09-21 04:58:39 +02:00
turboderp | 7a0b7de368 | GatedDeltaNet: Bypass causal-conv1d interface to reduce CPU overhead | 2025-09-21 04:58:39 +02:00
turboderp | bbaa86397f | Add sigmoid gate kernel | 2025-09-21 04:58:39 +02:00
turboderp | d9536003f2 | GatedDeltaNet: Remove stray import | 2025-09-21 04:16:05 +02:00
turboderp | 7fda5c570c | TP: Fix missing bias export in LayerNorm | 2025-09-20 04:02:45 +02:00
turboderp | b25082a0be | Update README | 2025-09-19 19:24:01 +02:00
turboderp | aa9a315fd9 | compare_q.py: Explicit GC between runs | 2025-09-19 19:15:29 +02:00
turboderp | 4f23a5c9ac | Add Qwen3NextForCausalLM architecture | 2025-09-19 01:35:39 +02:00
turboderp | 9cfa445031 | Add GatedDeltaNet module | 2025-09-19 01:34:50 +02:00
turboderp | b507ea71b0 | Add GatedRMSNorm module | 2025-09-19 00:48:13 +02:00
turboderp | c0f6a77902 | Generator: Recurrent support | 2025-09-19 00:48:13 +02:00
turboderp | 38cb1a3157 | BlockSparseMLP: Support gate for shared expert | 2025-09-19 00:48:13 +02:00
turboderp | 159661dfb4 | Attn: Support gated softmax attn (gate interleaved with query) | 2025-09-19 00:48:13 +02:00
turboderp | 8cb3535bcb | Model: Fail to load in TP mode when model doesn't have TP implementation | 2025-09-19 00:48:13 +02:00
turboderp | 9f373a9460 | Routing kernel: Support up to 512 experts | 2025-09-19 00:48:13 +02:00
turboderp | cc2e9b4ecb | Add recurrent cache | 2025-09-19 00:48:12 +02:00
turboderp | c7f0b694b4 | Update example | 2025-09-19 00:48:12 +02:00
turboderp | 3845775650 | ppl_transformers.py: Explicitly make bfloat16 the default dtype | 2025-09-18 22:11:19 +02:00
turboderp | 476e591966 | Fix some typos | 2025-09-18 03:01:04 +02:00
turboderp | 9f14bcf27c | optimize.py: Compile all filter globs to single regex | 2025-09-15 14:56:17 +02:00