Commit Graph

528 Commits

Author SHA1 Message Date
turboderp  4356527867  Pin pydantic to 2.11.0  2025-10-09 11:00:25 +02:00
turboderp  001e2ab125  Addition kernel  2025-10-09 10:59:23 +02:00
turboderp  78391e76c2  Config: Read and expose max_position_embeddings  2025-10-06 20:01:41 +02:00
turboderp  402ab56b3e  Generator: Avoid recursive functions in defrag, should fix #78  2025-10-05 14:40:11 +02:00
turboderp  cb2a467c93  Fix cmdline argument collision  2025-10-05 01:30:20 +02:00
turboderp  4829ea43d9  Rework GEMM kernel tuning  2025-10-05 01:30:20 +02:00
turboderp  c3cae873c4  Fix regression (banned strings)  2025-10-04 22:54:17 +02:00
turboderp  4c23cb9d03  Bump to v0.0.7 (tag: v0.0.7)  2025-09-28 17:33:33 +02:00
turboderp  6cea948c3d  More C++ modules  2025-09-28 16:55:17 +02:00
turboderp  13dd99cafe  GatedDeltaNet: Move import  2025-09-28 03:32:12 +02:00
turboderp  a8a3da598b  Add C++ modules  2025-09-28 00:08:51 +02:00
turboderp  8c4a542751  Generator: Fix max_rq_tokens edge case  2025-09-27 23:49:33 +02:00
turboderp  bbfd4e7069  Cleanup  2025-09-27 23:48:40 +02:00
turboderp  58f36bf14e  Quantizer optimizations  2025-09-26 23:05:54 +02:00
turboderp  4fb2e37d22  Fix typo  2025-09-25 05:28:51 +02:00
turboderp  700b394126  TP: Fix _InterlockedExchange MSVC compilation issue  2025-09-25 05:07:19 +02:00
turboderp  3ab408b45e  GatedDeltaNet: fsqrt -> sqrtf  2025-09-25 04:02:26 +02:00
turboderp  9933736be6  TP: Split AVX2 code from .cu objects  2025-09-25 01:47:49 +02:00
turboderp  9871564982  GatedDeltaNet: Skip some redundant reshaping  2025-09-25 00:46:09 +02:00
turboderp  f1d9c1816b  GatedDeltaNet: Add gated delta rule kernel  2025-09-25 00:46:09 +02:00
turboderp  7cf5ac1fb3  GatedMLP: Fix tensor shape and re-enable fused gate/up projection  2025-09-25 00:19:16 +02:00
turboderp  1df9d371b3  Generator: Fix regression  2025-09-22 22:54:59 +02:00
turboderp  a602d2d409  GatedDeltaNet: Skip some boilerplate around fused_recurrent_gated_delta_rule to reduce overhead  2025-09-22 22:08:55 +02:00
turboderp  53f92693b2  GatedDeltaNet: Call causal_conv1d ext function directly  2025-09-22 22:07:14 +02:00
turboderp  2547c63b28  GatedRMSNorm: Custom kernel  2025-09-22 22:04:32 +02:00
turboderp  12eadfe114  Generator: Add requeue option  2025-09-22 03:34:31 +02:00
turboderp  1ff09ee3c4  Generator: Fix batching with recurrent states  2025-09-22 03:30:37 +02:00
turboderp  b2d4dcab73  BlockSparseMLP: Use MGEMM for intermediate batch sizes  2025-09-21 06:24:14 +02:00
turboderp  c207f98e63  Generator: Fix recurrent state batching  2025-09-21 05:24:06 +02:00
turboderp  8c71b0aa57  GatedDeltaNet: Fused kernel for splitting inputs, casting, applying sigmoid etc.  2025-09-21 05:03:43 +02:00
turboderp  1f6b3b5c0a  Cleanup  2025-09-21 04:58:39 +02:00
turboderp  7a0b7de368  GatedDeltaNet: Bypass causal-conv1d interface to reduce CPU overhead  2025-09-21 04:58:39 +02:00
turboderp  bbaa86397f  Add sigmoid gate kernel  2025-09-21 04:58:39 +02:00
turboderp  d9536003f2  GatedDeltaNet: Remove stray import  2025-09-21 04:16:05 +02:00
turboderp  7fda5c570c  TP: Fix missing bias export in LayerNorm  2025-09-20 04:02:45 +02:00
turboderp  b25082a0be  Update README  2025-09-19 19:24:01 +02:00
turboderp  aa9a315fd9  compare_q.py: Explicit GC between runs  2025-09-19 19:15:29 +02:00
turboderp  4f23a5c9ac  Add Qwen3NextForCausalLM architecture  2025-09-19 01:35:39 +02:00
turboderp  9cfa445031  Add GatedDeltaNet module  2025-09-19 01:34:50 +02:00
turboderp  b507ea71b0  Add GatedRMSNorm module  2025-09-19 00:48:13 +02:00
turboderp  c0f6a77902  Generator: Recurrent support  2025-09-19 00:48:13 +02:00
turboderp  38cb1a3157  BlockSparseMLP: Support gate for shared expert  2025-09-19 00:48:13 +02:00
turboderp  159661dfb4  Attn: Support gated softmax attn (gate interleaved with query)  2025-09-19 00:48:13 +02:00
turboderp  8cb3535bcb  Model: Fail to load in TP mode when model doesn't have TP implementation  2025-09-19 00:48:13 +02:00
turboderp  9f373a9460  Routing kernel: Support up to 512 experts  2025-09-19 00:48:13 +02:00
turboderp  cc2e9b4ecb  Add recurrent cache  2025-09-19 00:48:12 +02:00
turboderp  c7f0b694b4  Update example  2025-09-19 00:48:12 +02:00
turboderp  3845775650  ppl_transformers.py: Explicitly make bfloat16 the default dtype  2025-09-18 22:11:19 +02:00
turboderp  476e591966  Fix some typos  2025-09-18 03:01:04 +02:00
turboderp  9f14bcf27c  optimize.py: Compile all filter globs to single regex  2025-09-15 14:56:17 +02:00