Commit Graph

711 Commits

Author SHA1 Message Date
turboderp
ba8fbc0bb6 Attn/Norm: Allow norm over entire Q/K projection (not per head) 2025-10-31 11:21:25 +01:00
turboderp
8a044a4cf4 chat.py: Add MiniMax template 2025-10-31 11:18:35 +01:00
turboderp
3634436641 model_diff.py: Limit batch size (prevent OoM on output layer) 2025-10-29 21:15:35 +01:00
turboderp
8fc143534c BlockSparseMLP: Fix graph update when no shared experts 2025-10-28 00:36:15 +01:00
turboderp
7380022912 Merge branch 'refs/heads/dev_graph' into dev 2025-10-24 03:11:50 +02:00
turboderp
eb26853fd3 Static cuBLAS workspace for graph nodes 2025-10-24 02:20:00 +02:00
turboderp
16c20e81f3 Add fused sigmoid-gate kernel 2025-10-24 02:18:52 +02:00
turboderp
660e6cc874 Add kernel: Allow broadcast 2025-10-23 02:24:44 +02:00
turboderp
377c8423be GEMV kernel (WIP) 2025-10-23 00:38:45 +02:00
turboderp
98500b9d76 Kernels: Const correctness for good measure 2025-10-19 12:46:55 +02:00
turboderp
b817f0ee7f Bump to v0.0.11 (tag: v0.0.11) 2025-10-17 17:31:49 +02:00
turboderp
d5556db368 TP: Fix sharing 0-dimensional tensors 2025-10-17 17:31:23 +02:00
turboderp
a29d97a88e Bump to v0.0.10 (tag: v0.0.10) 2025-10-15 14:44:02 +02:00
turboderp
d6ab7c2a27 Generator: Reuse job reference when requeueing (req for async jobs) 2025-10-15 14:32:20 +02:00
turboderp
20f4ebeec1 BlockSparseMLP, MLP: Add bsz1 graphs 2025-10-15 14:30:35 +02:00
turboderp
f2b83490b4 Fix missing package dir (tag: v0.0.9) 2025-10-13 23:33:38 +02:00
turboderp
bdde47f090 Merge branch 'master' into dev 2025-10-13 23:15:13 +02:00
turboderp
6ec1b13c06 MLP: Fix edge case when hidden_size == interm_size (happens in some TP setups) 2025-10-13 22:13:51 +02:00
turboderp
ce50be3138 TP: Fix regression 2025-10-13 21:44:04 +02:00
turboderp
5d883a26e3 Bump to v0.0.9 2025-10-13 20:56:59 +02:00
turboderp
104f8bf522 asdasdasd 2025-10-13 01:09:29 +02:00
turboderp
0f2da5d6a7 GEMM: Lock MCG multiplier to 0xCBAC1FED and MUL1 to 0x83DCD12D. Make MCG the default codebook for new models. 2025-10-12 22:09:01 +02:00
turboderp
b65c833ddf Linear: Use correct dtype for mcg and mul1 multipliers in bound classes 2025-10-12 15:30:58 +02:00
turboderp
d908a6c439 Convert: Increase default calibration to 250 rows, add more cal data 2025-10-12 14:12:59 +02:00
turboderp
917cce5aac GEMM: Recognize A100 as Ampere 2025-10-11 23:57:53 +02:00
turboderp
8d38d4fcbd Qcache: Fix batched page indexing 2025-10-10 01:05:56 +02:00
turboderp
41e28466e4 Merge branch 'dev' (tag: v0.0.8) 2025-10-10 00:06:38 +02:00
turboderp
19513f34e9 Bump to v0.0.8 2025-10-10 00:05:55 +02:00
turboderp
41544ae4ec Fix @lrucache memory leaks 2025-10-10 00:04:54 +02:00
turboderp
19c7010e99 Filters: Allow framework to load if Formatron isn't available (fail at runtime instead if attempting to use Formatron filter) 2025-10-09 19:06:52 +02:00
turboderp
992816b56f Graph framework (unused, WIP) 2025-10-09 19:06:51 +02:00
turboderp
2ce86d9df8 Convert: Warn if TORCH_ALLOW_TF32_CUBLAS_OVERRIDE is set during conversion 2025-10-09 11:12:21 +02:00
turboderp
4356527867 Pin pydantic to 2.11.0 2025-10-09 11:00:25 +02:00
turboderp
001e2ab125 Addition kernel 2025-10-09 10:59:23 +02:00
turboderp
78391e76c2 Config: Read and expose max_position_embeddings 2025-10-06 20:01:41 +02:00
turboderp
402ab56b3e Generator: Avoid recursive functions in defrag, should fix #78 2025-10-05 14:40:11 +02:00
turboderp
cb2a467c93 Fix cmdline argument collision 2025-10-05 01:30:20 +02:00
turboderp
4829ea43d9 Rework GEMM kernel tuning 2025-10-05 01:30:20 +02:00
turboderp
c3cae873c4 Fix regression (banned strings) 2025-10-04 22:54:17 +02:00
turboderp
8098d619f6 Update README.md 2025-09-28 18:23:39 +02:00
turboderp
4c23cb9d03 Bump to v0.0.7 (tag: v0.0.7) 2025-09-28 17:33:33 +02:00
turboderp
6cea948c3d More C++ modules 2025-09-28 16:55:17 +02:00
turboderp
13dd99cafe GatedDeltaNet: Move import 2025-09-28 03:32:12 +02:00
turboderp
a8a3da598b Add C++ modules 2025-09-28 00:08:51 +02:00
turboderp
8c4a542751 Generator: Fix max_rq_tokens edge case 2025-09-27 23:49:33 +02:00
turboderp
bbfd4e7069 Cleanup 2025-09-27 23:48:40 +02:00
turboderp
58f36bf14e Quantizer optimizations 2025-09-26 23:05:54 +02:00
turboderp
4fb2e37d22 Fix typo 2025-09-25 05:28:51 +02:00
turboderp
700b394126 TP: Fix _InterlockedExchange MSVC compilation issue 2025-09-25 05:07:19 +02:00
turboderp
3ab408b45e GatedDeltaNet: fsqrt -> sqrtf 2025-09-25 04:02:26 +02:00