turboderp | ba8fbc0bb6 | Attn/Norm: Allow norm over entire Q/K projection (not per head) | 2025-10-31 11:21:25 +01:00 |
turboderp | 8a044a4cf4 | chat.py: Add MiniMax template | 2025-10-31 11:18:35 +01:00 |
turboderp | 3634436641 | model_diff.py: Limit batch size (prevent OoM on output layer) | 2025-10-29 21:15:35 +01:00 |
turboderp | 8fc143534c | BlockSparseMLP: Fix graph update when no shared experts | 2025-10-28 00:36:15 +01:00 |
turboderp | 7380022912 | Merge branch 'refs/heads/dev_graph' into dev | 2025-10-24 03:11:50 +02:00 |
turboderp | eb26853fd3 | Static cuBLAS workspace for graph nodes | 2025-10-24 02:20:00 +02:00 |
turboderp | 16c20e81f3 | Add fused sigmoid-gate kernel | 2025-10-24 02:18:52 +02:00 |
turboderp | 660e6cc874 | Add kernel: Allow broadcast | 2025-10-23 02:24:44 +02:00 |
turboderp | 377c8423be | GEMV kernel (WIP) | 2025-10-23 00:38:45 +02:00 |
turboderp | 98500b9d76 | Kernels: Const correctness for good measure | 2025-10-19 12:46:55 +02:00 |
turboderp | b817f0ee7f | Bump to v0.0.11 (tag: v0.0.11) | 2025-10-17 17:31:49 +02:00 |
turboderp | d5556db368 | TP: Fix sharing 0-dimensional tensors | 2025-10-17 17:31:23 +02:00 |
turboderp | a29d97a88e | Bump to v0.0.10 (tag: v0.0.10) | 2025-10-15 14:44:02 +02:00 |
turboderp | d6ab7c2a27 | Generator: Reuse job reference when requeueing (req for async jobs) | 2025-10-15 14:32:20 +02:00 |
turboderp | 20f4ebeec1 | BlockSparseMLP, MLP: Add bsz1 graphs | 2025-10-15 14:30:35 +02:00 |
turboderp | f2b83490b4 | Fix missing package dir (tag: v0.0.9) | 2025-10-13 23:33:38 +02:00 |
turboderp | bdde47f090 | Merge branch 'master' into dev | 2025-10-13 23:15:13 +02:00 |
turboderp | 6ec1b13c06 | MLP: Fix edge case when hidden_size == interm_size (happens in some TP setups) | 2025-10-13 22:13:51 +02:00 |
turboderp | ce50be3138 | TP: Fix regression | 2025-10-13 21:44:04 +02:00 |
turboderp | 5d883a26e3 | Bump to v0.0.9 | 2025-10-13 20:56:59 +02:00 |
turboderp | 104f8bf522 | asdasdasd | 2025-10-13 01:09:29 +02:00 |
turboderp | 0f2da5d6a7 | GEMM: Lock MCG multiplier to 0xCBAC1FED and MUL1 to 0x83DCD12D. Make MCG the default codebook for new models. | 2025-10-12 22:09:01 +02:00 |
turboderp | b65c833ddf | Linear: Use correct dtype for mcg and mul1 multipliers in bound classes | 2025-10-12 15:30:58 +02:00 |
turboderp | d908a6c439 | Convert: Increase default calibration to 250 rows, add more cal data | 2025-10-12 14:12:59 +02:00 |
turboderp | 917cce5aac | GEMM: Recognize A100 as Ampere | 2025-10-11 23:57:53 +02:00 |
turboderp | 8d38d4fcbd | Qcache: Fix batched page indexing | 2025-10-10 01:05:56 +02:00 |
turboderp | 41e28466e4 | Merge branch 'dev' (tag: v0.0.8) | 2025-10-10 00:06:38 +02:00 |
turboderp | 19513f34e9 | Bump to v0.0.8 | 2025-10-10 00:05:55 +02:00 |
turboderp | 41544ae4ec | Fix @lrucache memory leaks | 2025-10-10 00:04:54 +02:00 |
turboderp | 19c7010e99 | Filters: Allow framework to load if Formatron isn't available (fail at runtime instead if attempting to use Formatron filter) | 2025-10-09 19:06:52 +02:00 |
turboderp | 992816b56f | Graph framework (unused, WIP) | 2025-10-09 19:06:51 +02:00 |
turboderp | 2ce86d9df8 | Convert: Warn if TORCH_ALLOW_TF32_CUBLAS_OVERRIDE is set during conversion | 2025-10-09 11:12:21 +02:00 |
turboderp | 4356527867 | Pin pydantic to 2.11.0 | 2025-10-09 11:00:25 +02:00 |
turboderp | 001e2ab125 | Addition kernel | 2025-10-09 10:59:23 +02:00 |
turboderp | 78391e76c2 | Config: Read and expose max_position_embeddings | 2025-10-06 20:01:41 +02:00 |
turboderp | 402ab56b3e | Generator: Avoid recursive functions in defrag, should fix #78 | 2025-10-05 14:40:11 +02:00 |
turboderp | cb2a467c93 | Fix cmdline argument collision | 2025-10-05 01:30:20 +02:00 |
turboderp | 4829ea43d9 | Rework GEMM kernel tuning | 2025-10-05 01:30:20 +02:00 |
turboderp | c3cae873c4 | Fix regression (banned strings) | 2025-10-04 22:54:17 +02:00 |
turboderp | 8098d619f6 | Update README.md | 2025-09-28 18:23:39 +02:00 |
turboderp | 4c23cb9d03 | Bump to v0.0.7 (tag: v0.0.7) | 2025-09-28 17:33:33 +02:00 |
turboderp | 6cea948c3d | More C++ modules | 2025-09-28 16:55:17 +02:00 |
turboderp | 13dd99cafe | GatedDeltaNet: Move import | 2025-09-28 03:32:12 +02:00 |
turboderp | a8a3da598b | Add C++ modules | 2025-09-28 00:08:51 +02:00 |
turboderp | 8c4a542751 | Generator: Fix max_rq_tokens edge case | 2025-09-27 23:49:33 +02:00 |
turboderp | bbfd4e7069 | Cleanup | 2025-09-27 23:48:40 +02:00 |
turboderp | 58f36bf14e | Quantizer optimizations | 2025-09-26 23:05:54 +02:00 |
turboderp | 4fb2e37d22 | Fix typo | 2025-09-25 05:28:51 +02:00 |
turboderp | 700b394126 | TP: Fix _InterlockedExchange MSVC compilation issue | 2025-09-25 05:07:19 +02:00 |
turboderp | 3ab408b45e | GatedDeltaNet: fsqrt -> sqrtf | 2025-09-25 04:02:26 +02:00 |