676 Commits

Author SHA1 Message Date
turboderp
d85690204a Replacement safetensors lib for quantization 2026-01-27 00:52:54 +01:00
turboderp
428a082276 Add performance test 2026-01-22 23:28:53 +01:00
turboderp
91a11853cd Update README.md 2026-01-22 23:27:23 +01:00
turboderp
96ba966ad9 Bump to v0.0.20 (tag: v0.0.20) 2026-01-19 23:21:59 +01:00
turboderp
0ecc37bf97 Fix ComboSampler init when initializing as greedy 2026-01-19 22:57:19 +01:00
turboderp
75ee2c78c3 Add Qwen2_5_VLForConditionalGeneration, refactor HCXVisionV2VisionModel as subclass of Qwen2_5VLVisionModel 2026-01-19 22:48:49 +01:00
turboderp
5a6975747f Bump to v0.0.19 (tag: v0.0.19) 2026-01-16 23:28:09 +01:00
turboderp
c39616a7b5 Merge pull request #125 from amanwalksdownthestreet/fix-arch-suffix-parsing
arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.)
2026-01-14 22:11:43 +01:00
turboderp
f21b92e978 Add Adaptive-P sampler 2026-01-14 21:58:34 +01:00
turboderp
0d09af403a Diversity test: use greedy sampling for extraction 2026-01-14 21:40:31 +01:00
turboderp
e839152802 Add diversity test 2026-01-11 19:12:04 +01:00
turboderp
3186dca9da generator: Pad token mask when output layer is padded 2026-01-11 19:11:26 +01:00
turboderp
9043690801 generator: Free recurrent state after job completed (prevent memory leak with large job queue) 2026-01-11 17:38:15 +01:00
turboderp
e69d91b12b model_init: Add sampling args default overrides 2026-01-11 16:38:33 +01:00
turboderp
6b31fc00f5 Add HF tokenizer helper, refactor example 2026-01-11 12:49:12 +01:00
turboderp
288a98f5e3 Refactor sampler args for examples 2026-01-11 12:33:27 +01:00
turboderp
27c68d4e65 Update README.md 2026-01-10 15:59:46 +01:00
turboderp
539410a2a3 Support NanoChatForCausalLM 2026-01-10 15:59:08 +01:00
turboderp
3ecb9f54fb Merge pull request #136 from mindkrypted/feature/support-solar-open-moe
Add support for SolarOpenMoE architecture
2026-01-10 10:55:36 +01:00
mindkrypted
fd8659a6c3 Add support for SolarOpenMoE architecture 2026-01-07 13:45:23 -05:00
turboderp
703b05ab52 Update README.md 2026-01-06 16:08:23 +01:00
turboderp
a17d1a4334 Add HCXVisionV2ForCausalLM architecture 2026-01-06 16:01:54 +01:00
turboderp
7de8641fce Attn: Add varlen mode 2026-01-06 16:01:54 +01:00
turboderp
a026b32df3 Support IQuestCoderForCausalLM 2026-01-04 12:31:58 +01:00
turboderp
6e75e7b151 chat.py: Fix for models with eos_token_id=null 2026-01-04 02:02:10 +01:00
turboderp
227621e49e Support HyperCLOVAXForCausalLM 2026-01-03 03:22:50 +01:00
turboderp
a92cf0a13a Attn: Support custom softmax scale in SDPA mode 2026-01-03 03:22:13 +01:00
turboderp
cff5fd542c Embedding: Support embedding multiplier 2026-01-03 03:21:55 +01:00
turboderp
452803e73d Olmo3: Use default RoPE type for SWA layers 2025-12-26 21:38:56 +01:00
turboderp
195d01657a RoPE: Allow RoPE type override 2025-12-26 21:38:34 +01:00
turboderp
e8b77bba4a chat.py: Fix prompt tokens/s display 2025-12-25 23:18:50 +01:00
turboderp
80907797a5 chat.py: Add debug mode 2025-12-25 23:18:25 +01:00
turboderp
f0ea2ca858 Linear: Support new FP8 scale format 2025-12-23 21:05:05 +01:00
turboderp
2698a83022 RoPE: Let arch override theta key name 2025-12-23 21:04:41 +01:00
amanwalksdownthestreet
65cfaf3c60 arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.) 2025-12-16 23:06:34 -07:00
turboderp
a32e2219af Allow -hb 16 while quantizing 2025-12-13 20:55:25 +01:00
turboderp
104268521c Support Olmo3ForCausalLM 2025-12-13 20:49:03 +01:00
turboderp
bd0f26cd0e Fix comments 2025-12-10 21:47:30 +01:00
turboderp
1b7009c5b8 Merge remote-tracking branch 'origin/master' (tag: v0.0.18) 2025-12-10 10:43:17 +01:00
turboderp
f9d0e6038f Bump to v0.0.18 2025-12-10 10:42:41 +01:00
turboderp
d8be5d638f chat.py: Read all stop conditions from config.json 2025-12-10 00:53:45 +01:00
turboderp
9b75bc5f58 Support Ministral3ForCausalLM 2025-12-10 00:53:22 +01:00
turboderp
9663357c4f Convert: Print some more RoPE debug info 2025-12-10 00:52:49 +01:00
turboderp
24caf2c762 RoPE: Accept partial_rotary_factor in rope_parameters 2025-12-10 00:52:29 +01:00
kingbri
e49c02a3aa Actions: Add builds for torch 2.9
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-09 15:03:25 -05:00
turboderp
4d4992a8b8 GLM4: Update config parser to support 4.6V 2025-12-08 21:39:42 +01:00
turboderp
1385486592 Bump to v0.0.17 (tag: v0.0.17) 2025-12-07 17:47:20 +01:00
turboderp
784d3dc7e7 GEMM: Optimize reduction a little bit 2025-12-06 01:56:21 +01:00
turboderp
15b9c2b421 Cleanup 2025-12-06 01:55:56 +01:00
turboderp
700b34695f Generator: Fix #118, make sure prepare_logit_mask is only called on jobs in the sample batch.
Thanks to @EthanAndersonUSA
2025-12-05 16:29:13 +01:00