35 Commits

Author SHA1 Message Date
turboderp 36a636b478 Refactor and rework Gemma4 implementation:
- Remove custom quant cache layer stuff for now (cache quantization needs to be retested with all the new changes)
- Move preprocessing to a separate util module
- Replace dedicated Gemma4 modules with existing generic modules, making the necessary adjustments:
   - SDPA fallback triggers whenever head_dim > 512 (xformers also added, but its GQA implementation is buggy and needs an annoying workaround that slows it down a lot)
   - Add the necessary extra norms, new transpose args and a second residual channel to BlockSparseMLP (dense_mlp becomes the shared expert instead)
   - Add a layer scalar per decoder block
   - Don't apply the embedding multiplier to embedded MM tokens
- Ensure embedding scaling exactly matches the HF bfloat16 version

Vision stuff:
- Handle non-causal attention in multimodal spans with multiple (flash) attention passes rather than a custom mask
- Avoid extending the chunk size past the first MM span (tolerating a small amount of redundant processing keeps VRAM overhead relatively constant)
- Fold Gemma4VisionStandardize into Gemma4VisionPooler
- Replace Gemma4VisionProjector with RMSNorm+Linear modules
- Use 2D RoPE in the kernel instead of precomputed sin/cos tensors
- Use non-causal attention with no mask (the HF reference pads all embeddings to a common size of 280 tokens and then has to apply a custom attention mask to make that work, but the padding tokens are discarded anyway, so there's no point)
2026-04-08 03:59:52 +02:00
turboderp aaf6337f12 Add OlmoHybridForCausalLM 2026-03-13 00:59:10 +01:00
turboderp 60afe8d983 Update README.md 2026-03-07 20:48:24 +01:00
turboderp e12e6bd759 Update README.md 2026-03-02 15:51:58 +01:00
turboderp 88dcdf782d Update README.md 2026-03-02 03:49:28 +01:00
turboderp 91a11853cd Update README.md 2026-01-22 23:27:23 +01:00
turboderp 27c68d4e65 Update README.md 2026-01-10 15:59:46 +01:00
turboderp 703b05ab52 Update README.md 2026-01-06 16:08:23 +01:00
turboderp a026b32df3 Support IQuestCoderForCausalLM 2026-01-04 12:31:58 +01:00
turboderp 227621e49e Support HyperCLOVAXForCausalLM 2026-01-03 03:22:50 +01:00
turboderp 104268521c Support Olmo3ForCausalLM 2025-12-13 20:49:03 +01:00
turboderp 9b58b45999 Update README.md 2025-11-13 17:23:41 +01:00
turboderp 08a82f36a3 Add Glm4V architecture 2025-11-13 13:00:19 +01:00
turboderp 98e1c4017c Update README.md 2025-11-09 23:02:08 +01:00
turboderp 3533b307c3 Update README.md 2025-11-01 19:42:18 +01:00
turboderp 8098d619f6 Update README.md 2025-09-28 18:23:39 +02:00
turboderp 8c71b0aa57 GatedDeltaNet: Fused kernel for splitting inputs, casting, applying sigmoid etc. 2025-09-21 05:03:43 +02:00
turboderp b25082a0be Update README 2025-09-19 19:24:01 +02:00
turboderp 8f28558eed Update README.md 2025-09-04 02:50:37 +02:00
turboderp d8213dc04c Update README.md 2025-09-03 02:16:43 +02:00
turboderp d8167b0cf4 Update README.md 2025-08-26 13:34:01 +02:00
turboderp 26d89e36e1 Update README 2025-08-01 19:27:42 +02:00
turboderp 2957e20cad Update README.md 2025-07-12 18:33:18 +02:00
turboderp 0e8ad5418d Update readme 2025-06-02 02:31:04 +02:00
turboderp d1e3b2b20e Update README 2025-05-14 17:53:44 +02:00
turboderp 3b4c3d4dde Update readme 2025-04-27 01:28:19 +02:00
turboderp cf84811485 Add cache quantization 2025-04-22 21:52:33 +02:00
turboderp c0e9242315 Update installation instructions 2025-04-17 19:37:07 +02:00
turboderp 5b900442fe Fix typo 2025-04-12 00:38:31 +02:00
turboderp ae9f910be2 Help for conversion script 2025-04-11 00:26:03 +02:00
turboderp 08cf8e91ce Cleanup and comments 2025-04-09 14:18:23 +02:00
turboderp f579de4923 Link to model collection 2025-04-06 22:13:32 +02:00
turboderp 90db82667e Typo 2025-04-06 15:21:55 +02:00
turboderp 942ecaf18b Formatting 2025-04-06 15:08:25 +02:00
turboderp 543c4b2771 Initial commit 2025-04-06 14:42:49 +02:00