570 Commits

Author SHA1 Message Date
turboderp
00ab0084c3 Basic LoRA support for MoE/Mixtral. Working but pretty slow for now 2023-12-24 01:41:23 +01:00
turboderp
b6b54dab00 Attempt to fix VC++ weirdness 2023-12-24 01:09:53 +01:00
turboderp
87225fe0c1 Optimize kernel batch performance 2023-12-23 22:05:41 +01:00
turboderp
7262fb8f9d Batch latency test script 2023-12-23 22:04:40 +01:00
turboderp
bf2710f008 Optimizer batched sampling 2023-12-23 22:04:10 +01:00
turboderp
845260cff6 Fix paths in setup.py 2023-12-23 14:22:46 +01:00
turboderp
be1e48dc4d Fix bug when applying offsets to position embeddings 2023-12-23 04:58:24 +01:00
turboderp
c284648dbe Add script to compare quantized and unquantized model 2023-12-23 02:57:13 +01:00
AlpinDale
a531dea6a0 Merge branch 'turboderp:master' into feat/frequency_presence_pen 2023-12-23 01:42:00 +00:00
turboderp
0b0afab9bd Merge pull request #239 from AlpinDale/master
feat: add top-A sampling
2023-12-23 02:33:41 +01:00
turboderp
b10e53822f Fix comment 2023-12-23 02:33:10 +01:00
turboderp
4eb05be36a Split up compilation some more 2023-12-23 01:35:10 +01:00
turboderp
6d63f46a93 Use multiple compilation units for templated kernels to speed up build 2023-12-23 01:26:05 +01:00
turboderp
e922f7e295 Fix console output in model_init 2023-12-23 01:23:34 +01:00
turboderp
c7de810313 Bit of cleanup 2023-12-23 01:23:04 +01:00
AlpinDale
1384eb540a add frequency and presence penalties 2023-12-21 17:19:47 +00:00
AlpinDale
ef81354232 add top_a to sample_basic 2023-12-21 16:11:42 +00:00
AlpinDale
8b22c0f2d1 do probs sorting 2023-12-21 16:06:19 +00:00
AlpinDale
a58176af64 just use temp_probs 2023-12-21 15:44:34 +00:00
AlpinDale
5131099b5f add top_a in a few more places 2023-12-21 15:28:34 +00:00
AlpinDale
f55bece3d3 top_a return type is float 2023-12-21 15:23:52 +00:00
AlpinDale
50856f1f2b Revert "remove algorithm include"
This reverts commit 1a20a02cc3.
2023-12-21 15:14:32 +00:00
AlpinDale
1a20a02cc3 remove algorithm include 2023-12-21 14:55:49 +00:00
AlpinDale
638af33e89 add top_a sampling 2023-12-21 14:54:47 +00:00
turboderp
c4ae226df5 HumanEval test 2023-12-21 12:29:09 +01:00
Ivan Sanchez
41efa463cd unpack prob from return of generator.stream() 2023-12-21 10:54:52 +00:00
turboderp
9009ba5cd2 Fix sampling for bsz > 1 2023-12-20 18:14:23 +01:00
Ivan Sanchez
b908544845 Add probabilities to streaming generator 2023-12-20 10:01:11 +00:00
turboderp
162fc5d62c model_init (and test_inference.py): add option to override no. experts per token from config.json 2023-12-19 16:38:36 +01:00
turboderp
8f40b5f92d Merge remote-tracking branch 'origin/master' 2023-12-19 00:14:51 +01:00
turboderp
5a61d6e821 Merge pull request #137 from deltaguo/master
Fix the garbadge output for ROCM
2023-12-18 14:04:36 +01:00
turboderp
9c81167e4e Update rms_norm.cu
use warpSize provided by hip
2023-12-18 14:04:15 +01:00
turboderp
fb1a20fbfd Merge pull request #231 from dvdtoth/master
Fix encoder in MMLU benchmark
2023-12-18 14:01:05 +01:00
turboderp
93bca57cc4 Free all VRAM when unloading quantized module 2023-12-18 01:30:21 +01:00
David Toth
ed7a104e71 Fix encoder in MMLU 2023-12-17 21:08:27 +00:00
turboderp
d1f2952cd6 Fix multiple caches not working with 8-bit cache mode 2023-12-17 14:41:21 +01:00
turboderp
e6bb29f06b Fix cache clone function 2023-12-17 13:54:41 +01:00
turboderp
a77a051025 Enable fused MoE kernels for num_experts = 4 2023-12-17 11:49:15 +01:00
turboderp
b121ee418f Fix typo 2023-12-17 10:40:11 +01:00
turboderp
a4ecea6d57 Bump to 0.0.11 (tag: v0.0.11) 2023-12-16 23:48:53 +01:00
turboderp
3c6ee1bb61 Merge pull request #228 from turboderp/experimental
Merge experimental
2023-12-16 23:36:15 +01:00
turboderp
79eb742bcf Update README.md 2023-12-16 22:06:44 +01:00
turboderp
89587d13df Update convert.py instructions 2023-12-16 22:03:25 +01:00
turboderp
02e2cb4d4a Update convert.py instructions 2023-12-16 21:51:35 +01:00
turboderp
660ce041cf Merge remote-tracking branch 'origin/experimental' into experimental 2023-12-16 21:50:48 +01:00
turboderp
d979249790 Merge pull request #227 from turboderp/master
Merge changes from master
2023-12-16 21:39:09 +01:00
turboderp
8a19badb01 Fix bug in standard cal dataset 2023-12-16 20:30:40 +01:00
turboderp
37a1322096 Fix mistake in MLP measure 2023-12-16 20:30:25 +01:00
turboderp
d2753a29b8 Mixtral EXL2 support, initial 2023-12-16 16:50:50 +01:00
turboderp
371f875aef Un-hardcode number of experts per token 2023-12-15 21:07:27 +01:00