Commit Graph

50 Commits

Author SHA1 Message Date
turboderp 157fcfb5b9 Add license 2023-09-12 06:52:09 +02:00
turboderp 8e3cd01889 Update README.md 2023-09-12 06:52:04 +02:00
turboderp 546c91482e Add FUNDING.yml 2023-09-12 06:42:55 +02:00
turboderp c240eb0b70 Update README.md 2023-09-12 06:41:36 +02:00
turboderp 7704a6877b Fix VRAM usage estimate for linear layer spanning multiple shards 2023-09-11 19:20:24 +02:00
turboderp c5c90a8b4b Clean up imports 2023-09-11 07:31:43 +02:00
turboderp 27bccf02d3 Clean up Torch extension script 2023-09-11 07:21:55 +02:00
turboderp 6e14f8802b Unsharding utility 2023-09-11 04:16:42 +02:00
turboderp 49c8d9e51d Fix load quant tensors that span multiple shards 2023-09-11 04:16:17 +02:00
turboderp 5dc32f0f8c Fix padding for head layer when vocab is extended 2023-09-10 20:12:15 +02:00
turboderp 8ca9a1896d Add sharding utility 2023-09-10 18:59:05 +02:00
turboderp 8f9617dd5d Fix image link 2023-09-10 18:58:54 +02:00
turboderp f3ef397656 Add README.md 2023-09-10 15:42:38 +02:00
turboderp ddaf503e98 Add README.md 2023-09-10 14:16:53 +02:00
turboderp b4afc666dd Clean up examples 2023-09-10 14:16:42 +02:00
turboderp c0ade31bfe Fix typo 2023-09-10 10:00:52 +02:00
turboderp b389b474eb Add ninja requirement 2023-09-10 10:00:27 +02:00
turboderp 2617b6c012 Setuptools script 2023-09-10 09:02:05 +02:00
turboderp 0ec776f53e Add requirements.txt 2023-09-10 08:05:45 +02:00
turboderp 10899838ea Add speculative generator and example 2023-09-10 06:22:27 +02:00
turboderp 48f0db78b2 Improved VRAM predictions 2023-09-10 06:17:44 +02:00
turboderp 918368b295 34B testing 2023-09-10 06:15:33 +02:00
turboderp 6046dcf39a Util functions for mem debugging 2023-09-10 06:14:51 +02:00
turboderp 5d798a178a Cleaning up converter 2023-09-09 14:54:23 +02:00
turboderp 952c67c4ff Update defaults for convert script 2023-09-09 14:53:52 +02:00
turboderp 19e164eea2 CodeLlama system prompt 2023-09-09 14:53:02 +02:00
turboderp 18fe5d5a5a Forward pass chunking adapted from V1 2023-09-08 08:15:14 +02:00
turboderp 0af5c8a413 Forward pass chunking adapted from V1 2023-09-08 08:11:40 +02:00
turboderp d00c03ea69 Optimization: write K/V directly into cache when possible 2023-09-08 07:53:01 +02:00
turboderp f9dc978e01 Fix and test fallback matmul mode 2023-09-07 18:15:42 +02:00
turboderp f79e16c5d0 Optimization, wider loads in EXL2 kernel (int4) 2023-09-07 10:56:43 +02:00
turboderp 1075b7514f Optimization, wider loads in GPTQ kernel (int4) 2023-09-07 04:26:45 +02:00
turboderp c2f62e1f1f Optimization, wider loads in GPTQ kernel (int2) working 2023-09-07 04:07:13 +02:00
turboderp f259fafda9 Optimization, wider loads in GPTQ kernel (int2) 2023-09-07 03:03:02 +02:00
turboderp a0cb4355c3 Fix regression in EXL2 convert 2023-09-06 08:47:38 +02:00
turboderp 4b98d98a5c Fix bug in 6-bit matrix preproc 2023-09-06 08:47:09 +02:00
turboderp 7964c73241 Add sampling settings as cmdline options to chat example 2023-09-05 14:32:02 +02:00
turboderp e7b50fedcb Fix chat example Llama mode (EOS was appended twice) 2023-09-05 14:24:53 +02:00
turboderp fb0825207f Fix chat example Llama mode (EOS was appended twice) 2023-09-05 14:22:34 +02:00
turboderp 3c80d41234 Add 4-bit GPTQ support 2023-09-05 14:03:51 +02:00
turboderp 6d576b3e56 Reworking attention, allow for batched inference with independent cache per sequence 2023-09-03 15:56:38 +02:00
turboderp 4570f6ee17 Tidying up 2023-09-02 16:40:57 +02:00
turboderp 2a2cc16119 More kernel optimizin 2023-09-02 13:29:43 +02:00
turboderp 92ce76dec1 Kernel optimizations WIP 2023-09-02 05:37:00 +02:00
turboderp c5cf3956dc Add speed test 2023-09-01 12:03:00 +02:00
turboderp a386102ac6 Improve prediction of VRAM usage when loading model 2023-09-01 10:47:29 +02:00
turboderp 176dbc43ad CodeLlama rope_theta_support 2023-09-01 09:26:00 +02:00
turboderp bb83469574 Initial commit 2023-08-30 11:05:23 +02:00
turboderp 9cc802c11a Test commit 2023-08-30 11:03:34 +02:00
turboderp 03fb9db2e0 Initial commit 2023-08-30 10:54:23 +02:00