Commit Graph

1072 Commits

Author     SHA1        Message                                                                      Date
turboderp  d00c03ea69  Optimization: write K/V directly into cache when possible                    2023-09-08 07:53:01 +02:00
turboderp  f9dc978e01  Fix and test fallback matmul mode                                            2023-09-07 18:15:42 +02:00
turboderp  f79e16c5d0  Optimization, wider loads in EXL2 kernel (int4)                              2023-09-07 10:56:43 +02:00
turboderp  1075b7514f  Optimization, wider loads in GPTQ kernel (int4)                              2023-09-07 04:26:45 +02:00
turboderp  c2f62e1f1f  Optimization, wider loads in GPTQ kernel (int2) working                      2023-09-07 04:07:13 +02:00
turboderp  f259fafda9  Optimization, wider loads in GPTQ kernel (int2)                              2023-09-07 03:03:02 +02:00
turboderp  a0cb4355c3  Fix regression in EXL2 convert                                               2023-09-06 08:47:38 +02:00
turboderp  4b98d98a5c  Fix bug in 6-bit matrix preproc                                              2023-09-06 08:47:09 +02:00
turboderp  7964c73241  Add sampling settings as cmdline options to chat example                     2023-09-05 14:32:02 +02:00
turboderp  e7b50fedcb  Fix chat example Llama mode (EOS was appended twice)                         2023-09-05 14:24:53 +02:00
turboderp  fb0825207f  Fix chat example Llama mode (EOS was appended twice)                         2023-09-05 14:22:34 +02:00
turboderp  3c80d41234  Add 4-bit GPTQ support                                                       2023-09-05 14:03:51 +02:00
turboderp  6d576b3e56  Reworking attention, allow for batched inference with independent cache per sequence  2023-09-03 15:56:38 +02:00
turboderp  4570f6ee17  Tidying up                                                                   2023-09-02 16:40:57 +02:00
turboderp  2a2cc16119  More kernel optimizin                                                        2023-09-02 13:29:43 +02:00
turboderp  92ce76dec1  Kernel optimizations WIP                                                     2023-09-02 05:37:00 +02:00
turboderp  c5cf3956dc  Add speed test                                                               2023-09-01 12:03:00 +02:00
turboderp  a386102ac6  Improve prediction of VRAM usage when loading model                          2023-09-01 10:47:29 +02:00
turboderp  176dbc43ad  CodeLlama rope_theta_support                                                 2023-09-01 09:26:00 +02:00
turboderp  bb83469574  Initial commit                                                               2023-08-30 11:05:23 +02:00
turboderp  9cc802c11a  Test commit                                                                  2023-08-30 11:03:34 +02:00
turboderp  03fb9db2e0  Initial commit                                                               2023-08-30 10:54:23 +02:00