9 Commits

Author     SHA1        Date                        Message
turboderp  f259fafda9  2023-09-07 03:03:02 +02:00  Optimization, wider loads in GPTQ kernel (int2)
turboderp  4b98d98a5c  2023-09-06 08:47:09 +02:00  Fix bug in 6-bit matrix preproc
turboderp  7964c73241  2023-09-05 14:32:02 +02:00  Add sampling settings as cmdline options to chat example
turboderp  e7b50fedcb  2023-09-05 14:24:53 +02:00  Fix chat example Llama mode (EOS was appended twice)
turboderp  fb0825207f  2023-09-05 14:22:34 +02:00  Fix chat example Llama mode (EOS was appended twice)
turboderp  3c80d41234  2023-09-05 14:03:51 +02:00  Add 4-bit GPTQ support
turboderp  6d576b3e56  2023-09-03 15:56:38 +02:00  Reworking attention, allow for batched inference with independent cache per sequence
turboderp  4570f6ee17  2023-09-02 16:40:57 +02:00  Tidying up
turboderp  bb83469574  2023-08-30 11:05:23 +02:00  Initial commit