| turboderp | 19e164eea2 | CodeLlama system prompt | 2023-09-09 14:53:02 +02:00 |
| turboderp | f79e16c5d0 | Optimization, wider loads in EXL2 kernel (int4) | 2023-09-07 10:56:43 +02:00 |
| turboderp | f259fafda9 | Optimization, wider loads in GPTQ kernel (int2) | 2023-09-07 03:03:02 +02:00 |
| turboderp | 4b98d98a5c | Fix bug in 6-bit matrix preproc | 2023-09-06 08:47:09 +02:00 |
| turboderp | 7964c73241 | Add sampling settings as cmdline options to chat example | 2023-09-05 14:32:02 +02:00 |
| turboderp | e7b50fedcb | Fix chat example Llama mode (EOS was appended twice) | 2023-09-05 14:24:53 +02:00 |
| turboderp | fb0825207f | Fix chat example Llama mode (EOS was appended twice) | 2023-09-05 14:22:34 +02:00 |
| turboderp | 3c80d41234 | Add 4-bit GPTQ support | 2023-09-05 14:03:51 +02:00 |
| turboderp | 6d576b3e56 | Reworking attention, allow for batched inference with independent cache per sequence | 2023-09-03 15:56:38 +02:00 |
| turboderp | 4570f6ee17 | Tidying up | 2023-09-02 16:40:57 +02:00 |
| turboderp | bb83469574 | Initial commit | 2023-08-30 11:05:23 +02:00 |