Commit Graph

25 Commits

Author          SHA1        Message                                                                                   Date
turboderp       5dec977006  Refactor chat example, split out prompt formats, add working option for TinyLlama-chat    2023-10-04 23:18:45 +02:00
turboderp       d09a3fa000  Add Orca prompt format to chat example                                                    2023-10-04 01:44:57 +02:00
turboderp       d3217f0e4c  Refactor code formatting, integrate in chatbot example                                   2023-10-01 12:51:20 +02:00
turboderp       51a0104bba  WebSocket server (WIP)                                                                    2023-09-30 23:52:11 +02:00
turboderp       0961876eb2  Merge pull request #71 from SinanAkkoyun/code-chat: Code highlighting in chat CLI        2023-09-29 23:31:40 +02:00
turboderp       c136b2284c  Add token healing                                                                         2023-09-29 22:33:51 +02:00
Sinan Akkoyun   fa23466f68  Really fixed the codeblock lang problem lol                                               2023-09-29 16:25:38 +02:00
Sinan Akkoyun   4f6f37c4a4  Removed lang after ``` in output                                                          2023-09-29 16:17:16 +02:00
Sinan Akkoyun   2a43d3069d  Added codeblock highlighting to chatcode.py                                               2023-09-29 15:57:28 +02:00
turboderp       ba5f6191c8  Add typical setting to chat example.                                                      2023-09-26 19:50:44 +02:00
Jeff Kerr       c221ec3630  add comment on model.load() usage                                                         2023-09-13 11:25:49 -04:00
turboderp       c5c90a8b4b  Clean up imports                                                                          2023-09-11 07:31:43 +02:00
turboderp       b4afc666dd  Clean up examples                                                                         2023-09-10 14:16:42 +02:00
turboderp       10899838ea  Add speculative generator and example                                                     2023-09-10 06:22:27 +02:00
turboderp       19e164eea2  CodeLlama system prompt                                                                   2023-09-09 14:53:02 +02:00
turboderp       f79e16c5d0  Optimization, wider loads in EXL2 kernel (int4)                                           2023-09-07 10:56:43 +02:00
turboderp       f259fafda9  Optimization, wider loads in GPTQ kernel (int2)                                           2023-09-07 03:03:02 +02:00
turboderp       4b98d98a5c  Fix bug in 6-bit matrix preproc                                                           2023-09-06 08:47:09 +02:00
turboderp       7964c73241  Add sampling settings as cmdline options to chat example                                  2023-09-05 14:32:02 +02:00
turboderp       e7b50fedcb  Fix chat example Llama mode (EOS was appended twice)                                      2023-09-05 14:24:53 +02:00
turboderp       fb0825207f  Fix chat example Llama mode (EOS was appended twice)                                      2023-09-05 14:22:34 +02:00
turboderp       3c80d41234  Add 4-bit GPTQ support                                                                    2023-09-05 14:03:51 +02:00
turboderp       6d576b3e56  Reworking attention, allow for batched inference with independent cache per sequence     2023-09-03 15:56:38 +02:00
turboderp       4570f6ee17  Tidying up                                                                                2023-09-02 16:40:57 +02:00
turboderp       bb83469574  Initial commit                                                                            2023-08-30 11:05:23 +02:00