turboderp | d00c03ea69 | Optimization: write K/V directly into cache when possible | 2023-09-08 07:53:01 +02:00
turboderp | f9dc978e01 | Fix and test fallback matmul mode | 2023-09-07 18:15:42 +02:00
turboderp | f79e16c5d0 | Optimization, wider loads in EXL2 kernel (int4) | 2023-09-07 10:56:43 +02:00
turboderp | 1075b7514f | Optimization, wider loads in GPTQ kernel (int4) | 2023-09-07 04:26:45 +02:00
turboderp | c2f62e1f1f | Optimization, wider loads in GPTQ kernel (int2) working | 2023-09-07 04:07:13 +02:00
turboderp | f259fafda9 | Optimization, wider loads in GPTQ kernel (int2) | 2023-09-07 03:03:02 +02:00
turboderp | a0cb4355c3 | Fix regression in EXL2 convert | 2023-09-06 08:47:38 +02:00
turboderp | 4b98d98a5c | Fix bug in 6-bit matrix preproc | 2023-09-06 08:47:09 +02:00
turboderp | 7964c73241 | Add sampling settings as cmdline options to chat example | 2023-09-05 14:32:02 +02:00
turboderp | e7b50fedcb | Fix chat example Llama mode (EOS was appended twice) | 2023-09-05 14:24:53 +02:00
turboderp | fb0825207f | Fix chat example Llama mode (EOS was appended twice) | 2023-09-05 14:22:34 +02:00
turboderp | 3c80d41234 | Add 4-bit GPTQ support | 2023-09-05 14:03:51 +02:00
turboderp | 6d576b3e56 | Reworking attention, allow for batched inference with independent cache per sequence | 2023-09-03 15:56:38 +02:00
turboderp | 4570f6ee17 | Tidying up | 2023-09-02 16:40:57 +02:00
turboderp | 2a2cc16119 | More kernel optimizin | 2023-09-02 13:29:43 +02:00
turboderp | 92ce76dec1 | Kernel optimizations WIP | 2023-09-02 05:37:00 +02:00
turboderp | c5cf3956dc | Add speed test | 2023-09-01 12:03:00 +02:00
turboderp | a386102ac6 | Improve prediction of VRAM usage when loading model | 2023-09-01 10:47:29 +02:00
turboderp | 176dbc43ad | CodeLlama rope_theta_support | 2023-09-01 09:26:00 +02:00
turboderp | bb83469574 | Initial commit | 2023-08-30 11:05:23 +02:00
turboderp | 9cc802c11a | Test commit | 2023-08-30 11:03:34 +02:00
turboderp | 03fb9db2e0 | Initial commit | 2023-08-30 10:54:23 +02:00