Commit Graph

13 Commits

Author SHA1 Message Date
kingbri
e8f00412f6 Model: Fetch from generation_config and tokenizer_config in Exl3
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
kingbri
eca403a0e4 Model: Add Exllamav3 sampler
File was not included in previous commit.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
kingbri
bdc5189a4b Exl3: Add chunk size, cache size, and model info
Use the same algorithm for estimating and adjusting cache size based
on multiples of 256 and above max seq len.

Same applies for chunk size.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
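Commit bdc5189a4b describes its cache sizing rule only in words. Below is a minimal sketch. It assumes that "multiples of 256 and above max seq len" means two steps: the cache size is first raised to at least `max_seq_len`, then rounded up to the next multiple of 256. The helper names (`round_up_to_multiple`, `adjust_cache_size`) and the fallback to `max_seq_len` when no size is given are hypothetical, not taken from the actual commit.

```python
from typing import Optional

# Assumption: the granularity is the "multiples of 256" from the commit message.
CACHE_GRANULARITY = 256


def round_up_to_multiple(value: int, multiple: int = CACHE_GRANULARITY) -> int:
    """Round value up to the nearest multiple (no-op if already aligned)."""
    # Integer ceiling division avoids float rounding issues on large values.
    return -(-value // multiple) * multiple


def adjust_cache_size(cache_size: Optional[int], max_seq_len: int) -> int:
    """Hypothetical helper: clamp to max_seq_len, then align to 256."""
    # Assumption: fall back to max_seq_len when no cache size is requested.
    if cache_size is None:
        cache_size = max_seq_len
    # The cache must hold at least one full-length sequence.
    cache_size = max(cache_size, max_seq_len)
    return round_up_to_multiple(cache_size)
```

For example, with `max_seq_len=4096`, a requested cache size of 5000 becomes 5120 and a request of 1000 is raised to 4096. The commit says the same approach applies to chunk size. The rounding helper could serve there as well, but this log does not say how chunk size is bounded.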
kingbri
303e2dde12 Model: Correct exl3 generation, add concurrency, and cleanup
Fixes application of sampler parameters by adding a new sampler builder
interface. Also expose the generator class-wide and add wait_for_jobs.

Finally, allow inline loading to specify the backend.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
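Commit 303e2dde12 mentions a new sampler builder interface, but its code is not part of this log. The sketch below is purely hypothetical and only shows the builder pattern that the message names. It records only the sampling parameters that were actually set, keeps them in the order the methods are called, and returns them from `build()`. `SamplerBuilder`, its method names, and the `(name, value)` stage tuples are illustrative inventions. They are not the exllamav3 or TabbyAPI API.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class SamplerBuilder:
    """Hypothetical builder: collects sampling stages in call order."""

    stages: List[Tuple[str, Any]] = field(default_factory=list)

    def temperature(self, value: Optional[float]) -> "SamplerBuilder":
        # Skip unset or neutral values so they don't add a no-op stage.
        if value is not None and value != 1.0:
            self.stages.append(("temperature", value))
        return self

    def top_k(self, k: Optional[int]) -> "SamplerBuilder":
        # 0 or None disables top-k.
        if k:
            self.stages.append(("top_k", k))
        return self

    def top_p(self, p: Optional[float]) -> "SamplerBuilder":
        # 1.0 (or None) keeps the whole distribution, so it is skipped.
        if p is not None and p < 1.0:
            self.stages.append(("top_p", p))
        return self

    def build(self) -> List[Tuple[str, Any]]:
        # A real backend would turn these stages into the library's sampler
        # object; this sketch just returns a copy of the collected stages.
        return list(self.stages)


stages = SamplerBuilder().temperature(0.7).top_k(0).top_p(0.9).build()
```

In this example, `top_k(0)` is dropped and `stages` holds only the temperature and top-p entries. Building the sampler this way keeps the skipping of unset parameters in one place, which may be the kind of "application of sampler parameters" fix that the commit describes.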
randoentity
c744790f14 fixup: add sampler logs
Also passing sampler to job with this, no idea if this is correct
2025-05-02 21:33:25 -04:00
randoentity
b35c48da37 fixup: some metrics
2025-05-02 21:33:25 -04:00
randoentity
c0f268f33e fixup: autosplit, start work on metrics
2025-05-02 21:33:25 -04:00
randoentity
306fc7cd15 fixup: autosplit reserve
this probably breaks v2 support
2025-05-02 21:33:25 -04:00
randoentity
acb3adb953 fixup: auto split
2025-05-02 21:33:25 -04:00
randoentity
14fb573371 fixup: max_seq_len
Whoops
2025-05-02 21:33:25 -04:00
randoentity
daae9ec43d Exl3: Couldn't wait
Just copied some stuff around and it ended up working for basic use.
2025-05-02 21:33:25 -04:00
kingbri
b4ff2f23cf Exl3: Add token encode, decode, and special token fetch
Base class methods

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:32:53 -04:00
kingbri
0c1d794390 Model: Add exl3 and associated load functions
Initial exl3 compat and loading functionality.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:32:39 -04:00