mirror of https://github.com/turboderp-org/exllamav2.git
synced 2026-04-20 14:29:28 +00:00

Add more sh tags

README.md | 10
@@ -86,7 +86,7 @@ python test_inference.py -m <path_to_model> -p "Once upon a time,"
 
 A simple console chatbot is included. Run it with:
 
-```
+```sh
 python examples/chat.py -m <path_to_model> -mode llama -gs auto
 ```
 
@@ -115,7 +115,7 @@ and **exllamav2_HF** loaders.
 
 To install the current dev version, clone the repo and run the setup script:
 
-```
+```sh
 git clone https://github.com/turboderp/exllamav2
 cd exllamav2
 pip install -r requirements.txt
@@ -125,7 +125,7 @@ pip install .
 By default this will also compile and install the Torch C++ extension (`exllamav2_ext`) that the library relies on.
 You can skip this step by setting the `EXLLAMA_NOCOMPILE` environment variable:
 
-```
+```sh
 EXLLAMA_NOCOMPILE= pip install .
 ```
 
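The `EXLLAMA_NOCOMPILE= pip install .` form exports the variable with an empty value for just that one command; what matters is that the variable exists, not what it holds. A minimal sketch of that presence check (illustrative only, not exllamav2's actual setup code):

```python
import os

# Simulate `EXLLAMA_NOCOMPILE= pip install .`: the variable is set,
# but its value is the empty string.
os.environ["EXLLAMA_NOCOMPILE"] = ""

# A setup script would test for presence, not truthiness --
# `os.environ.get(...)` would return "" here, which is falsy.
skip_compile = "EXLLAMA_NOCOMPILE" in os.environ
print(skip_compile)
```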
@@ -142,7 +142,7 @@ PyTorch.
 
 Either download an appropriate wheel or install directly from the appropriate URL:
 
-```
+```sh
 pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.12/exllamav2-0.0.12+cu121-cp311-cp311-linux_x86_64.whl
 ```
 
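The wheel filename above encodes the build it targets: `cu121` is the CUDA version and `cp311` is the CPython tag. A quick way to print the tag your interpreter matches before picking a wheel:

```python
import sys

# Print the CPython tag (e.g. "cp311") for the running interpreter,
# matching the `cp…` field in a wheel filename.
tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(tag)
```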
@@ -153,7 +153,7 @@ can also be installed this way, and it will build the extension while installing
 
 A PyPI package is available as well. This is the same as the JIT version (see above). It can be installed with:
 
-```
+```sh
 pip install exllamav2
 ```
 
@@ -94,7 +94,7 @@ measurement pass on subsequent quants of the same model.
 
 Convert a model and create a directory containing the quantized version with all of its original files:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
@@ -104,7 +104,7 @@ python convert.py \
 
 Run just the measurement pass on a model, clearing the working directory first:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
@@ -114,7 +114,7 @@ python convert.py \
 
 Use that measurement to quantize the model at two different bitrates:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
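When choosing target bitrates, a back-of-envelope estimate of the quantized weight size is parameter count × bits per weight ÷ 8 bytes. This sketch is an approximation only (it ignores measurement overhead and tensors kept at higher precision):

```python
# Rough quantized-size estimate: n_params weights at bpw bits each,
# converted from bits to bytes to gigabytes. Approximation only.
def approx_size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9

# e.g. a 7B-parameter model at 4.0 bits per weight
print(round(approx_size_gb(7e9, 4.0), 2))
```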
@@ -29,7 +29,7 @@ in which ExLlama runs out of system memory when loading large models.
 This is the standard [HumanEval](https://github.com/openai/human-eval) test implemented for ExLlamaV2 with
 dynamic batching.
 
-```
+```sh
 pip install human-eval
 python eval/humaneval.py -m <model_dir> -o humaneval_output.json
 evaluate-functional-correctness humaneval_output.json
@@ -64,7 +64,7 @@ performance.
 This is the standard [MMLU](https://github.com/hendrycks/test) test implemented for ExLlamaV2 with
 dynamic batching.
 
-```
+```sh
 pip install datasets
 python eval/mmlu.py -m <model_dir>
 ```