Add more sh tags

turboderp
2024-06-08 20:41:34 +02:00
parent 95c16a8bc8
commit de05ac696b
3 changed files with 10 additions and 10 deletions

@@ -86,7 +86,7 @@ python test_inference.py -m <path_to_model> -p "Once upon a time,"
 A simple console chatbot is included. Run it with:
-```
+```sh
 python examples/chat.py -m <path_to_model> -mode llama -gs auto
 ```
@@ -115,7 +115,7 @@ and **exllamav2_HF** loaders.
 To install the current dev version, clone the repo and run the setup script:
-```
+```sh
 git clone https://github.com/turboderp/exllamav2
 cd exllamav2
 pip install -r requirements.txt
@@ -125,7 +125,7 @@ pip install .
 By default this will also compile and install the Torch C++ extension (`exllamav2_ext`) that the library relies on.
 You can skip this step by setting the `EXLLAMA_NOCOMPILE` environment variable:
-```
+```sh
 EXLLAMA_NOCOMPILE= pip install .
 ```
@@ -142,7 +142,7 @@ PyTorch.
 Either download an appropriate wheel or install directly from the appropriate URL:
-```
+```sh
 pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.12/exllamav2-0.0.12+cu121-cp311-cp311-linux_x86_64.whl
 ```
@@ -153,7 +153,7 @@ can also be installed this way, and it will build the extension while installing
 A PyPI package is available as well. This is the same as the JIT version (see above). It can be installed with:
-```
+```sh
 pip install exllamav2
 ```

@@ -94,7 +94,7 @@ measurement pass on subsequent quants of the same model.
 Convert a model and create a directory containing the quantized version with all of its original files:
-```
+```sh
 python convert.py \
     -i /mnt/models/llama2-7b-fp16/ \
     -o /mnt/temp/exl2/ \
@@ -104,7 +104,7 @@ python convert.py \
 Run just the measurement pass on a model, clearing the working directory first:
-```
+```sh
 python convert.py \
     -i /mnt/models/llama2-7b-fp16/ \
     -o /mnt/temp/exl2/ \
@@ -114,7 +114,7 @@ python convert.py \
 Use that measurement to quantize the model at two different bitrates:
-```
+```sh
 python convert.py \
     -i /mnt/models/llama2-7b-fp16/ \
     -o /mnt/temp/exl2/ \

@@ -29,7 +29,7 @@ in which ExLlama runs out of system memory when loading large models.
 This is the standard [HumanEval](https://github.com/openai/human-eval) test implemented for ExLlamaV2 with
 dynamic batching.
-```
+```sh
 pip install human-eval
 python eval/humaneval.py -m <model_dir> -o humaneval_output.json
 evaluate-functional-correctness humaneval_output.json
@@ -64,7 +64,7 @@ performance.
 This is the standard [MMLU](https://github.com/hendrycks/test) test implemented for ExLlamaV2 with
 dynamic batching.
-```
+```sh
 pip install datasets
 python eval/mmlu.py -m <model_dir>
 ```
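The hunks above all make the same mechanical change: bare ``` opening fences become ```sh. A minimal sketch of how one might find remaining untagged opening fences in a Markdown file before committing — this helper (`untagged_fences`) is hypothetical and not part of the repo:

```python
import re

def untagged_fences(text):
    """Return 1-based line numbers of opening code fences with no language tag.

    Heuristic: backtick fences alternate open/close, and only opening
    fences carry an info string such as `sh`.
    """
    open_fence = False
    untagged = []
    for lineno, line in enumerate(text.splitlines(), 1):
        m = re.match(r"^```(\w*)\s*$", line)
        if m:
            # Only an *opening* fence should have a tag; a bare closer is fine.
            if not open_fence and m.group(1) == "":
                untagged.append(lineno)
            open_fence = not open_fence
    return untagged

sample = "text\n```\nls\n```\n```sh\necho ok\n```\n"
print(untagged_fences(sample))  # [2]
```

This catches only the simple alternating-fence layout used in the README above; indented or tilde fences would need a fuller CommonMark-aware parser.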