mirror of https://github.com/turboderp-org/exllamav2.git
synced 2026-04-20 14:29:28 +00:00

Add more sh tags

README.md | 10
@@ -86,7 +86,7 @@ python test_inference.py -m <path_to_model> -p "Once upon a time,"
 
 A simple console chatbot is included. Run it with:
 
-```
+```sh
 python examples/chat.py -m <path_to_model> -mode llama -gs auto
 ```
 
@@ -115,7 +115,7 @@ and **exllamav2_HF** loaders.
 
 To install the current dev version, clone the repo and run the setup script:
 
-```
+```sh
 git clone https://github.com/turboderp/exllamav2
 cd exllamav2
 pip install -r requirements.txt
@@ -125,7 +125,7 @@ pip install .
 By default this will also compile and install the Torch C++ extension (`exllamav2_ext`) that the library relies on.
 You can skip this step by setting the `EXLLAMA_NOCOMPILE` environment variable:
 
-```
+```sh
 EXLLAMA_NOCOMPILE= pip install .
 ```
 
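The `EXLLAMA_NOCOMPILE= pip install .` form exports the variable with an empty value for just that one command; what matters is that the variable exists, not what it holds. A minimal sketch of that presence check (illustrative only, not exllamav2's actual setup code):

```python
import os

# Simulate `EXLLAMA_NOCOMPILE= pip install .`: the variable is set,
# but its value is the empty string.
os.environ["EXLLAMA_NOCOMPILE"] = ""

# A setup script would test for presence, not truthiness --
# `os.environ.get(...)` would return "" here, which is falsy.
skip_compile = "EXLLAMA_NOCOMPILE" in os.environ
print(skip_compile)
```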
@@ -142,7 +142,7 @@ PyTorch.
 
 Either download an appropriate wheel or install directly from the appropriate URL:
 
-```
+```sh
 pip install https://github.com/turboderp/exllamav2/releases/download/v0.0.12/exllamav2-0.0.12+cu121-cp311-cp311-linux_x86_64.whl
 ```
 
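The wheel filename above encodes the build it targets: `cu121` is the CUDA version and `cp311` is the CPython tag. A quick way to print the tag your interpreter matches before picking a wheel:

```python
import sys

# Print the CPython tag (e.g. "cp311") for the running interpreter,
# matching the `cp…` field in a wheel filename.
tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(tag)
```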
@@ -153,7 +153,7 @@ can also be installed this way, and it will build the extension while installing
 
 A PyPI package is available as well. This is the same as the JIT version (see above). It can be installed with:
 
-```
+```sh
 pip install exllamav2
 ```
 
@@ -94,7 +94,7 @@ measurement pass on subsequent quants of the same model.
 
 Convert a model and create a directory containing the quantized version with all of its original files:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
@@ -104,7 +104,7 @@ python convert.py \
 
 Run just the measurement pass on a model, clearing the working directory first:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
@@ -114,7 +114,7 @@ python convert.py \
 
 Use that measurement to quantize the model at two different bitrates:
 
-```
+```sh
 python convert.py \
 -i /mnt/models/llama2-7b-fp16/ \
 -o /mnt/temp/exl2/ \
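When choosing target bitrates, a back-of-envelope estimate of the quantized weight size is parameter count × bits per weight ÷ 8 bytes. This sketch is an approximation only (it ignores measurement overhead and tensors kept at higher precision):

```python
# Rough quantized-size estimate: n_params weights at bpw bits each,
# converted from bits to bytes to gigabytes. Approximation only.
def approx_size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9

# e.g. a 7B-parameter model at 4.0 bits per weight
print(round(approx_size_gb(7e9, 4.0), 2))
```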
@@ -29,7 +29,7 @@ in which ExLlama runs out of system memory when loading large models.
 This is the standard [HumanEval](https://github.com/openai/human-eval) test implemented for ExLlamaV2 with
 dynamic batching.
 
-```
+```sh
 pip install human-eval
 python eval/humaneval.py -m <model_dir> -o humaneval_output.json
 evaluate-functional-correctness humaneval_output.json
@@ -64,7 +64,7 @@ performance.
 This is the standard [MMLU](https://github.com/hendrycks/test) test implemented for ExLlamaV2 with
 dynamic batching.
 
-```
+```sh
 pip install datasets
 python eval/mmlu.py -m <model_dir>
 ```