mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-03-10 05:50:08 +00:00)
776 lines
38 KiB
Bash
Executable File
#!/bin/bash
#
# sample usage:
#
# mkdir tmp
#
# # CPU-only build
# bash ./ci/run.sh ./tmp/results ./tmp/mnt
#
# # with CUDA support
# GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
#
# # with SYCL support
# GG_BUILD_SYCL=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
#

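# # with Metal support (the script also checks GG_BUILD_METAL and GG_BUILD_LOW_PERF below)
# GG_BUILD_METAL=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
#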
if [ -z "$2" ]; then
    echo "usage: $0 <output-dir> <mnt-dir>"
    exit 1
fi

mkdir -p "$1"
mkdir -p "$2"

OUT=$(realpath "$1")
MNT=$(realpath "$2")

rm -f "$OUT"/*.log
rm -f "$OUT"/*.exit
rm -f "$OUT"/*.md

sd=$(dirname "$0")
cd "$sd/../"
SRC=$(pwd)

CMAKE_EXTRA="-DLLAMA_FATAL_WARNINGS=ON"

if [ ! -z "${GG_BUILD_METAL}" ]; then
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_METAL=ON"
fi

if [ ! -z "${GG_BUILD_CUDA}" ]; then
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=1"
fi

if [ ! -z "${GG_BUILD_SYCL}" ]; then
    if [ -z "${ONEAPI_ROOT}" ]; then
        echo "ONEAPI_ROOT not detected. Please install the oneAPI base toolkit and enable it with:"
        echo "source /opt/intel/oneapi/setvars.sh"
        exit 1
    fi

    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_SYCL=1 -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON"
fi

## helpers

# download a file if it does not exist or if it is outdated
function gg_wget {
    local out=$1
    local url=$2

    local cwd=$(pwd)

    mkdir -p "$out"
    cd "$out"

    # wget -N skips the download if the remote file is unchanged
    wget -nv -N "$url"

    cd "$cwd"
}
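
# example: fetch a file into the models dir, re-downloading only when the
# remote copy changed (mirrors the wikitext fetches further below):
#   gg_wget models-mnt/wikitext/ https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip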

function gg_printf {
    printf -- "$@" >> "$OUT/README.md"
}

function gg_run {
    ci=$1

    set -o pipefail
    set -x

    gg_run_$ci | tee "$OUT/$ci.log"
    cur=$?
    echo "$cur" > "$OUT/$ci.exit"

    set +x
    set +o pipefail

    gg_sum_$ci

    ret=$((ret | cur))
}
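
# usage: gg_run <name>
#   runs gg_run_<name> with tracing and pipefail enabled, tees its output to
#   $OUT/<name>.log, records the exit code in $OUT/<name>.exit, appends the
#   gg_sum_<name> summary to $OUT/README.md, and folds the status into $ret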

## ci

# ctest_debug

function gg_run_ctest_debug {
    cd ${SRC}

    rm -rf build-ci-debug && mkdir build-ci-debug && cd build-ci-debug

    set -e

    # Check cmake, make and ctest are installed
    gg_check_build_requirements

    (time cmake -DCMAKE_BUILD_TYPE=Debug ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
    (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

    (time ctest --output-on-failure -L main -E test-opt ) 2>&1 | tee -a $OUT/${ci}-ctest.log

    set +e
}

function gg_sum_ctest_debug {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Runs ctest in debug mode\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '```\n'
    gg_printf '%s\n' "$(cat $OUT/${ci}-ctest.log)"
    gg_printf '```\n'
    gg_printf '\n'
}

# ctest_release

function gg_run_ctest_release {
    cd ${SRC}

    rm -rf build-ci-release && mkdir build-ci-release && cd build-ci-release

    set -e

    # Check cmake, make and ctest are installed
    gg_check_build_requirements

    (time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
    (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

    if [ -z ${GG_BUILD_LOW_PERF} ]; then
        (time ctest --output-on-failure -L 'main|python' ) 2>&1 | tee -a $OUT/${ci}-ctest.log
    else
        (time ctest --output-on-failure -L main -E test-opt ) 2>&1 | tee -a $OUT/${ci}-ctest.log
    fi

    set +e
}

function gg_sum_ctest_release {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Runs ctest in release mode\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '```\n'
    gg_printf '%s\n' "$(cat $OUT/${ci}-ctest.log)"
    gg_printf '```\n'
}
# test_scripts_debug

function gg_run_test_scripts_debug {
    cd ${SRC}

    set -e

    (cd ./examples/gguf-split && time bash tests.sh "$SRC/build-ci-debug/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
    (cd ./examples/quantize && time bash tests.sh "$SRC/build-ci-debug/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log

    set +e
}

function gg_sum_test_scripts_debug {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Runs test scripts in debug mode\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '```\n'
    gg_printf '%s\n' "$(cat $OUT/${ci}-scripts.log)"
    gg_printf '```\n'
    gg_printf '\n'
}

# test_scripts_release

function gg_run_test_scripts_release {
    cd ${SRC}

    set -e

    (cd ./examples/gguf-split && time bash tests.sh "$SRC/build-ci-release/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log
    (cd ./examples/quantize && time bash tests.sh "$SRC/build-ci-release/bin" "$MNT/models") 2>&1 | tee -a $OUT/${ci}-scripts.log

    set +e
}

function gg_sum_test_scripts_release {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Runs test scripts in release mode\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '```\n'
    gg_printf '%s\n' "$(cat $OUT/${ci}-scripts.log)"
    gg_printf '```\n'
    gg_printf '\n'
}
function gg_get_model {
    local gguf_0="$MNT/models/pythia/1.4B/ggml-model-f16.gguf"
    local gguf_1="$MNT/models/pythia/2.8B/ggml-model-f16.gguf"
    local gguf_2="$MNT/models/open-llama/7B-v2/ggml-model-f16.gguf"
    if [[ -s $gguf_0 ]]; then
        echo -n "$gguf_0"
    elif [[ -s $gguf_1 ]]; then
        echo -n "$gguf_1"
    elif [[ -s $gguf_2 ]]; then
        echo -n "$gguf_2"
    else
        echo >&2 "No model found. Can't run gg_run_ctest_with_model."
        exit 1
    fi
}
function gg_run_ctest_with_model_debug {
    cd ${SRC}

    local model; model=$(gg_get_model)
    cd build-ci-debug
    set -e
    (LLAMACPP_TEST_MODELFILE="$model" time ctest --output-on-failure -L model) 2>&1 | tee -a $OUT/${ci}-ctest.log
    set +e
    cd ..
}

function gg_run_ctest_with_model_release {
    cd ${SRC}

    local model; model=$(gg_get_model)
    cd build-ci-release
    set -e
    (LLAMACPP_TEST_MODELFILE="$model" time ctest --output-on-failure -L model) 2>&1 | tee -a $OUT/${ci}-ctest.log
    set +e
    cd ..
}

function gg_sum_ctest_with_model_debug {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Runs ctest with model files in debug mode\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '```\n'
    gg_printf '%s\n' "$(cat $OUT/${ci}-ctest.log)"
    gg_printf '```\n'
}

function gg_sum_ctest_with_model_release {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Runs ctest with model files in release mode\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '```\n'
    gg_printf '%s\n' "$(cat $OUT/${ci}-ctest.log)"
    gg_printf '```\n'
}
# open_llama_7b_v2
# requires: GG_BUILD_CUDA

function gg_run_open_llama_7b_v2 {
    cd ${SRC}

    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/raw/main/config.json
    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/resolve/main/tokenizer.model
    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/raw/main/tokenizer_config.json
    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/raw/main/special_tokens_map.json
    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/raw/main/pytorch_model.bin.index.json
    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/resolve/main/pytorch_model-00001-of-00002.bin
    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/resolve/main/pytorch_model-00002-of-00002.bin
    gg_wget models-mnt/open-llama/7B-v2/ https://huggingface.co/openlm-research/open_llama_7b_v2/raw/main/generation_config.json

    gg_wget models-mnt/wikitext/ https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
    unzip -o models-mnt/wikitext/wikitext-2-raw-v1.zip -d models-mnt/wikitext/

    path_models="../models-mnt/open-llama/7B-v2"
    path_wiki="../models-mnt/wikitext/wikitext-2-raw"

    rm -rf build-ci-release && mkdir build-ci-release && cd build-ci-release

    set -e

    (time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
    (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

    python3 ../examples/convert_legacy_llama.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf

    model_f16="${path_models}/ggml-model-f16.gguf"
    model_q8_0="${path_models}/ggml-model-q8_0.gguf"
    model_q4_0="${path_models}/ggml-model-q4_0.gguf"
    model_q4_1="${path_models}/ggml-model-q4_1.gguf"
    model_q5_0="${path_models}/ggml-model-q5_0.gguf"
    model_q5_1="${path_models}/ggml-model-q5_1.gguf"
    model_q2_k="${path_models}/ggml-model-q2_k.gguf"
    model_q3_k="${path_models}/ggml-model-q3_k.gguf"
    model_q4_k="${path_models}/ggml-model-q4_k.gguf"
    model_q5_k="${path_models}/ggml-model-q5_k.gguf"
    model_q6_k="${path_models}/ggml-model-q6_k.gguf"

    wiki_test="${path_wiki}/wiki.test.raw"

    ./bin/llama-quantize ${model_f16} ${model_q8_0} q8_0
    ./bin/llama-quantize ${model_f16} ${model_q4_0} q4_0
    ./bin/llama-quantize ${model_f16} ${model_q4_1} q4_1
    ./bin/llama-quantize ${model_f16} ${model_q5_0} q5_0
    ./bin/llama-quantize ${model_f16} ${model_q5_1} q5_1
    ./bin/llama-quantize ${model_f16} ${model_q2_k} q2_k
    ./bin/llama-quantize ${model_f16} ${model_q3_k} q3_k
    ./bin/llama-quantize ${model_f16} ${model_q4_k} q4_k
    ./bin/llama-quantize ${model_f16} ${model_q5_k} q5_k
    ./bin/llama-quantize ${model_f16} ${model_q6_k} q6_k

    (time ./bin/llama-cli --model ${model_f16} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-f16.log
    (time ./bin/llama-cli --model ${model_q8_0} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q8_0.log
    (time ./bin/llama-cli --model ${model_q4_0} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_0.log
    (time ./bin/llama-cli --model ${model_q4_1} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_1.log
    (time ./bin/llama-cli --model ${model_q5_0} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_0.log
    (time ./bin/llama-cli --model ${model_q5_1} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_1.log
    (time ./bin/llama-cli --model ${model_q2_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q2_k.log
    (time ./bin/llama-cli --model ${model_q3_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q3_k.log
    (time ./bin/llama-cli --model ${model_q4_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_k.log
    (time ./bin/llama-cli --model ${model_q5_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_k.log
    (time ./bin/llama-cli --model ${model_q6_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q6_k.log

    (time ./bin/llama-perplexity --model ${model_f16} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-f16.log
    (time ./bin/llama-perplexity --model ${model_q8_0} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q8_0.log
    (time ./bin/llama-perplexity --model ${model_q4_0} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_0.log
    (time ./bin/llama-perplexity --model ${model_q4_1} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_1.log
    (time ./bin/llama-perplexity --model ${model_q5_0} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_0.log
    (time ./bin/llama-perplexity --model ${model_q5_1} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_1.log
    (time ./bin/llama-perplexity --model ${model_q2_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q2_k.log
    (time ./bin/llama-perplexity --model ${model_q3_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q3_k.log
    (time ./bin/llama-perplexity --model ${model_q4_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_k.log
    (time ./bin/llama-perplexity --model ${model_q5_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_k.log
    (time ./bin/llama-perplexity --model ${model_q6_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q6_k.log

    (time ./bin/llama-imatrix --model ${model_f16} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-imatrix.log

    (time ./bin/llama-save-load-state -ngl 10 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log
    (time ./bin/llama-save-load-state -fa -ngl 10 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log
    (time ./bin/llama-save-load-state -ngl 99 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log
    (time ./bin/llama-save-load-state -fa -ngl 99 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log

    function check_ppl {
        qnt="$1"
        ppl=$(echo "$2" | grep -oE "[0-9]+\.[0-9]+" | tail -n 1)

        if [ $(echo "$ppl > 20.0" | bc) -eq 1 ]; then
            printf ' - %s @ %s (FAIL: ppl > 20.0)\n' "$qnt" "$ppl"
            return 20
        fi

        printf ' - %s @ %s OK\n' "$qnt" "$ppl"
        return 0
    }

    check_ppl "f16" "$(cat $OUT/${ci}-tg-f16.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q8_0" "$(cat $OUT/${ci}-tg-q8_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_0" "$(cat $OUT/${ci}-tg-q4_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_1" "$(cat $OUT/${ci}-tg-q4_1.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_0" "$(cat $OUT/${ci}-tg-q5_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_1" "$(cat $OUT/${ci}-tg-q5_1.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q2_k" "$(cat $OUT/${ci}-tg-q2_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q3_k" "$(cat $OUT/${ci}-tg-q3_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_k" "$(cat $OUT/${ci}-tg-q4_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_k" "$(cat $OUT/${ci}-tg-q5_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q6_k" "$(cat $OUT/${ci}-tg-q6_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log

    cat $OUT/${ci}-imatrix.log | grep "Final" >> $OUT/${ci}-imatrix-sum.log

    set +e
}

function gg_sum_open_llama_7b_v2 {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'OpenLLaMA 7B-v2:\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '- perplexity:\n%s\n' "$(cat $OUT/${ci}-ppl.log)"
    gg_printf '- imatrix:\n```\n%s\n```\n' "$(cat $OUT/${ci}-imatrix-sum.log)"
    gg_printf '- f16: \n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-f16.log)"
    gg_printf '- q8_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q8_0.log)"
    gg_printf '- q4_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_0.log)"
    gg_printf '- q4_1:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_1.log)"
    gg_printf '- q5_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_0.log)"
    gg_printf '- q5_1:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_1.log)"
    gg_printf '- q2_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q2_k.log)"
    gg_printf '- q3_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q3_k.log)"
    gg_printf '- q4_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_k.log)"
    gg_printf '- q5_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_k.log)"
    gg_printf '- q6_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q6_k.log)"
    gg_printf '- save-load-state: \n```\n%s\n```\n' "$(cat $OUT/${ci}-save-load-state.log)"
}
# pythia_1.4b

function gg_run_pythia_1_4b {
    cd ${SRC}

    gg_wget models-mnt/pythia/1.4B/ https://huggingface.co/EleutherAI/pythia-1.4b/raw/main/config.json
    gg_wget models-mnt/pythia/1.4B/ https://huggingface.co/EleutherAI/pythia-1.4b/raw/main/tokenizer.json
    gg_wget models-mnt/pythia/1.4B/ https://huggingface.co/EleutherAI/pythia-1.4b/raw/main/tokenizer_config.json
    gg_wget models-mnt/pythia/1.4B/ https://huggingface.co/EleutherAI/pythia-1.4b/raw/main/special_tokens_map.json
    gg_wget models-mnt/pythia/1.4B/ https://huggingface.co/EleutherAI/pythia-1.4b/resolve/main/pytorch_model.bin

    gg_wget models-mnt/wikitext/ https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
    unzip -o models-mnt/wikitext/wikitext-2-raw-v1.zip -d models-mnt/wikitext/
    head -n 60 models-mnt/wikitext/wikitext-2-raw/wiki.test.raw > models-mnt/wikitext/wikitext-2-raw/wiki.test-60.raw

    path_models="../models-mnt/pythia/1.4B"
    path_wiki="../models-mnt/wikitext/wikitext-2-raw"

    rm -rf build-ci-release && mkdir build-ci-release && cd build-ci-release

    set -e

    (time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
    (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

    python3 ../convert_hf_to_gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf

    model_f16="${path_models}/ggml-model-f16.gguf"
    model_q8_0="${path_models}/ggml-model-q8_0.gguf"
    model_q4_0="${path_models}/ggml-model-q4_0.gguf"
    model_q4_1="${path_models}/ggml-model-q4_1.gguf"
    model_q5_0="${path_models}/ggml-model-q5_0.gguf"
    model_q5_1="${path_models}/ggml-model-q5_1.gguf"
    model_q2_k="${path_models}/ggml-model-q2_k.gguf"
    model_q3_k="${path_models}/ggml-model-q3_k.gguf"
    model_q4_k="${path_models}/ggml-model-q4_k.gguf"
    model_q5_k="${path_models}/ggml-model-q5_k.gguf"
    model_q6_k="${path_models}/ggml-model-q6_k.gguf"

    wiki_test_60="${path_wiki}/wiki.test-60.raw"

    ./bin/llama-quantize ${model_f16} ${model_q8_0} q8_0
    ./bin/llama-quantize ${model_f16} ${model_q4_0} q4_0
    ./bin/llama-quantize ${model_f16} ${model_q4_1} q4_1
    ./bin/llama-quantize ${model_f16} ${model_q5_0} q5_0
    ./bin/llama-quantize ${model_f16} ${model_q5_1} q5_1
    ./bin/llama-quantize ${model_f16} ${model_q2_k} q2_k
    ./bin/llama-quantize ${model_f16} ${model_q3_k} q3_k
    ./bin/llama-quantize ${model_f16} ${model_q4_k} q4_k
    ./bin/llama-quantize ${model_f16} ${model_q5_k} q5_k
    ./bin/llama-quantize ${model_f16} ${model_q6_k} q6_k

    (time ./bin/llama-cli --model ${model_f16} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-f16.log
    (time ./bin/llama-cli --model ${model_q8_0} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q8_0.log
    (time ./bin/llama-cli --model ${model_q4_0} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_0.log
    (time ./bin/llama-cli --model ${model_q4_1} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_1.log
    (time ./bin/llama-cli --model ${model_q5_0} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_0.log
    (time ./bin/llama-cli --model ${model_q5_1} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_1.log
    (time ./bin/llama-cli --model ${model_q2_k} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q2_k.log
    (time ./bin/llama-cli --model ${model_q3_k} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q3_k.log
    (time ./bin/llama-cli --model ${model_q4_k} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_k.log
    (time ./bin/llama-cli --model ${model_q5_k} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_k.log
    (time ./bin/llama-cli --model ${model_q6_k} -s 1234 -n 64 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q6_k.log

    (time ./bin/llama-perplexity --model ${model_f16} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-f16.log
    (time ./bin/llama-perplexity --model ${model_q8_0} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q8_0.log
    (time ./bin/llama-perplexity --model ${model_q4_0} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_0.log
    (time ./bin/llama-perplexity --model ${model_q4_1} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_1.log
    (time ./bin/llama-perplexity --model ${model_q5_0} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_0.log
    (time ./bin/llama-perplexity --model ${model_q5_1} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_1.log
    (time ./bin/llama-perplexity --model ${model_q2_k} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q2_k.log
    (time ./bin/llama-perplexity --model ${model_q3_k} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q3_k.log
    (time ./bin/llama-perplexity --model ${model_q4_k} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_k.log
    (time ./bin/llama-perplexity --model ${model_q5_k} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_k.log
    (time ./bin/llama-perplexity --model ${model_q6_k} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-tg-q6_k.log

    (time ./bin/llama-imatrix --model ${model_f16} -f ${wiki_test_60} -c 128 -b 128 --chunks 1 ) 2>&1 | tee -a $OUT/${ci}-imatrix.log

    (time ./bin/llama-save-load-state --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log
    (time ./bin/llama-save-load-state -fa --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log

    function check_ppl {
        qnt="$1"
        ppl=$(echo "$2" | grep -oE "[0-9]+\.[0-9]+" | tail -n 1)

        if [ $(echo "$ppl > 20.0" | bc) -eq 1 ]; then
            printf ' - %s @ %s (FAIL: ppl > 20.0)\n' "$qnt" "$ppl"
            return 20
        fi

        printf ' - %s @ %s OK\n' "$qnt" "$ppl"
        return 0
    }

    check_ppl "f16" "$(cat $OUT/${ci}-tg-f16.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q8_0" "$(cat $OUT/${ci}-tg-q8_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_0" "$(cat $OUT/${ci}-tg-q4_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_1" "$(cat $OUT/${ci}-tg-q4_1.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_0" "$(cat $OUT/${ci}-tg-q5_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_1" "$(cat $OUT/${ci}-tg-q5_1.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    #check_ppl "q2_k" "$(cat $OUT/${ci}-tg-q2_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log # note: ppl > 20.0 for this quant and model
    check_ppl "q3_k" "$(cat $OUT/${ci}-tg-q3_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_k" "$(cat $OUT/${ci}-tg-q4_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_k" "$(cat $OUT/${ci}-tg-q5_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q6_k" "$(cat $OUT/${ci}-tg-q6_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log

    cat $OUT/${ci}-imatrix.log | grep "Final" >> $OUT/${ci}-imatrix-sum.log

    set +e
}

function gg_sum_pythia_1_4b {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Pythia 1.4B:\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '- perplexity:\n%s\n' "$(cat $OUT/${ci}-ppl.log)"
    gg_printf '- imatrix:\n```\n%s\n```\n' "$(cat $OUT/${ci}-imatrix-sum.log)"
    gg_printf '- f16: \n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-f16.log)"
    gg_printf '- q8_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q8_0.log)"
    gg_printf '- q4_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_0.log)"
    gg_printf '- q4_1:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_1.log)"
    gg_printf '- q5_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_0.log)"
    gg_printf '- q5_1:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_1.log)"
    gg_printf '- q2_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q2_k.log)"
    gg_printf '- q3_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q3_k.log)"
    gg_printf '- q4_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_k.log)"
    gg_printf '- q5_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_k.log)"
    gg_printf '- q6_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q6_k.log)"
    gg_printf '- save-load-state: \n```\n%s\n```\n' "$(cat $OUT/${ci}-save-load-state.log)"
}
# pythia_2_8b
|
|
# requires: GG_BUILD_CUDA
|
|
|
|
function gg_run_pythia_2_8b {
|
|
cd ${SRC}
|
|
|
|
gg_wget models-mnt/pythia/2.8B/ https://huggingface.co/EleutherAI/pythia-2.8b/raw/main/config.json
|
|
gg_wget models-mnt/pythia/2.8B/ https://huggingface.co/EleutherAI/pythia-2.8b/raw/main/tokenizer.json
|
|
gg_wget models-mnt/pythia/2.8B/ https://huggingface.co/EleutherAI/pythia-2.8b/raw/main/tokenizer_config.json
|
|
gg_wget models-mnt/pythia/2.8B/ https://huggingface.co/EleutherAI/pythia-2.8b/raw/main/special_tokens_map.json
|
|
gg_wget models-mnt/pythia/2.8B/ https://huggingface.co/EleutherAI/pythia-2.8b/resolve/main/pytorch_model.bin
|
|
|
|
gg_wget models-mnt/wikitext/ https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
|
|
unzip -o models-mnt/wikitext/wikitext-2-raw-v1.zip -d models-mnt/wikitext/

    path_models="../models-mnt/pythia/2.8B"
    path_wiki="../models-mnt/wikitext/wikitext-2-raw"

    rm -rf build-ci-release && mkdir build-ci-release && cd build-ci-release

    set -e

    (time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
    (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

    python3 ../convert_hf_to_gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf

    model_f16="${path_models}/ggml-model-f16.gguf"
    model_q8_0="${path_models}/ggml-model-q8_0.gguf"
    model_q4_0="${path_models}/ggml-model-q4_0.gguf"
    model_q4_1="${path_models}/ggml-model-q4_1.gguf"
    model_q5_0="${path_models}/ggml-model-q5_0.gguf"
    model_q5_1="${path_models}/ggml-model-q5_1.gguf"
    model_q2_k="${path_models}/ggml-model-q2_k.gguf"
    model_q3_k="${path_models}/ggml-model-q3_k.gguf"
    model_q4_k="${path_models}/ggml-model-q4_k.gguf"
    model_q5_k="${path_models}/ggml-model-q5_k.gguf"
    model_q6_k="${path_models}/ggml-model-q6_k.gguf"

    wiki_test="${path_wiki}/wiki.test.raw"

    ./bin/llama-quantize ${model_f16} ${model_q8_0} q8_0
    ./bin/llama-quantize ${model_f16} ${model_q4_0} q4_0
    ./bin/llama-quantize ${model_f16} ${model_q4_1} q4_1
    ./bin/llama-quantize ${model_f16} ${model_q5_0} q5_0
    ./bin/llama-quantize ${model_f16} ${model_q5_1} q5_1
    ./bin/llama-quantize ${model_f16} ${model_q2_k} q2_k
    ./bin/llama-quantize ${model_f16} ${model_q3_k} q3_k
    ./bin/llama-quantize ${model_f16} ${model_q4_k} q4_k
    ./bin/llama-quantize ${model_f16} ${model_q5_k} q5_k
    ./bin/llama-quantize ${model_f16} ${model_q6_k} q6_k

    (time ./bin/llama-cli --model ${model_f16}  -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-f16.log
    (time ./bin/llama-cli --model ${model_q8_0} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q8_0.log
    (time ./bin/llama-cli --model ${model_q4_0} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_0.log
    (time ./bin/llama-cli --model ${model_q4_1} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_1.log
    (time ./bin/llama-cli --model ${model_q5_0} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_0.log
    (time ./bin/llama-cli --model ${model_q5_1} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_1.log
    (time ./bin/llama-cli --model ${model_q2_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q2_k.log
    (time ./bin/llama-cli --model ${model_q3_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q3_k.log
    (time ./bin/llama-cli --model ${model_q4_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q4_k.log
    (time ./bin/llama-cli --model ${model_q5_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q5_k.log
    (time ./bin/llama-cli --model ${model_q6_k} -t 1 -ngl 999 -s 1234 -n 256 --ignore-eos -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q6_k.log

    (time ./bin/llama-perplexity --model ${model_f16}  -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-f16.log
    (time ./bin/llama-perplexity --model ${model_q8_0} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q8_0.log
    (time ./bin/llama-perplexity --model ${model_q4_0} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_0.log
    (time ./bin/llama-perplexity --model ${model_q4_1} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_1.log
    (time ./bin/llama-perplexity --model ${model_q5_0} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_0.log
    (time ./bin/llama-perplexity --model ${model_q5_1} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_1.log
    (time ./bin/llama-perplexity --model ${model_q2_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q2_k.log
    (time ./bin/llama-perplexity --model ${model_q3_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q3_k.log
    (time ./bin/llama-perplexity --model ${model_q4_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q4_k.log
    (time ./bin/llama-perplexity --model ${model_q5_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q5_k.log
    (time ./bin/llama-perplexity --model ${model_q6_k} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-tg-q6_k.log

    (time ./bin/llama-imatrix --model ${model_f16} -f ${wiki_test} -t 1 -ngl 999 -c 2048 -b 512 --chunks 4 ) 2>&1 | tee -a $OUT/${ci}-imatrix.log

    (time ./bin/llama-save-load-state     -ngl 10 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log
    (time ./bin/llama-save-load-state -fa -ngl 10 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log
    (time ./bin/llama-save-load-state     -ngl 99 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log
    (time ./bin/llama-save-load-state -fa -ngl 99 --model ${model_q4_0} ) 2>&1 | tee -a $OUT/${ci}-save-load-state.log

    function check_ppl {
        qnt="$1"
        ppl=$(echo "$2" | grep -oE "[0-9]+\.[0-9]+" | tail -n 1)

        if [ $(echo "$ppl > 20.0" | bc) -eq 1 ]; then
            printf ' - %s @ %s (FAIL: ppl > 20.0)\n' "$qnt" "$ppl"
            return 20
        fi

        printf ' - %s @ %s OK\n' "$qnt" "$ppl"
        return 0
    }
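
    # Usage sketch (hypothetical log text, not output from a real run):
    # check_ppl extracts the last float from its second argument and compares
    # it against the 20.0 ceiling, so for example:
    #
    #   check_ppl "q4_0" "[1]6.2902"   # prints " - q4_0 @ 6.2902 OK", returns 0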

    check_ppl "f16"  "$(cat $OUT/${ci}-tg-f16.log  | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q8_0" "$(cat $OUT/${ci}-tg-q8_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_0" "$(cat $OUT/${ci}-tg-q4_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_1" "$(cat $OUT/${ci}-tg-q4_1.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_0" "$(cat $OUT/${ci}-tg-q5_0.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_1" "$(cat $OUT/${ci}-tg-q5_1.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    #check_ppl "q2_k" "$(cat $OUT/${ci}-tg-q2_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log # note: ppl > 20.0 for this quant and model
    check_ppl "q3_k" "$(cat $OUT/${ci}-tg-q3_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q4_k" "$(cat $OUT/${ci}-tg-q4_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q5_k" "$(cat $OUT/${ci}-tg-q5_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log
    check_ppl "q6_k" "$(cat $OUT/${ci}-tg-q6_k.log | grep "^\[1\]")" | tee -a $OUT/${ci}-ppl.log

    cat $OUT/${ci}-imatrix.log | grep "Final" >> $OUT/${ci}-imatrix-sum.log

    set +e
}

function gg_sum_pythia_2_8b {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'Pythia 2.8B:\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '- perplexity:\n%s\n' "$(cat $OUT/${ci}-ppl.log)"
    gg_printf '- imatrix:\n```\n%s\n```\n' "$(cat $OUT/${ci}-imatrix-sum.log)"
    gg_printf '- f16:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-f16.log)"
    gg_printf '- q8_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q8_0.log)"
    gg_printf '- q4_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_0.log)"
    gg_printf '- q4_1:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_1.log)"
    gg_printf '- q5_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_0.log)"
    gg_printf '- q5_1:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_1.log)"
    gg_printf '- q2_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q2_k.log)"
    gg_printf '- q3_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q3_k.log)"
    gg_printf '- q4_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q4_k.log)"
    gg_printf '- q5_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q5_k.log)"
    gg_printf '- q6_k:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q6_k.log)"
    gg_printf '- save-load-state:\n```\n%s\n```\n' "$(cat $OUT/${ci}-save-load-state.log)"
}

# bge-small

function gg_run_embd_bge_small {
    cd ${SRC}

    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/config.json
    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/tokenizer.json
    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/tokenizer_config.json
    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/special_tokens_map.json
    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/pytorch_model.bin
    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/sentence_bert_config.json
    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/vocab.txt
    gg_wget models-mnt/bge-small/ https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/modules.json

    gg_wget models-mnt/bge-small/1_Pooling https://huggingface.co/BAAI/bge-small-en-v1.5/raw/main/1_Pooling/config.json

    path_models="../models-mnt/bge-small"

    rm -rf build-ci-release && mkdir build-ci-release && cd build-ci-release

    set -e

    (time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
    (time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

    python3 ../convert_hf_to_gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf

    model_f16="${path_models}/ggml-model-f16.gguf"
    model_q8_0="${path_models}/ggml-model-q8_0.gguf"

    ./bin/llama-quantize ${model_f16} ${model_q8_0} q8_0

    (time ./bin/llama-embedding --model ${model_f16}  -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-f16.log
    (time ./bin/llama-embedding --model ${model_q8_0} -p "I believe the meaning of life is" ) 2>&1 | tee -a $OUT/${ci}-tg-q8_0.log

    set +e
}

function gg_check_build_requirements {
    if ! command -v cmake &> /dev/null; then
        gg_printf 'cmake not found, please install'
    fi

    if ! command -v make &> /dev/null; then
        gg_printf 'make not found, please install'
    fi

    if ! command -v ctest &> /dev/null; then
        gg_printf 'ctest not found, please install'
    fi
}

function gg_sum_embd_bge_small {
    gg_printf '### %s\n\n' "${ci}"

    gg_printf 'BGE Small (BERT):\n'
    gg_printf '- status: %s\n' "$(cat $OUT/${ci}.exit)"
    gg_printf '- f16:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-f16.log)"
    gg_printf '- q8_0:\n```\n%s\n```\n' "$(cat $OUT/${ci}-tg-q8_0.log)"
}

## main

if [ -z "${GG_BUILD_LOW_PERF}" ]; then
    # Create symlink: ./llama.cpp/models-mnt -> $MNT/models
    rm -rf ${SRC}/models-mnt
    mnt_models=${MNT}/models
    mkdir -p ${mnt_models}
    ln -sfn ${mnt_models} ${SRC}/models-mnt

    # Create a fresh python3 venv and enter it
    python3 -m venv "$MNT/venv"
    source "$MNT/venv/bin/activate"

    pip install -r ${SRC}/requirements.txt --disable-pip-version-check
    pip install --editable gguf-py --disable-pip-version-check
fi

ret=0

test $ret -eq 0 && gg_run ctest_debug
test $ret -eq 0 && gg_run ctest_release

if [ -z "${GG_BUILD_LOW_PERF}" ]; then
    test $ret -eq 0 && gg_run embd_bge_small

    if [ -z "${GG_BUILD_CLOUD}" ] || [ -n "${GG_BUILD_EXTRA_TESTS_0}" ]; then
        test $ret -eq 0 && gg_run test_scripts_debug
        test $ret -eq 0 && gg_run test_scripts_release
    fi

    if [ -z "${GG_BUILD_VRAM_GB}" ] || [ "${GG_BUILD_VRAM_GB}" -ge 8 ]; then
        if [ -z "${GG_BUILD_CUDA}" ]; then
            test $ret -eq 0 && gg_run pythia_1_4b
        else
            test $ret -eq 0 && gg_run pythia_2_8b
            #test $ret -eq 0 && gg_run open_llama_7b_v2
        fi

        test $ret -eq 0 && gg_run ctest_with_model_debug
        test $ret -eq 0 && gg_run ctest_with_model_release
    fi
fi

exit $ret