🐛 #387 - Bug: bitnet 1.58 on termux segmentation fault

Author Benjamin-Wegener
State Closed
Created 2025-05-06
Updated 2025-05-23

Description

What happened?

trying original microsoft bitnet 1.58 gguf, downloaded with

~/ik_llama.cpp $ wget https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf?download=true

creates a segmentation fault:

$ ./build/bin/llama-server -mla 3 --model ./models/ggml-model-i2_s.gguf?download=true
INFO [ main] build info | tid="527362528504" timestamp=1746553079 build=3666 commit="f7c9a0f0"
INFO [ main] system info | tid="527362528504" timestamp=1746553079 n_threads=8 n_threads_batch=-1 total_threads=8 system_info="AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "
llama_model_loader: loaded meta data with 24 key-value pairs and 332 tensors from ./models/ggml-model-i2_s.gguf?download=true (version GGUF V3 (latest))
llama_model_loader: unknown type i2_s
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bitnet-b1.58
llama_model_loader: - kv 1: general.name str = bitnet2b
llama_model_loader: - kv 2: bitnet-b1.58.vocab_size u32 = 128256
llama_model_loader: - kv 3: bitnet-b1.58.context_length u32 = 4096
llama_model_loader: - kv 4: bitnet-b1.58.embedding_length u32 = 2560
llama_model_loader: - kv 5: bitnet-b1.58.block_count u32 = 30
llama_model_loader: - kv 6: bitnet-b1.58.feed_forward_length u32 = 6912
llama_model_loader: - kv 7: bitnet-b1.58.rope.dimension_count u32 = 128
llama_model_loader: - kv 8: bitnet-b1.58.attention.head_count u32 = 20
llama_model_loader: - kv 9: bitnet-b1.58.attention.head_count_kv u32 = 5
llama_model_loader: - kv 10: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 11: bitnet-b1.58.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 12: bitnet-b1.58.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 13: general.file_type u32 = 40
llama_model_loader: - kv 14: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,128256] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 128001
llama_model_loader: - kv 22: tokenizer.chat_template str = {% for message in messages %}{% if lo...
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type f16: 1 tensors
llama_model_loader: - type i2_s: 210 tensors
llm_load_vocab: missing pre-tokenizer type, using: 'llama3'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY MAY BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
llm_load_vocab:
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.8000 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = bitnet-b1.58
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_layer = 30
llm_load_print_meta: n_head = 20
llm_load_print_meta: n_head_kv = 5
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 640
llm_load_print_meta: n_embd_v_gqa = 640
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 6912
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 2B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 2.413 B
llm_load_print_meta: model size = 1.098 GiB (3.911 BPW)
llm_load_print_meta: general.name = bitnet2b
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: PAD token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 1124.81 MiB
...............................
=====================================================================
MLA is only available for LLM_ARCH_DEEPSEEK2 -> turning off MLA
=====================================================================
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 300.00 MiB
llama_new_context_with_model: KV self size = 300.00 MiB, K (f16): 150.00 MiB, V (f16): 150.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.98 MiB
llama_new_context_with_model: CPU compute buffer size = 255.50 MiB
llama_new_context_with_model: graph nodes = 995
llama_new_context_with_model: graph splits = 1
Segmentation fault

note: running the optimized version from https://huggingface.co/tdh111/bitnet-b1.58-2B-4T-GGUF/tree/main starts, but produces gibberish answers:

User: hello

Llama: [Nga92SK3#mK^(K"9E(-l^*hg-,C'2!,

Name and Version

~/ik_llama.cpp $ ./build/bin/llama-server --version
version: 3666 (f7c9a0f0)
built with clang version 20.1.3 for aarch64-unknown-linux-android24

What operating system are you seeing the problem on?

Linux

Relevant log output



💬 Conversation

👤 Benjamin-Wegener commented the 2025-05-06 at 17:42:16:

Used:

cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
cmake --build ./build --config Release -j $(nproc)


👤 ikawrakow commented the 2025-05-06 at 17:45:58:

You need to convert the model. If you can't find out how, I'll add the instructions when I'm back at a computer.


👤 Benjamin-Wegener commented the 2025-05-06 at 18:09:09:

thanks, I'll report back


👤 Benjamin-Wegener commented the 2025-05-06 at 19:04:56:

~/ik_llama.cpp $ ./build/bin/llama-quantize --allow-requantize ./models/bitnet1582b4t-iq2_bn_r4.gguf?download=true ./models/bitnet.gguf iq2_bn_r4

now the model loads with llama-server using no extra args and the standard config in the browser, but it just produces:

User: hello

Llama: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


👤 ikawrakow commented the 2025-05-06 at 19:37:24:

You need to convert the i2_s model that you downloaded previously:

./bin/llama-quantize --allow-requantize iq2_s_model new_model_name iq2_bn_r4
./bin/llama-cli -m new_model_name -n 128 -p "The meaning of life is"

👤 saood06 commented the 2025-05-06 at 19:51:09:

I think the issue is #361 which can be worked around using #347

One indicator of that is if the build process took a short amount of time.

Try adding -DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16" to your build. (Also do you mind telling us what device you are trying to run this on?)
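
If it is unclear whether the phone's SoC actually supports those extensions, one quick check (a minimal sketch using the standard Linux aarch64 feature names from /proc/cpuinfo, nothing ik_llama.cpp-specific) is:

# "asimddp" = dot product, "fphp"/"asimdhp" = fp16 arithmetic
grep -o -E 'asimddp|asimdhp|fphp' /proc/cpuinfo | sort -u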

The models in https://huggingface.co/tdh111/bitnet-b1.58-2B-4T-GGUF are already preconverted (and I ran into the same garbage output when using them on an Android device without building with the flags above)

To test in the server you can send the following request, which is lifted straight from their transformers PR (the BOS token is omitted as ik_llama.cpp/llama.cpp automatically inserts one):

"User: Hey, are you conscious? Can you talk to me?<|eot_id|>Assistant: "


👤 Benjamin-Wegener commented the 2025-05-07 at 06:28:44:

I think the issue is #361 which can be worked around using #347

One indicator of that is if the build process took a short amount of time.

Try adding -DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16" to your build. (Also do you mind telling us what device you are trying to run this on?)

The models in https://huggingface.co/tdh111/bitnet-b1.58-2B-4T-GGUF are already preconverted (and I ran into the same garbage output when using them on an Android device without building with the flags above)

To test in the server you can send the following request, which is lifted straight from their transformers PR (the BOS token is omitted as ik_llama.cpp/llama.cpp automatically inserts one):

"User: Hey, are you conscious? Can you talk to me?<|eot_id|>Assistant:"

That helps, now it's working, thank you.


👤 Benjamin-Wegener commented the 2025-05-09 at 04:30:45:

just for convenience, all the subsequent commands to install BitNet (or other CPU models) on a fresh Termux aarch64:

apt update && apt install wget cmake git -y
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF -DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16"
cmake --build ./build --config Release -j $(nproc)
wget https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf?download=true -O ./models/ggml-model-i2_s.gguf
./build/bin/llama-quantize --allow-requantize ./models/ggml-model-i2_s.gguf ./models/bitnet.gguf iq2_bn_r4
./build/bin/llama-server -mla 3 --model ./models/bitnet.gguf
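
As a possible sanity check before starting the server, the converted model can also be exercised once from the CLI (a sketch mirroring the llama-cli command ikawrakow suggested above; the -n value is arbitrary):

./build/bin/llama-cli -m ./models/bitnet.gguf -p "The meaning of life is" -n 64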


👤 ikawrakow commented the 2025-05-09 at 08:19:12:

@Benjamin-Wegener Thank you for these instructions. Do you mind if I take them and make a Discussion for better visibility? Or, if you prefer, you can do it yourself. Let me know.


👤 Benjamin-Wegener commented the 2025-05-09 at 09:20:13:

sure, will do. EDIT: done: https://github.com/ikawrakow/ik_llama.cpp/discussions/401


👤 Manamama commented the 2025-05-23 at 08:50:18:

FYI, I have tested your https://github.com/ikawrakow/ik_llama.cpp/issues/387#issuecomment-2865065414 out of curiosity on my "somewhat contaminated" Termux.

Both llama.cpp and yours used to compile fine, but at least today:

  1. llama.cpp still compiles fine (but then segfaults on some GGUFs only, see https://github.com/ggml-org/llama.cpp/issues/13708#issuecomment-2902117306)
  2. yours, when I do just that: https://github.com/ikawrakow/ik_llama.cpp/issues/387#issuecomment-2865065414, causes the error below.

Environment on the system:
Linux localhost 4.14.186+ #1 SMP PREEMPT Thu Mar 17 16:28:22 CST 2022 aarch64 Android


PATH: /data/data/com.termux/files/usr/google-cloud-sdk/bin:/data/data/com.termux/files/home/.opam/default/bin:/data/data/com.termux/files/usr/bin:/system/bin/:/data/data/com.termux/files/usr/bin:/system/bin/:/data/data/com.termux/files/usr/bin:/data/data/com.termux/files/usr/bin/texlive:/data/data/com.termux/files/usr/bin/texlive:/data/data/com.termux/files/home/.local/bin:/build-tools/30.0.3

LD_PRELOAD: /data/data/com.termux/files/usr/lib/libtermux-exec-direct-ld-preload.so

LD_LIBRARY_PATH: 

CC: clang
CXX: clang++
C_INCLUDE_PATH: 
FC: lfortran
CFLAGS: 
CXXFLAGS: 
LDFLAGS: -llog -largp -lm
CPPFLAGS: 
CMAKE_PREFIX_PATH: :/data/data/com.termux/files/usr/lib/cmake/Qt6HostInfo

JAVA_HOME: /data/data/com.termux/files/usr/lib/jvm/java-17-openjdk
ANDROID_NDK: /storage/emulated/0/Download/android-ndk-r26b
ANDROID_SDK: /storage/sdcard1/Installs/Android_ndk_sdk/SDK

and then


~/downloads $ git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
Cloning into 'ik_llama.cpp'...
remote: Enumerating objects: 29327, done.
remote: Counting objects: 100% (8480/8480), done.
remote: Compressing objects: 100% (788/788), done.
remote: Total 29327 (delta 8003), reused 7707 (delta 7692), pack-reused 20847 (from 2)
Receiving objects: 100% (29327/29327), 34.13 MiB | 98.00 KiB/s, done.
Resolving deltas: 100% (22227/22227), done.
Updating files: 100% (1027/1027), done.
~/downloads/ik_llama.cpp $ cd ik^C
~/downloads/ik_llama.cpp $ ls
 AUTHORS          CMakePresets.json       convert_hf_to_gguf_update.py    examples     gguf-py    Makefile   Package.swift   pyproject.toml      󰌠 requirements.txt  󰙨 tests
 ci               common                  convert_llama_ggml_to_gguf.py   flake.lock   grammars   media      pocs            pyrightconfig.json   scripts           
 cmake            CONTRIBUTING.md         convert_lora_to_gguf.py         flake.nix    include    models     poetry.lock     README.md            spm-headers       
 CMakeLists.txt   convert_hf_to_gguf.py   docs                            ggml         LICENSE    mypy.ini   prompts         requirements        󱧼 src               
~/downloads/ik_llama.cpp $ 
cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF -DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16"
cmake --build ./build --config Release -j $(nproc)
-- The C compiler identification is Clang 20.1.5
-- The CXX compiler identification is Clang 20.1.5
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /data/data/com.termux/files/usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /data/data/com.termux/files/usr/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /data/data/com.termux/files/usr/bin/git (found version "2.49.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp=libomp (found version "5.1")
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.1")
-- Found OpenMP: TRUE (found version "5.1")
-- OpenMP found
-- Using optimized iqk matrix multiplications
-- Enabling IQK Flash Attention kernels
-- Using llamafile
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- ARCH_FLAGS = -march=native
-- Configuring done (17.5s)
-- Generating done (1.4s)
-- Build files have been written to: /data/data/com.termux/files/home/downloads/ik_llama.cpp/build
[  0%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[  1%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
... 
[ 79%] Building CXX object examples/perplexity/CMakeFiles/llama-perplexity.dir/perplexity.cpp.o
[ 80%] Linking CXX executable ../../bin/llama-perplexity
[ 80%] Built target llama-perplexity
[ 81%] Building CXX object examples/quantize-stats/CMakeFiles/llama-quantize-stats.dir/quantize-stats.cpp.o
/data/data/com.termux/files/home/downloads/ik_llama.cpp/examples/quantize-stats/quantize-stats.cpp:782:57: error: expected ')'
  782 |                         if (sumqx*sumqx*sumq2i[j] > best]) {
      |                                                         ^
/data/data/com.termux/files/home/downloads/ik_llama.cpp/examples/quantize-stats/quantize-stats.cpp:782:28: note: to match this '('
  782 |                         if (sumqx*sumqx*sumq2i[j] > best]) {
      |                            ^
/data/data/com.termux/files/home/downloads/ik_llama.cpp/examples/quantize-stats/quantize-stats.cpp:782:57: error: expected expression
  782 |                         if (sumqx*sumqx*sumq2i[j] > best]) {
      |                                                         ^
/data/data/com.termux/files/home/downloads/ik_llama.cpp/examples/quantize-stats/quantize-stats.cpp:782:58: error: expected expression
  782 |                         if (sumqx*sumqx*sumq2i[j] > best]) {
      |                                                          ^
3 errors generated.
make[2]: *** [examples/quantize-stats/CMakeFiles/llama-quantize-stats.dir/build.make:79: examples/quantize-stats/CMakeFiles/llama-quantize-stats.dir/quantize-stats.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3920: examples/quantize-stats/CMakeFiles/llama-quantize-stats.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

I have taken a peek at this quantize-stats.cpp and these strings are indeed there, but I am bad at counting the closing brackets vs the opening ones by hand ...


👤 ikawrakow commented the 2025-05-23 at 09:02:05:

Does #445 fix it?
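
(For reference, a minimal sketch of picking up that fix and rebuilding, assuming #445 has been merged into the default branch by the time of testing:)

cd ik_llama.cpp
git pull
cmake --build ./build --config Release -j $(nproc)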


👤 Manamama commented the 2025-05-23 at 18:34:02:

Yes, it compiles now. Testing:

wget https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf?download=true -O ./models/ggml-model-i2_s.gguf
./build/bin/llama-quantize --allow-requantize ./models/ggml-model-is_s.gguf ./models/bitnet.gguf iq2_bn_r4
./build/bin/llama-server -mla 3 --model ./models/bitnet.gguf

...

It fails now with:

Resolving cdn-lfs-us-1.hf.co (cdn-lfs-us-1.hf.co)... 18.164.52.87, 18.164.52.5, 18.164.52.44, ...
Connecting to cdn-lfs-us-1.hf.co (cdn-lfs-us-1.hf.co)|18.164.52.87|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1187801280 (1.1G) [application/octet-stream]
Saving to: ./models/ggml-model-i2_s.gguf
./models/ggml 100%   1.11G   774KB/s    in 25m 14s
2025-05-23 20:58:34 (766 KB/s) - ./models/ggml-model-i2_s.gguf saved [1187801280/1187801280]
CANNOT LINK EXECUTABLE "./build/bin/llama-quantize": cannot locate symbol "ggml_backend_reg_get_count" referenced by "/data/data/com.termux/files/home/downloads/ik_llama.cpp/build/bin/llama-quantize"...
CANNOT LINK EXECUTABLE "./build/bin/llama-server": cannot locate symbol "llama_get_kv_cache_token_count" referenced by "/data/data/com.termux/files/home/downloads/ik_llama.cpp/build/bin/llama-server"...
~/downloads/ik_llama.cpp $

This may be needed, once again: https://github.com/ikawrakow/ik_llama.cpp/issues/388#issue-3043737093

Quick update: my trick does not help either.

~/downloads/ik_llama.cpp $ ./build/bin/llama-quantize --allow-requantize ./models/ggml-model-is_s.gguf ./models/bitnet.gguf iq2_bn_r4
CANNOT LINK EXECUTABLE "./build/bin/llama-quantize": cannot locate symbol "ggml_backend_reg_get_count" referenced by "/data/data/com.termux/files/home/downloads/ik_llama.cpp/build/bin/llama-quantize"...
~/downloads/ik_llama.cpp $ ldd "/data/data/com.termux/files/home/downloads/ik_llama.cpp/build/bin/llama-quantize"
        liblog.so => /system/lib64/liblog.so
        libargp.so => /data/data/com.termux/files/usr/lib/libargp.so
        libc.so => /system/lib64/libc.so
        libllama.so => /data/data/com.termux/files/usr/lib/libllama.so
        libggml.so => /data/data/com.termux/files/usr/lib/libggml.so
        libc++_shared.so => /data/data/com.termux/files/usr/lib/libc++_shared.so
        libdl.so => /system/lib64/libdl.so
        libm.so => /system/lib64/libm.so
        libc++.so => /system/lib64/libc++.so
        ld-android.so => /system/lib64/ld-android.so
        libclang_rt.asan-aarch64-android.so => /system/lib64/libclang_rt.asan-aarch64-android.so
        libggml-cpu.so => /data/data/com.termux/files/usr/lib/libggml-cpu.so
        libggml-base.so => /data/data/com.termux/files/usr/lib/libggml-base.so
~/downloads/ik_llama.cpp $

after recompilation, too.

Ver. 1.3
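
The ldd output above shows llama-quantize resolving libllama.so and libggml.so from Termux's system-wide /data/data/com.termux/files/usr/lib rather than from the freshly built tree, which would explain the missing symbols. A possible workaround sketch (the build-tree library locations below are assumptions based on the usual llama.cpp CMake layout, not verified for ik_llama.cpp):

# force the dynamic loader to prefer the libraries built in this tree
LD_LIBRARY_PATH="$PWD/build/src:$PWD/build/ggml/src" \
    ./build/bin/llama-quantize --allow-requantize ./models/ggml-model-i2_s.gguf ./models/bitnet.gguf iq2_bn_r4

Alternatively, reconfiguring with the standard CMake option -DBUILD_SHARED_LIBS=OFF should avoid the shared-library lookup entirely.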

