ik_llama.cpp/github-data/issues/576 - Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md
2025-07-23 13:31:53 +02:00

🐛 #576 - Bug: llama-server crash with "Deepseek2 does not support K-shift"

Author ewhacc
State Closed
Created 2025-07-03
Updated 2025-07-04

Description

What happened?

llama-server crashed with the message "llama.cpp:18430: Deepseek2 does not support K-shift". It happened during jobs using ubergarm's DeepSeek-V3-0324-IQ2_K_R4.

Relaunched it and it keeps going, so it's not reproducible. In what circumstance will "Deepseek2 does not support K-shift" be shown?

Name and Version

$ ik_llama.cpp/build/bin/llama-server --version
version: 3774 (bce7697d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

# Build
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1 -DGGML_CUDA_F16=ON

# Run
llama-server --model $model_path \
    --alias DeepSeek-V3-0324 \
    --ctx-size 98304 \
    -mla 3 -fa -amb 512 -fmoe \
    -b 4096 -ub 4096 \
    --n-gpu-layers 63 \
    --override-tensor exps=CPU \
    --parallel 2 --threads 32 \
    --host 0.0.0.0 --port 5000

# Log
INFO [   launch_slot_with_task] slot is processing task | tid="138385419128832" timestamp=1751529199 id_slot=1 id_task=106779
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529199 id_slot=1 id_task=106779 p0=9
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529229 id_slot=1 id_task=106779 p0=4105
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529259 id_slot=1 id_task=106779 p0=8201
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529290 id_slot=1 id_task=106779 p0=12297
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529321 id_slot=1 id_task=106779 p0=16393
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529351 id_slot=1 id_task=106779 p0=20489
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529382 id_slot=1 id_task=106779 p0=24585
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529413 id_slot=1 id_task=106779 p0=28681
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529444 id_slot=1 id_task=106779 p0=32777
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529475 id_slot=1 id_task=106779 p0=36873
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529506 id_slot=1 id_task=106779 p0=40969
INFO [            update_slots] kv cache rm [p0, end) | tid="138385419128832" timestamp=1751529537 id_slot=1 id_task=106779 p0=45065
INFO [            update_slots] slot context shift | tid="138385419128832" timestamp=1751529662 id_slot=1 id_task=106779 n_keep=1 n_left=49150 n_discard=24575 n_ctx=98304 n_past=49151 n_system_tokens=0 n_cache_tokens=49151
/home/..../ik_llama.cpp/src/llama.cpp:18430: Deepseek2 does not support K-shift
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.

💬 Conversation

👤 ikawrakow commented the 2025-07-03 at 11:38:54:

> In what circumstance, will "Deepseek2 does not support K-shift" be shown?

When you reach the maximum context length.


👤 ewhacc commented the 2025-07-03 at 18:15:28:

> When you reach the maximum context length.

Did I reach the maximum context length? p0=45065 just before crash.

n_keep=1 n_left=49150 n_discard=24575 n_ctx=98304 n_past=49151 n_system_tokens=0 n_cache_tokens=49151

It crashed again with a different prompt, but at the same p0=45065.

It was ok with R1. I'm going to check with R1 again.
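
The fields in the `slot context shift` log line are consistent with a shift that keeps `n_keep` tokens and discards half of the remainder. A minimal sketch, assuming the server derives them the way the upstream llama.cpp server example does (`n_left = n_system_tokens + n_past - n_keep`, and `n_discard = n_left / 2` when no explicit discard count is set); the thread does not confirm this for ik_llama.cpp, and the function name is hypothetical:

```python
def context_shift(n_past: int, n_keep: int, n_system_tokens: int = 0) -> tuple[int, int]:
    """Return (n_left, n_discard) for a context shift.

    Assumed relationship, not taken from this thread:
    the shift keeps n_keep tokens and drops half of what is left.
    """
    n_left = n_system_tokens + n_past - n_keep
    n_discard = n_left // 2
    return n_left, n_discard


# Values copied from the log line in the report.
n_left, n_discard = context_shift(n_past=49151, n_keep=1)
print(n_left, n_discard)  # 49150 24575
```

Both results match the logged `n_left=49150` and `n_discard=24575`. The shift was attempted at `n_past=49151`, and that is the operation the Deepseek2 K-shift assertion rejects.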


👤 saood06 commented the 2025-07-03 at 22:29:38:

> > When you reach the maximum context length.
>
> Did I reach the maximum context length? p0=45065 just before crash.
>
> n_keep=1 n_left=49150 n_discard=24575 n_ctx=98304 n_past=49151 n_system_tokens=0 n_cache_tokens=49151
>
> Crashed again for the different prompt, but at the same p0=45065.

Yes.

You set --parallel 2, which caps the context per slot (with 0 system tokens) at 49,152 tokens (98304 / 2). Your batch size is 4,096, so you'd expect the last reported p0 before the limit to fall between 45,056 and 49,152, and 45065 does. That is how slots currently handle the context limit: the cap is (n_ctx - n_system_tokens) divided by the number of slots.
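
The arithmetic above can be checked with a short sketch. The helper name is hypothetical, integer division is an assumption, and the variables simply mirror `--ctx-size`, `--parallel`, and `-b` from the reported launch command:

```python
def per_slot_context(n_ctx: int, n_parallel: int, n_system_tokens: int = 0) -> int:
    """Per-slot cap as described in the comment: (n_ctx - n_system_tokens) / slots."""
    return (n_ctx - n_system_tokens) // n_parallel


# Values from the reported launch command.
n_ctx, n_parallel, n_batch = 98304, 2, 4096

cap = per_slot_context(n_ctx, n_parallel)

# The p0 values in the log start at 9 and advance by one batch per step.
p0_values = [9 + k * n_batch for k in range(12)]
last_p0 = p0_values[-1]

print(cap)                              # 49152
print(last_p0)                          # 45065
print(cap - n_batch <= last_p0 <= cap)  # True
print(last_p0 + n_batch > cap)          # True: one more full batch would not fit
```

With `--parallel 1` the same helper returns the full 98304, which is why the crash did not show up in the earlier single-slot run.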


👤 ewhacc commented the 2025-07-04 at 05:16:41:

@saood06

Thanks so much! Yeah, that is the difference from my previous run.

I suspected --parallel 2 but didn't know it divides the context length.