mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-05-01 11:51:53 +00:00
2736 lines
197 KiB
Markdown
### 📝 [#380](https://github.com/ikawrakow/ik_llama.cpp/issues/380) - Drop at the start of generation

| **Author** | `intulint` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-05-04 |
| **Updated** | 2025-05-25 |

---

#### Description

The server crashes once generation starts. This only happens with Qwen3-30B-A3B, and I checked several different quants. Regular dense models work, including other dense Qwen3 models.
What could be the problem? I liked the speedup on dense models, so I thought MoE would fly.
But it doesn't work. It crashes without an error; it just returns to the command prompt when generation starts.

Windows 10, Microsoft Visual Studio 2022, main branch

cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
cmake --build ./build --config Release -j 16

./llama-server.exe -t 7 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf

---
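Because the server exits without printing an error, one way to narrow things down is to look at the process exit code. The following is a minimal sketch and is not taken from this thread: it assumes Python 3 is available, and the helper names `describe_exit_code` and `run_and_report` are hypothetical. On Windows, a process that dies from an unhandled exception typically exits with the corresponding NTSTATUS value; the table lists only a few well-known ones.

```python
import subprocess
import sys
from typing import Sequence

# A few well-known Windows NTSTATUS values that show up as exit codes
# when a process dies from an unhandled exception (not exhaustive).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(code: int) -> str:
    """Return a readable description of a process exit code."""
    unsigned = code & 0xFFFFFFFF  # normalise negative values to 32-bit unsigned
    if unsigned == 0:
        return "0 (clean exit)"
    name = KNOWN_NTSTATUS.get(unsigned, "unrecognised code")
    return f"0x{unsigned:08X} ({name})"


def run_and_report(command: Sequence[str]) -> str:
    """Run a command, wait for it to exit, and describe its exit code."""
    completed = subprocess.run(list(command))
    return describe_exit_code(completed.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python exit_code.py ./llama-server.exe -t 7 -c 4096 -m model.gguf
    print(run_and_report(sys.argv[1:]))
```

An exit code of `0xC000001D`, for instance, would point at an unsupported CPU instruction, while `0xC0000005` would point at an invalid memory access. The mapping only applies on Windows; on other platforms `subprocess` reports termination by a signal as a negative return code instead.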

#### 💬 Conversation

👤 **ikawrakow** commented the **2025-05-05** at **05:12:28**:<br>

Can you post the output of the above commands (including the `cmake` commands)? Thanks.

---

👤 **intulint** commented the **2025-05-05** at **10:10:19**:<br>

Sure, but it turned out to be a lot of text. I also noticed that unicode.cpp and unicode-data.cpp take a long time to compile, on a single thread. I don't know if this is normal or not.
From a third-party frontend, generation does not start at all and the program exits. If you connect through the server's native frontend, about 140 tokens are generated and then it crashes again without any message.

**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.13.6
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************

C:\Program Files\Microsoft Visual Studio\2022\Community>cd C:\neuro\ik_llama.cpp

C:\neuro\ik_llama.cpp>git pull
Already up to date.

C:\neuro\ik_llama.cpp>cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.20348.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.43.34810.0
-- The CXX compiler identification is MSVC 19.43.34810.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.43.34808/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.43.34808/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.47.1.windows.2")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP_CXX: -openmp (found version "2.0")
-- Found OpenMP: TRUE (found version "2.0")
-- OpenMP found
-- Using optimized iqk matrix multiplications
-- Using llamafile
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM:
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
-- Configuring done (24.9s)
-- Generating done (1.9s)
-- Build files have been written to: C:/neuro/ik_llama.cpp/build

C:\neuro\ik_llama.cpp>cmake --build ./build --config Release -j 16
MSBuild version 17.13.19+0d9f5a35a for .NET Framework

1>Checking Build System
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gguf-hash/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gguf-hash/CMakeLists.txt
Generating build details from Git
Building Custom Rule C:/neuro/ik_llama.cpp/ggml/src/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gguf-hash/CMakeLists.txt
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.47.1.windows.2")
sha1.c
xxhash.c
sha256.c
ggml.c
Building Custom Rule C:/neuro/ik_llama.cpp/common/CMakeLists.txt
build-info.cpp
ggml-alloc.c
sha1.vcxproj -> C:\neuro\ik_llama.cpp\build\examples\gguf-hash\sha1.dir\Release\sha1.lib
build_info.vcxproj -> C:\neuro\ik_llama.cpp\build\common\build_info.dir\Release\build_info.lib
sha256.vcxproj -> C:\neuro\ik_llama.cpp\build\examples\gguf-hash\sha256.dir\Release\sha256.lib
ggml-backend.c
xxhash.vcxproj -> C:\neuro\ik_llama.cpp\build\examples\gguf-hash\xxhash.dir\Release\xxhash.lib
ggml-quants.c
C:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\ucrt\assert.h(21,9): warning C4005: 'static_assert': macro redefinition [C:\neuro\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
(compiling source file '../../../ggml/src/ggml-quants.c')
C:\neuro\ik_llama.cpp\ggml\src\ggml-common.h(69,9):
see previous definition of 'static_assert'

ggml-aarch64.c
C:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\ucrt\assert.h(21,9): warning C4005: 'static_assert': macro redefinition [C:\neuro\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
(compiling source file '../../../ggml/src/ggml-aarch64.c')
C:\neuro\ik_llama.cpp\ggml\src\ggml-common.h(69,9):
see previous definition of 'static_assert'

Generating Code...
sgemm.cpp
iqk_mul_mat.cpp
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(177,16): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(260,16): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(9584,9): warning C4065: switch statement contains 'default' but no 'case' labels [C:\neuro\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(3049,84): warning C4244: 'argument': conversion from 'const uint16_t' to 'char', possible loss of data [C:\neuro\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(3049,84):
the template instantiation context (the oldest one first) is
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(9649,21):
see reference to function template instantiation 'void `anonymous-namespace'::MulMat::set_functions<`anonymous-namespace'::DequantizerIQ2KS>(`anonymous-namespace'::MulMat &)' being compiled
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(9511,30):
see reference to function template instantiation 'void `anonymous-namespace'::mul_mat_qX_K_q8_K_T<Dequantizer,1>(int,const void *,size_t,const `anonymous-namespace'::DataInfo &,int)' being compiled
with
[
Dequantizer=`anonymous-namespace'::DequantizerIQ2KS
]
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_mul_mat.cpp(3240,35):
see reference to function template instantiation '__m256i `anonymous-namespace'::DequantizerIQ2KS::new_block<`anonymous-namespace'::Q8<1,block_q8_K>>(int,const Q8 &,__m256 *)' being compiled
with
[
Q8=`anonymous-namespace'::Q8<1,block_q8_K>
]

iqk_flash_attn.cpp
C:\neuro\ik_llama.cpp\ggml\src\iqk\iqk_flash_attn.cpp(88,24): warning C4244: '=': conversion from 'uint64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
iqk_quantize.cpp
Generating Code...
Auto build dll exports
Creating library C:/neuro/ik_llama.cpp/build/ggml/src/Release/ggml.lib and object C:/neuro/ik_llama.cpp/build/ggml/src/Release/ggml.exp
ggml.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\ggml.dll
Building Custom Rule C:/neuro/ik_llama.cpp/src/CMakeLists.txt
llama.cpp
C:\neuro\ik_llama.cpp\src\llama.cpp(2635,40): warning C4305: 'initializing': truncation from 'double' to 'float' [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama.cpp(5511,17): warning C4065: switch statement contains 'default' but no 'case' labels [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama.cpp(5520,17): warning C4065: switch statement contains 'default' but no 'case' labels [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 2 has type 'int64_t' [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%lld' in the format string
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%Id' in the format string
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%I64d' in the format string

C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 3 has type 'int64_t' [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%lld' in the format string
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%Id' in the format string
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%I64d' in the format string

C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 4 has type 'int64_t' [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%lld' in the format string
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%Id' in the format string
C:\neuro\ik_llama.cpp\src\llama.cpp(8970,24):
consider using '%I64d' in the format string

llama-vocab.cpp
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(138,26): warning C4244: 'return': conversion from 'long' to 'uint8_t', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(211,35): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(211,30): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(543,39): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(543,34): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(583,82): warning C4267: '=': conversion from 'size_t' to 'llm_symbol::index', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(586,61): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(680,37): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(680,25): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-vocab.cpp(1543,20): warning C4267: 'return': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
llama-grammar.cpp
llama-sampling.cpp
C:\neuro\ik_llama.cpp\src\llama-sampling.cpp(26,20): warning C4244: '=': conversion from 'time_t' to 'uint32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-sampling.cpp(70,23): warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-sampling.cpp(405,33): warning C4244: '=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-sampling.cpp(409,34): warning C4244: '/=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-sampling.cpp(510,34): warning C4244: 'initializing': conversion from 'float' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-sampling.cpp(510,27): warning C4244: 'initializing': conversion from 'float' to 'const int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
C:\neuro\ik_llama.cpp\src\llama-sampling.cpp(530,61): warning C4244: 'argument': conversion from 'const int32_t' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\src\llama.vcxproj]
unicode.cpp
unicode-data.cpp
Generating Code...
Auto build dll exports
Creating library C:/neuro/ik_llama.cpp/build/src/Release/llama.lib and object C:/neuro/ik_llama.cpp/build/src/Release/llama.exp
llama.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama.dll
Building Custom Rule C:/neuro/ik_llama.cpp/examples/llava/CMakeLists.txt
llava.cpp
C:\neuro\ik_llama.cpp\examples\llava\llava.cpp(346,24): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
clip.cpp
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(590,32): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(590,26): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(824,149): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(824,130): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(824,111): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(824,92): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(838,23): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(838,43): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(872,149): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(872,130): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(872,111): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(872,92): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(887,23): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(887,43): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1218,27): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1010,9): warning C4297: 'clip_model_load': function assumed not to throw an exception but does [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1010,9):
__declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function

C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1429,13): warning C4297: 'clip_model_load': function assumed not to throw an exception but does [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1429,13):
__declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function

C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1529,48): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,58): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,46): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,88): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,77): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,98): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,137): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,125): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,163): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,154): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,173): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1627,103): warning C4244: '=': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,58): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,46): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,88): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,77): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,98): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,137): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,125): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,163): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,154): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,173): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1628,103): warning C4244: '=': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,58): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,46): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,88): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,77): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,98): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,137): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,125): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,163): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,154): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,173): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1629,103): warning C4244: '=': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1630,58): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1630,46): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1630,84): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1630,75): warning C4244: 'argument': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1630,94): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1632,45): warning C4244: '=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1633,40): warning C4244: '=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1634,60): warning C4244: '=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1642,45): warning C4244: '=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1643,40): warning C4244: '=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1644,60): warning C4244: '=': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,49): warning C4244: 'initializing': conversion from 'const _Ty' to 'uint8_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,49): warning C4244: with [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,49): warning C4244: [ [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,49): warning C4244: _Ty=float [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,49): warning C4244: ] [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,39): warning C4244: 'initializing': conversion from 'const _Ty' to 'const uint8_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,39): warning C4244: with [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,39): warning C4244: [ [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,39): warning C4244: _Ty=float [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1647,39): warning C4244: ] [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1648,68): warning C4244: '=': conversion from 'float' to '_Ty', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1648,68): warning C4244: with [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1648,68): warning C4244: [ [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1648,68): warning C4244: _Ty=uint8_t [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1648,68): warning C4244: ] [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1821,21): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1838,32): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1838,27): warning C4244: 'initializing': conversion from 'double' to 'const float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1839,63): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1839,23): warning C4244: 'initializing': conversion from 'double' to 'const float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1840,30): warning C4244: 'initializing': conversion from 'double' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1840,24): warning C4244: 'initializing': conversion from 'double' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1903,32): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1903,27): warning C4244: 'initializing': conversion from 'double' to 'const float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1904,63): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1904,23): warning C4244: 'initializing': conversion from 'double' to 'const float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1905,30): warning C4244: 'initializing': conversion from 'double' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(1905,24): warning C4244: 'initializing': conversion from 'double' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,44): warning C4244: 'initializing': conversion from 'const _Ty' to 'uint8_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,44): warning C4244: with [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,44): warning C4244: [ [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,44): warning C4244: _Ty=float [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,44): warning C4244: ] [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,34): warning C4244: 'initializing': conversion from 'const _Ty' to 'const uint8_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,34): warning C4244: with [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,34): warning C4244: [ [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,34): warning C4244: _Ty=float [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2077,34): warning C4244: ] [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2157,11): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2158,11): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2162,24): warning C4244: '=': conversion from 'double' to '_Ty', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2162,24): warning C4244: with [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2162,24): warning C4244: [ [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2162,24): warning C4244: _Ty=float [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2162,24): warning C4244: ] [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2184,11): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2185,11): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2259,20): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2320,47): warning C4244: '=': conversion from 'double' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2539,68): warning C4244: 'return': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2542,56): warning C4244: 'return': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2545,46): warning C4244: 'return': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2548,46): warning C4244: 'return': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2555,5): warning C4297: 'clip_n_mmproj_embd': function assumed not to throw an exception but does [C:\neuro\ik_llama.cpp\build\examples\llava\llava.vcxproj]
C:\neuro\ik_llama.cpp\examples\llava\clip.cpp(2555,5): __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function
Generating Code...
llava.vcxproj -> C:\neuro\ik_llama.cpp\build\examples\llava\llava.dir\Release\llava.lib
Building Custom Rule C:/neuro/ik_llama.cpp/common/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/benchmark/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/quantize-stats/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/llava/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/llava/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gguf/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
common.cpp
benchmark-matmult.cpp
gguf.cpp
quantize-stats.cpp
Creating library C:/neuro/ik_llama.cpp/build/examples/llava/Release/llava_shared.lib and object C:/neuro/ik_llama.cpp/build/examples/llava/Release/llava_shared.exp
llava_static.vcxproj -> C:\neuro\ik_llama.cpp\build\examples\llava\Release\llava_static.lib
test-c.c
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gguf-hash/CMakeLists.txt
llava_shared.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llava_shared.dll
C:\neuro\ik_llama.cpp\examples\gguf\gguf.cpp(69,31): warning C4244: '=': conversion from 'int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gguf\llama-gguf.vcxproj]
test-c.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-c.exe
gguf-hash.cpp
llama-gguf.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-gguf.exe
llama-bench-matmult.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-bench-matmult.exe
C:\neuro\ik_llama.cpp\common\common.cpp(328,30): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name: _strdup. See online help for details. [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\examples\gguf-hash\gguf-hash.cpp(383,55): warning C4267: 'argument': conversion from 'size_t' to 'uint32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gguf-hash\llama-gguf-hash.vcxproj]
C:\neuro\ik_llama.cpp\examples\gguf-hash\gguf-hash.cpp(412,80): warning C4267: 'argument': conversion from 'size_t' to 'uint32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gguf-hash\llama-gguf-hash.vcxproj]
C:\neuro\ik_llama.cpp\examples\gguf-hash\gguf-hash.cpp(453,78): warning C4267: 'argument': conversion from 'size_t' to 'uint32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gguf-hash\llama-gguf-hash.vcxproj]
llama-gguf-hash.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-gguf-hash.exe
llama-quantize-stats.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-quantize-stats.exe
sampling.cpp
C:\neuro\ik_llama.cpp\common\sampling.cpp(105,45): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\sampling.cpp(105,20): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
console.cpp
C:\neuro\ik_llama.cpp\common\console.cpp(253,30): warning C4267: 'initializing': conversion from 'size_t' to 'DWORD', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\console.cpp(407,28): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
grammar-parser.cpp
json-schema-to-grammar.cpp
C:\neuro\ik_llama.cpp\common\json-schema-to-grammar.cpp(139,46): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\json-schema-to-grammar.cpp(139,37): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\json-schema-to-grammar.cpp(154,50): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\json-schema-to-grammar.cpp(154,41): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\json-schema-to-grammar.cpp(234,29): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\json-schema-to-grammar.cpp(245,33): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\json-schema-to-grammar.cpp(558,60): warning C4101: 'e': unreferenced local variable [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
train.cpp
ngram-cache.cpp
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(20,50): warning C4244: 'argument': conversion from 'int64_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(100,16): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(147,34): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(147,24): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(156,82): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(156,38): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(170,77): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(170,38): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(202,50): warning C4267: 'initializing': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
C:\neuro\ik_llama.cpp\common\ngram-cache.cpp(202,31): warning C4267: 'initializing': conversion from 'size_t' to 'const int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\common\common.vcxproj]
Generating Code...
common.vcxproj -> C:\neuro\ik_llama.cpp\build\common\Release\common.lib
Building Custom Rule C:/neuro/ik_llama.cpp/examples/llava/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/lookup/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gguf-split/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/sweep-bench/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/examples/tokenize/CMakeLists.txt
lookup-merge.cpp
llava-cli.cpp
test-sampling.cpp
test-json-schema-to-grammar.cpp
test-quantize-fns.cpp
test-quantize-perf.cpp
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(157,34): warning C4244: 'argument': conversion from 'llama_token' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(164,45): warning C4267: 'initializing': conversion from 'size_t' to 'llama_token', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(164,36): warning C4267: 'initializing': conversion from 'size_t' to 'const llama_token', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(179,38): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(179,24): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(189,67): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(189,39): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(190,55): warning C4244: 'initializing': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(190,48): warning C4244: 'initializing': conversion from 'float' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(192,33): warning C4267: '=': conversion from 'size_t' to 'llama_token', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(212,31): warning C4244: 'initializing': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(216,34): warning C4244: '=': conversion from 'float' to 'llama_token', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(229,12): warning C4477: 'printf' : format string '%05ld' requires an argument of type 'long', but variadic argument 2 has type 'const size_t' [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(229,12): consider using '%zd' in the format string
Building Custom Rule C:/neuro/ik_llama.cpp/examples/export-lora/CMakeLists.txt
C:\neuro\ik_llama.cpp\tests\test-sampling.cpp(275,49): warning C4305: 'argument': truncation from 'double' to 'const float' [C:\neuro\ik_llama.cpp\build\tests\test-sampling.vcxproj]
test-tokenizer-1-spm.cpp
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
test-rope.cpp
gguf-split.cpp
test-tokenizer-0.cpp
test-model-load-cancel.cpp
get-model.cpp
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
Generating Code...
get-model.cpp
get-model.cpp
Generating Code...
Generating Code...
C:\neuro\ik_llama.cpp\examples\llava\llava-cli.cpp(89,105): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llama-llava-cli.vcxproj]
get-model.cpp
get-model.cpp
Generating Code...
Generating Code...
sweep-bench.cpp
export-lora.cpp
tokenize.cpp
test-backend-ops.cpp
test-grad0.cpp
test-chat-template.cpp
get-model.cpp
Generating Code...
test-grammar-integration.cpp
Building Custom Rule C:/neuro/ik_llama.cpp/examples/passkey/CMakeLists.txt
test-tokenizer-1-bpe.cpp
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(601,20): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(632,24): warning C4244: 'initializing': conversion from 'int64_t' to 'double', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,87): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,87): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,87): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,87): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,87): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,75): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,75): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,75): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,75): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,75): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,63): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,63): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,63): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,63): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,63): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,51): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,51): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,51): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,51): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(778,51): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,87): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,87): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,87): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,87): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,87): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,75): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,75): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,75): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,75): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,75): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,63): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,63): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,63): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,63): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,63): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,51): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,51): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,51): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,51): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(814,51): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1280,85): warning C4244: 'argument': conversion from 'const int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1280,81): warning C4244: 'argument': conversion from 'const int' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1431,35): warning C4244: '=': conversion from 'int' to '_Ty', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1431,35): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1431,35): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1431,35): warning C4244: _Ty=float [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1431,35): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,94): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,94): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,94): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,94): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,94): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,83): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,83): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,83): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,83): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,83): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,73): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,73): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,73): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,73): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,73): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,62): warning C4244: 'argument': conversion from 'const _Ty' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,62): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,62): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,62): warning C4244: _Ty=int64_t [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1504,62): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(1677,77): warning C4244: 'argument': conversion from 'const int64_t' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2377,32): warning C4244: 'initializing': conversion from 'const _Elem' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2377,32): warning C4244: with [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2377,32): warning C4244: [ [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2377,32): warning C4244: _Elem=int [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2377,32): warning C4244: ] [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2383,125): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2386,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2387,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2388,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2392,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2393,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2394,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2395,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2396,129): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-backend-ops.cpp(2399,125): warning C4244: 'argument': conversion from 'float' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-backend-ops.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-chat-template.cpp(117,143): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-chat-template.vcxproj]
|
||
C:\neuro\ik_llama.cpp\tests\test-chat-template.cpp(131,32): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-chat-template.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gguf-split\gguf-split.cpp(257,68): warning C4267: 'argument': conversion from 'size_t' to 'uint16_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gguf-split\llama-gguf-split.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gguf-split\gguf-split.cpp(278,16): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 1 has type 'unsigned __int64' [C:\neuro\ik_llama.cpp\build\examples\gguf-split\llama-gguf-split.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gguf-split\gguf-split.cpp(278,16):
|
||
consider using '%zd' in the format string
|
||
|
||
C:\neuro\ik_llama.cpp\examples\gguf-split\gguf-split.cpp(288,20): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 3 has type 'size_t' [C:\neuro\ik_llama.cpp\build\examples\gguf-split\llama-gguf-split.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gguf-split\gguf-split.cpp(288,20):
|
||
consider using '%zd' in the format string
|
||
|
||
C:\neuro\ik_llama.cpp\examples\gguf-split\gguf-split.cpp(295,21): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gguf-split\llama-gguf-split.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gguf-split\gguf-split.cpp(369,17): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gguf-split\llama-gguf-split.vcxproj]
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/save-load-state/CMakeLists.txt
|
||
test-llama-grammar.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/simple/CMakeLists.txt
|
||
C:\neuro\ik_llama.cpp\examples\export-lora\export-lora.cpp(254,16): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 2 has type 'size_t' [C:\neuro\ik_llama.cpp\build\examples\export-lora\llama-export-lora.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\export-lora\export-lora.cpp(254,16):
|
||
consider using '%zd' in the format string
|
||
|
||
C:\neuro\ik_llama.cpp\examples\export-lora\export-lora.cpp(255,16): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 2 has type 'unsigned __int64' [C:\neuro\ik_llama.cpp\build\examples\export-lora\llama-export-lora.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\export-lora\export-lora.cpp(255,16):
|
||
consider using '%zd' in the format string
|
||
|
||
C:\neuro\ik_llama.cpp\examples\export-lora\export-lora.cpp(337,24): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 2 has type 'size_t' [C:\neuro\ik_llama.cpp\build\examples\export-lora\llama-export-lora.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\export-lora\export-lora.cpp(337,24):
|
||
consider using '%zd' in the format string
|
||
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(94,77): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(98,57): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(150,91): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(155,25): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(172,52): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(185,31): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(185,20): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(399,16): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 1 has type 'unsigned __int64' [C:\neuro\ik_llama.cpp\build\examples\tokenize\llama-tokenize.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\tokenize\tokenize.cpp(399,16):
|
||
consider using '%zd' in the format string
|
||
|
||
get-model.cpp
|
||
passkey.cpp
|
||
test-autorelease.cpp
|
||
save-load-state.cpp
|
||
simple.cpp
|
||
Generating Code...
|
||
get-model.cpp
|
||
test-tokenizer-1-spm.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-tokenizer-1-spm.exe
|
||
llama-lookup-merge.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-lookup-merge.exe
|
||
C:\neuro\ik_llama.cpp\tests\test-llama-grammar.cpp(205,20): warning C4267: '=': conversion from 'size_t' to 'uint32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-llama-grammar.vcxproj]
|
||
get-model.cpp
|
||
Generating Code...
|
||
Generating Code...
|
||
get-model.cpp
|
||
C:\neuro\ik_llama.cpp\examples\save-load-state\save-load-state.cpp(45,69): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\save-load-state\llama-save-load-state.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\save-load-state\save-load-state.cpp(46,26): warning C4267: '+=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\save-load-state\llama-save-load-state.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\simple\simple.cpp(64,45): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\simple\llama-simple.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\simple\simple.cpp(64,24): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\simple\llama-simple.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\simple\simple.cpp(92,48): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\simple\llama-simple.vcxproj]
|
||
Generating Code...
|
||
C:\neuro\ik_llama.cpp\examples\passkey\passkey.cpp(29,23): warning C4244: 'argument': conversion from 'time_t' to 'unsigned int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\passkey\llama-passkey.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\passkey\passkey.cpp(94,80): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\passkey\llama-passkey.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\passkey\passkey.cpp(94,31): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\passkey\llama-passkey.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\passkey\passkey.cpp(96,46): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\passkey\llama-passkey.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\passkey\passkey.cpp(96,28): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\passkey\llama-passkey.vcxproj]
|
||
get-model.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/lookup/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/pocs/vdot/CMakeLists.txt
|
||
Generating Code...
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/retrieval/CMakeLists.txt
|
||
Creating library C:/neuro/ik_llama.cpp/build/examples/llava/Release/llama-llava-cli.lib and object C:/neuro/ik_llama.cpp/build/examples/llava/Release/llama-llava-cli.exp
|
||
lookup.cpp
|
||
test-sampling.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-sampling.exe
|
||
q8dot.cpp
|
||
test-grad0.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-grad0.exe
|
||
test-rope.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-rope.exe
|
||
llama-llava-cli.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-llava-cli.exe
|
||
retrieval.cpp
|
||
test-quantize-fns.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-quantize-fns.exe
|
||
test-tokenizer-1-bpe.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-tokenizer-1-bpe.exe
|
||
test-autorelease.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-autorelease.exe
|
||
llama-tokenize.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-tokenize.exe
|
||
test-tokenizer-0.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-tokenizer-0.exe
|
||
get-model.cpp
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup.cpp(56,102): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup.cpp(92,33): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup.cpp(92,23): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup.cpp(105,16): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup.cpp(210,57): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup.cpp(214,35): warning C4267: '+=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\retrieval\retrieval.cpp(79,43): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\retrieval\llama-retrieval.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\retrieval\retrieval.cpp(146,12): warning C4477: 'printf' : format string '%ld' requires an argument of type 'long', but variadic argument 1 has type 'unsigned __int64' [C:\neuro\ik_llama.cpp\build\examples\retrieval\llama-retrieval.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\retrieval\retrieval.cpp(146,12):
|
||
consider using '%zd' in the format string
|
||
|
||
C:\neuro\ik_llama.cpp\examples\retrieval\retrieval.cpp(214,37): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\retrieval\llama-retrieval.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\retrieval\retrieval.cpp(214,24): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\retrieval\llama-retrieval.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\retrieval\retrieval.cpp(215,49): warning C4244: 'argument': conversion from 'const uint64_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\retrieval\llama-retrieval.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\retrieval\retrieval.cpp(263,59): warning C4244: 'argument': conversion from 'const uint64_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\retrieval\llama-retrieval.vcxproj]
|
||
Generating Code...
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gritlm/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/llava/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/main/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/pocs/vdot/CMakeLists.txt
|
||
test-chat-template.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-chat-template.exe
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/perplexity/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/cvector-generator/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/embedding/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/tests/CMakeLists.txt
|
||
gritlm.cpp
|
||
minicpmv-cli.cpp
|
||
vdot.cpp
|
||
main.cpp
|
||
perplexity.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/convert-llama2c-to-ggml/CMakeLists.txt
|
||
C:\neuro\ik_llama.cpp\examples\gritlm\gritlm.cpp(23,43): warning C4267: 'initializing': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gritlm\llama-gritlm.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gritlm\gritlm.cpp(23,30): warning C4267: 'initializing': conversion from 'size_t' to 'const int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gritlm\llama-gritlm.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gritlm\gritlm.cpp(30,82): warning C4267: 'initializing': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gritlm\llama-gritlm.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gritlm\gritlm.cpp(30,30): warning C4267: 'initializing': conversion from 'size_t' to 'const int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gritlm\llama-gritlm.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llava\minicpmv-cli.cpp(198,27): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llama-minicpmv-cli.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llava\minicpmv-cli.cpp(204,30): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llava\llama-minicpmv-cli.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\gritlm\gritlm.cpp(77,65): warning C4244: 'argument': conversion from 'uint64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\gritlm\llama-gritlm.vcxproj]
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/speculative/CMakeLists.txt
|
||
cvector-generator.cpp
|
||
embedding.cpp
|
||
test-quantize-perf.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-quantize-perf.exe
|
||
convert-llama2c-to-ggml.cpp
|
||
test-model-load-cancel.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-model-load-cancel.exe
|
||
llama-gguf-split.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-gguf-split.exe
|
||
llama-retrieval.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-retrieval.exe
|
||
test-backend-ops.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-backend-ops.exe
|
||
test-json-schema-to-grammar.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-json-schema-to-grammar.exe
|
||
test-grammar-parser.cpp
|
||
llama-q8dot.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-q8dot.exe
|
||
llama-lookup.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-lookup.exe
|
||
llama-simple.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-simple.exe
|
||
llama-export-lora.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-export-lora.exe
|
||
speculative.cpp
|
||
C:\neuro\ik_llama.cpp\tests\test-grammar-parser.cpp(39,73): warning C4267: 'argument': conversion from 'size_t' to 'unsigned int', possible loss of data [C:\neuro\ik_llama.cpp\build\tests\test-grammar-parser.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\main\main.cpp(399,19): warning C4804: '>': unsafe use of type 'bool' in operation [C:\neuro\ik_llama.cpp\build\examples\main\llama-cli.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\pca.hpp(29,43): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
(compiling source file '../../examples/cvector-generator/cvector-generator.cpp')
|
||
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\pca.hpp(41,23): warning C4305: 'initializing': truncation from 'double' to 'float' [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
(compiling source file '../../examples/cvector-generator/cvector-generator.cpp')
|
||
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\pca.hpp(318,26): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
(compiling source file '../../examples/cvector-generator/cvector-generator.cpp')
|
||
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\pca.hpp(319,39): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
(compiling source file '../../examples/cvector-generator/cvector-generator.cpp')
|
||
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(99,41): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(100,41): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(101,50): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(106,60): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(117,24): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(127,45): warning C4305: 'argument': truncation from 'double' to 'float' [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(133,28): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(135,20): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(232,24): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(342,73): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(355,71): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\cvector-generator\cvector-generator.cpp(450,29): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\cvector-generator\llama-cvector-generator.vcxproj]
|
||
get-model.cpp
|
||
Generating Code...
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/eval-callback/CMakeLists.txt
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/gbnf-validator/CMakeLists.txt
|
||
Generating colorthemes.css.hpp
|
||
test-llama-grammar.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-llama-grammar.exe
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(47,27): warning C4244: '=': conversion from 'time_t' to 'uint32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(154,33): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(154,23): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(175,20): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(176,20): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(244,102): warning C4267: 'argument': conversion from 'size_t' to '_Ty', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(244,102): warning C4267: with [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(244,102): warning C4267: [ [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(244,102): warning C4267: _Ty=unsigned int [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(244,102): warning C4267: ] [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\speculative\speculative.cpp(260,33): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\speculative\llama-speculative.vcxproj]
|
||
Generating style.css.hpp
|
||
llama-passkey.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-passkey.exe
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/lookup/CMakeLists.txt
|
||
eval-callback.cpp
|
||
gbnf-validator.cpp
|
||
Generating theme-beeninorder.css.hpp
|
||
test-grammar-integration.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-grammar-integration.exe
|
||
Generating theme-ketivah.css.hpp
|
||
Creating library C:/neuro/ik_llama.cpp/build/examples/llava/Release/llama-minicpmv-cli.lib and object C:/neuro/ik_llama.cpp/build/examples/llava/Release/llama-minicpmv-cli.exp
|
||
lookup-stats.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/infill/CMakeLists.txt
|
||
Generating theme-mangotango.css.hpp
|
||
llama-gritlm.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-gritlm.exe
|
||
llama-minicpmv-cli.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-minicpmv-cli.exe
|
||
infill.cpp
|
||
Generating theme-playground.css.hpp
|
||
test-grammar-parser.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\test-grammar-parser.exe
|
||
llama-embedding.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-embedding.exe
|
||
C:\neuro\ik_llama.cpp\examples\eval-callback\eval-callback.cpp(134,73): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\eval-callback\llama-eval-callback.vcxproj]
|
||
llama-save-load-state.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-save-load-state.exe
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/batched/CMakeLists.txt
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup-stats.cpp(66,33): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup-stats.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup-stats.cpp(66,23): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup-stats.vcxproj]
|
||
Generating theme-polarnight.css.hpp
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup-stats.cpp(92,39): warning C4267: '+=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup-stats.vcxproj]
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/batched-bench/CMakeLists.txt
|
||
Generating theme-snowstorm.css.hpp
|
||
Generating index.html.hpp
|
||
batched.cpp
|
||
batched-bench.cpp
|
||
llama-convert-llama2c-to-ggml.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-convert-llama2c-to-ggml.exe
|
||
llama-cvector-generator.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-cvector-generator.exe
|
||
llama-gbnf-validator.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-gbnf-validator.exe
|
||
llama-perplexity.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-perplexity.exe
|
||
llama-sweep-bench.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-sweep-bench.exe
|
||
llama-eval-callback.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-eval-callback.exe
|
||
Generating index-new.html.hpp
|
||
C:\neuro\ik_llama.cpp\examples\batched\batched.cpp(57,45): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\batched\llama-batched.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\batched\batched.cpp(57,24): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\batched\llama-batched.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\batched\batched.cpp(96,50): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\batched\llama-batched.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\batched\batched.cpp(105,48): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\batched\llama-batched.vcxproj]
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/lookahead/CMakeLists.txt
|
||
Generating index.js.hpp
|
||
Generating completion.js.hpp
|
||
Generating system-prompts.js.hpp
|
||
llama-lookup-stats.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-lookup-stats.exe
|
||
lookahead.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/baby-llama/CMakeLists.txt
|
||
Generating prompt-formats.js.hpp
|
||
llama-batched-bench.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-batched-bench.exe
|
||
Generating json-schema-to-grammar.mjs.hpp
|
||
baby-llama.cpp
|
||
llama-batched.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-batched.exe
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/server/CMakeLists.txt
|
||
server.cpp
|
||
llama-infill.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-infill.exe
|
||
C:\neuro\ik_llama.cpp\examples\lookahead\lookahead.cpp(90,33): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookahead\llama-lookahead.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookahead\lookahead.cpp(90,23): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookahead\llama-lookahead.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookahead\lookahead.cpp(107,16): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookahead\llama-lookahead.vcxproj]
|
||
llama-speculative.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-speculative.exe
|
||
C:\neuro\ik_llama.cpp\examples\lookahead\lookahead.cpp(364,129): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookahead\llama-lookahead.vcxproj]
|
||
llama-cli.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-cli.exe
|
||
llama-vdot.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-vdot.exe
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/quantize/CMakeLists.txt
|
||
llama-baby-llama.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-baby-llama.exe
|
||
C:\neuro\ik_llama.cpp\examples\server\utils.hpp(171,16): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
(compiling source file '../../../examples/server/server.cpp')
|
||
|
||
C:\neuro\ik_llama.cpp\examples\server\utils.hpp(182,52): warning C4267: '=': conversion from 'size_t' to 'uint8_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
(compiling source file '../../../examples/server/server.cpp')
|
||
|
||
C:\neuro\ik_llama.cpp\examples\server\utils.hpp(203,48): warning C4267: '=': conversion from 'size_t' to 'uint8_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
(compiling source file '../../../examples/server/server.cpp')
|
||
|
||
quantize.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/parallel/CMakeLists.txt
|
||
parallel.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/lookup/CMakeLists.txt
|
||
llama-lookahead.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-lookahead.exe
|
||
C:\neuro\ik_llama.cpp\examples\parallel\parallel.cpp(163,21): warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\parallel\llama-parallel.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\parallel\parallel.cpp(169,55): warning C4267: 'initializing': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\parallel\llama-parallel.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\parallel\parallel.cpp(169,35): warning C4267: 'initializing': conversion from 'size_t' to 'const int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\parallel\llama-parallel.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\parallel\parallel.cpp(263,68): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\parallel\llama-parallel.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\parallel\parallel.cpp(271,58): warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\parallel\llama-parallel.vcxproj]
|
||
lookup-create.cpp
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/imatrix/CMakeLists.txt
|
||
imatrix.cpp
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(361,48): warning C4244: '+=': conversion from 'const double' to 'uint64_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(362,48): warning C4244: '+=': conversion from 'const double' to 'uint64_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(368,43): warning C4244: '+=': conversion from 'const double' to 'uint64_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(369,43): warning C4244: '+=': conversion from 'const double' to 'uint64_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(842,37): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(845,29): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/examples/llama-bench/CMakeLists.txt
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(1570,73): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(1570,32): warning C4267: 'initializing': conversion from 'size_t' to 'const int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\lookup\lookup-create.cpp(39,96): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\lookup\llama-lookup-create.vcxproj]
|
||
llama-bench.cpp
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(1969,103): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(2001,71): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(2083,66): warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(2143,74): warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(2167,58): warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(2203,46): warning C4805: '!=': unsafe mix of type 'int32_t' and type 'bool' in operation [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(2253,97): warning C4267: 'argument': conversion from 'size_t' to 'llama_pos', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(2421,57): warning C4267: 'argument': conversion from 'size_t' to 'int32_t', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\server\server.cpp(3363,21): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\server\llama-server.vcxproj]
|
||
llama-parallel.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-parallel.exe
|
||
llama-quantize.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-quantize.exe
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(409,30): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name: _strdup. See online help for details. [C:\neuro\ik_llama.cpp\build\examples\llama-bench\llama-bench.vcxproj]
|
||
llama-lookup-create.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-lookup-create.exe
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(1235,31): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llama-bench\llama-bench.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(92,13): warning C4244: 'initializing': conversion from 'double' to 'T', possible loss of data [C:\neuro\ik_llama.cpp\build\examples\llama-bench\llama-bench.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(92,13): warning C4244: with [C:\neuro\ik_llama.cpp\build\examples\llama-bench\llama-bench.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(92,13): warning C4244: [ [C:\neuro\ik_llama.cpp\build\examples\llama-bench\llama-bench.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(92,13): warning C4244: T=uint64_t [C:\neuro\ik_llama.cpp\build\examples\llama-bench\llama-bench.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(92,13): warning C4244: ] [C:\neuro\ik_llama.cpp\build\examples\llama-bench\llama-bench.vcxproj]
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(92,13):
|
||
the template instantiation context (the oldest one first) is
|
||
C:\neuro\ik_llama.cpp\examples\llama-bench\llama-bench.cpp(1145,18):
|
||
see reference to function template instantiation 'T stdev<uint64_t>(const std::vector<uint64_t,std::allocator<uint64_t>> &)' being compiled
|
||
with
|
||
[
|
||
T=uint64_t
|
||
]
|
||
|
||
llama-imatrix.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-imatrix.exe
|
||
llama-bench.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-bench.exe
|
||
llama-server.vcxproj -> C:\neuro\ik_llama.cpp\build\bin\Release\llama-server.exe
|
||
Building Custom Rule C:/neuro/ik_llama.cpp/CMakeLists.txt
|
||
|
||
C:\neuro\ik_llama.cpp>
|
||
|
||
------------------------------------------------------------------
|
||
|
||
PS C:\neuro\ik_llama.cpp\build\bin\Release> ./llama-server.exe -t 7 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf
|
||
INFO [ main] build info | tid="11116" timestamp=1746438993 build=3667 commit="e3fec173"
|
||
INFO [ main] system info | tid="11116" timestamp=1746438993 n_threads=7 n_threads_batch=-1 total_threads=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "
|
||
llama_model_loader: loaded meta data with 35 key-value pairs and 579 tensors from F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest))
|
||
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
|
||
llama_model_loader: - kv 0: general.architecture str = qwen3moe
|
||
llama_model_loader: - kv 1: general.type str = model
|
||
llama_model_loader: - kv 2: general.name str = Qwen3-30B-A3B
|
||
llama_model_loader: - kv 3: general.basename str = Qwen3-30B-A3B
|
||
llama_model_loader: - kv 4: general.quantized_by str = Unsloth
|
||
llama_model_loader: - kv 5: general.size_label str = 30B-A3B
|
||
llama_model_loader: - kv 6: general.repo_url str = https://huggingface.co/unsloth
|
||
llama_model_loader: - kv 7: qwen3moe.block_count u32 = 48
|
||
llama_model_loader: - kv 8: qwen3moe.context_length u32 = 40960
|
||
llama_model_loader: - kv 9: qwen3moe.embedding_length u32 = 2048
|
||
llama_model_loader: - kv 10: qwen3moe.feed_forward_length u32 = 6144
|
||
llama_model_loader: - kv 11: qwen3moe.attention.head_count u32 = 32
|
||
llama_model_loader: - kv 12: qwen3moe.attention.head_count_kv u32 = 4
|
||
llama_model_loader: - kv 13: qwen3moe.rope.freq_base f32 = 1000000.000000
|
||
llama_model_loader: - kv 14: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
|
||
llama_model_loader: - kv 15: qwen3moe.expert_used_count u32 = 8
|
||
llama_model_loader: - kv 16: qwen3moe.attention.key_length u32 = 128
|
||
llama_model_loader: - kv 17: qwen3moe.attention.value_length u32 = 128
|
||
llama_model_loader: - kv 18: qwen3moe.expert_count u32 = 128
|
||
llama_model_loader: - kv 19: qwen3moe.expert_feed_forward_length u32 = 768
|
||
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
|
||
llama_model_loader: - kv 21: tokenizer.ggml.pre str = qwen2
|
||
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
|
||
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
|
||
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
|
||
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 151645
|
||
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 151654
|
||
llama_model_loader: - kv 27: tokenizer.ggml.add_bos_token bool = false
|
||
llama_model_loader: - kv 28: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
|
||
llama_model_loader: - kv 29: general.quantization_version u32 = 2
|
||
llama_model_loader: - kv 30: general.file_type u32 = 17
|
||
llama_model_loader: - kv 31: quantize.imatrix.file str = Qwen3-30B-A3B-GGUF/imatrix_unsloth.dat
|
||
llama_model_loader: - kv 32: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-30B-A3B.txt
|
||
llama_model_loader: - kv 33: quantize.imatrix.entries_count i32 = 384
|
||
llama_model_loader: - kv 34: quantize.imatrix.chunks_count i32 = 32
|
||
llama_model_loader: - type f32: 241 tensors
|
||
llama_model_loader: - type q5_K: 289 tensors
|
||
llama_model_loader: - type q6_K: 49 tensors
|
||
llm_load_vocab: special tokens cache size = 26
|
||
llm_load_vocab: token to piece cache size = 0.9311 MB
|
||
llm_load_print_meta: format = GGUF V3 (latest)
|
||
llm_load_print_meta: arch = qwen3moe
|
||
llm_load_print_meta: vocab type = BPE
|
||
llm_load_print_meta: n_vocab = 151936
|
||
llm_load_print_meta: n_merges = 151387
|
||
llm_load_print_meta: vocab_only = 0
|
||
llm_load_print_meta: n_ctx_train = 40960
|
||
llm_load_print_meta: n_embd = 2048
|
||
llm_load_print_meta: n_layer = 48
|
||
llm_load_print_meta: n_head = 32
|
||
llm_load_print_meta: n_head_kv = 4
|
||
llm_load_print_meta: n_rot = 128
|
||
llm_load_print_meta: n_swa = 0
|
||
llm_load_print_meta: n_swa_pattern = 1
|
||
llm_load_print_meta: n_embd_head_k = 128
|
||
llm_load_print_meta: n_embd_head_v = 128
|
||
llm_load_print_meta: n_gqa = 8
|
||
llm_load_print_meta: n_embd_k_gqa = 512
|
||
llm_load_print_meta: n_embd_v_gqa = 512
|
||
llm_load_print_meta: f_norm_eps = 0.0e+00
|
||
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
|
||
llm_load_print_meta: f_clamp_kqv = 0.0e+00
|
||
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
|
||
llm_load_print_meta: f_logit_scale = 0.0e+00
|
||
llm_load_print_meta: n_ff = 6144
|
||
llm_load_print_meta: n_expert = 128
|
||
llm_load_print_meta: n_expert_used = 8
|
||
llm_load_print_meta: causal attn = 1
|
||
llm_load_print_meta: pooling type = 0
|
||
llm_load_print_meta: rope type = 2
|
||
llm_load_print_meta: rope scaling = linear
|
||
llm_load_print_meta: freq_base_train = 1000000.0
|
||
llm_load_print_meta: freq_scale_train = 1
|
||
llm_load_print_meta: n_ctx_orig_yarn = 40960
|
||
llm_load_print_meta: rope_finetuned = unknown
|
||
llm_load_print_meta: ssm_d_conv = 0
|
||
llm_load_print_meta: ssm_d_inner = 0
|
||
llm_load_print_meta: ssm_d_state = 0
|
||
llm_load_print_meta: ssm_dt_rank = 0
|
||
llm_load_print_meta: model type = ?B
|
||
llm_load_print_meta: model ftype = Q5_K - Medium
|
||
llm_load_print_meta: model params = 30.532 B
|
||
llm_load_print_meta: model size = 20.228 GiB (5.691 BPW)
|
||
llm_load_print_meta: repeating layers = 19.791 GiB (5.684 BPW, 29.910 B parameters)
|
||
llm_load_print_meta: general.name = Qwen3-30B-A3B
|
||
llm_load_print_meta: BOS token = 11 ','
|
||
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
|
||
llm_load_print_meta: PAD token = 151654 '<|vision_pad|>'
|
||
llm_load_print_meta: LF token = 148848 'ÄĬ'
|
||
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
|
||
llm_load_print_meta: max token length = 256
|
||
llm_load_print_meta: n_ff_exp = 768
|
||
llm_load_tensors: ggml ctx size = 0.25 MiB
|
||
llm_load_tensors: CPU buffer size = 20713.44 MiB
|
||
...................................................................................................
|
||
llama_new_context_with_model: n_ctx = 4096
|
||
llama_new_context_with_model: n_batch = 2048
|
||
llama_new_context_with_model: n_ubatch = 512
|
||
llama_new_context_with_model: flash_attn = 0
|
||
llama_new_context_with_model: mla_attn = 0
|
||
llama_new_context_with_model: attn_max_b = 0
|
||
llama_new_context_with_model: fused_moe = 0
|
||
llama_new_context_with_model: ser = -1, 0
|
||
llama_new_context_with_model: freq_base = 1000000.0
|
||
llama_new_context_with_model: freq_scale = 1
|
||
llama_kv_cache_init: CPU KV buffer size = 384.00 MiB
|
||
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
|
||
llama_new_context_with_model: CPU output buffer size = 1.16 MiB
|
||
llama_new_context_with_model: CPU compute buffer size = 304.75 MiB
|
||
llama_new_context_with_model: graph nodes = 2165
|
||
llama_new_context_with_model: graph splits = 1
|
||
INFO [ init] initializing slots | tid="11116" timestamp=1746439008 n_slots=1
|
||
INFO [ init] new slot | tid="11116" timestamp=1746439008 id_slot=0 n_ctx_slot=4096
|
||
INFO [ main] model loaded | tid="11116" timestamp=1746439008
|
||
INFO [ main] chat template | tid="11116" timestamp=1746439008 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
|
||
INFO [ main] HTTP server listening | tid="11116" timestamp=1746439008 hostname="127.0.0.1" port="8080" n_threads_http="15"
|
||
INFO [ update_slots] all slots are idle | tid="11116" timestamp=1746439008
|
||
INFO [ log_server_request] request | tid="19268" timestamp=1746439081 remote_addr="127.0.0.1" remote_port=63234 status=404 method="GET" path="/models" params={}
|
||
INFO [ launch_slot_with_task] slot is processing task | tid="11116" timestamp=1746439086 id_slot=0 id_task=0
|
||
INFO [ update_slots] kv cache rm [p0, end) | tid="11116" timestamp=1746439086 id_slot=0 id_task=0 p0=0
|
||
PS C:\neuro\ik_llama.cpp\build\bin\Release>
|
||
|
||
------------------------------------------------------------------
|
||
|
||
PS C:\neuro\ik_llama.cpp\build\bin\Release> ./llama-server.exe -t 7 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf
|
||
INFO [ main] build info | tid="21556" timestamp=1746439373 build=3667 commit="e3fec173"
|
||
INFO [ main] system info | tid="21556" timestamp=1746439373 n_threads=7 n_threads_batch=-1 total_threads=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "
|
||
llama_model_loader: loaded meta data with 35 key-value pairs and 579 tensors from F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest))
|
||
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
|
||
llama_model_loader: - kv 0: general.architecture str = qwen3moe
|
||
llama_model_loader: - kv 1: general.type str = model
|
||
llama_model_loader: - kv 2: general.name str = Qwen3-30B-A3B
|
||
llama_model_loader: - kv 3: general.basename str = Qwen3-30B-A3B
|
||
llama_model_loader: - kv 4: general.quantized_by str = Unsloth
|
||
llama_model_loader: - kv 5: general.size_label str = 30B-A3B
|
||
llama_model_loader: - kv 6: general.repo_url str = https://huggingface.co/unsloth
|
||
llama_model_loader: - kv 7: qwen3moe.block_count u32 = 48
|
||
llama_model_loader: - kv 8: qwen3moe.context_length u32 = 40960
|
||
llama_model_loader: - kv 9: qwen3moe.embedding_length u32 = 2048
|
||
llama_model_loader: - kv 10: qwen3moe.feed_forward_length u32 = 6144
|
||
llama_model_loader: - kv 11: qwen3moe.attention.head_count u32 = 32
|
||
llama_model_loader: - kv 12: qwen3moe.attention.head_count_kv u32 = 4
|
||
llama_model_loader: - kv 13: qwen3moe.rope.freq_base f32 = 1000000.000000
|
||
llama_model_loader: - kv 14: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
|
||
llama_model_loader: - kv 15: qwen3moe.expert_used_count u32 = 8
|
||
llama_model_loader: - kv 16: qwen3moe.attention.key_length u32 = 128
|
||
llama_model_loader: - kv 17: qwen3moe.attention.value_length u32 = 128
|
||
llama_model_loader: - kv 18: qwen3moe.expert_count u32 = 128
|
||
llama_model_loader: - kv 19: qwen3moe.expert_feed_forward_length u32 = 768
|
||
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
|
||
llama_model_loader: - kv 21: tokenizer.ggml.pre str = qwen2
|
||
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
|
||
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
|
||
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
|
||
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 151645
|
||
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 151654
|
||
llama_model_loader: - kv 27: tokenizer.ggml.add_bos_token bool = false
|
||
llama_model_loader: - kv 28: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
|
||
llama_model_loader: - kv 29: general.quantization_version u32 = 2
|
||
llama_model_loader: - kv 30: general.file_type u32 = 17
|
||
llama_model_loader: - kv 31: quantize.imatrix.file str = Qwen3-30B-A3B-GGUF/imatrix_unsloth.dat
|
||
llama_model_loader: - kv 32: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-30B-A3B.txt
|
||
llama_model_loader: - kv 33: quantize.imatrix.entries_count i32 = 384
|
||
llama_model_loader: - kv 34: quantize.imatrix.chunks_count i32 = 32
|
||
llama_model_loader: - type f32: 241 tensors
|
||
llama_model_loader: - type q5_K: 289 tensors
|
||
llama_model_loader: - type q6_K: 49 tensors
|
||
llm_load_vocab: special tokens cache size = 26
|
||
llm_load_vocab: token to piece cache size = 0.9311 MB
|
||
llm_load_print_meta: format = GGUF V3 (latest)
|
||
llm_load_print_meta: arch = qwen3moe
|
||
llm_load_print_meta: vocab type = BPE
|
||
llm_load_print_meta: n_vocab = 151936
|
||
llm_load_print_meta: n_merges = 151387
|
||
llm_load_print_meta: vocab_only = 0
|
||
llm_load_print_meta: n_ctx_train = 40960
|
||
llm_load_print_meta: n_embd = 2048
|
||
llm_load_print_meta: n_layer = 48
|
||
llm_load_print_meta: n_head = 32
|
||
llm_load_print_meta: n_head_kv = 4
|
||
llm_load_print_meta: n_rot = 128
|
||
llm_load_print_meta: n_swa = 0
|
||
llm_load_print_meta: n_swa_pattern = 1
|
||
llm_load_print_meta: n_embd_head_k = 128
|
||
llm_load_print_meta: n_embd_head_v = 128
|
||
llm_load_print_meta: n_gqa = 8
|
||
llm_load_print_meta: n_embd_k_gqa = 512
|
||
llm_load_print_meta: n_embd_v_gqa = 512
|
||
llm_load_print_meta: f_norm_eps = 0.0e+00
|
||
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
|
||
llm_load_print_meta: f_clamp_kqv = 0.0e+00
|
||
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
|
||
llm_load_print_meta: f_logit_scale = 0.0e+00
|
||
llm_load_print_meta: n_ff = 6144
|
||
llm_load_print_meta: n_expert = 128
|
||
llm_load_print_meta: n_expert_used = 8
|
||
llm_load_print_meta: causal attn = 1
|
||
llm_load_print_meta: pooling type = 0
|
||
llm_load_print_meta: rope type = 2
|
||
llm_load_print_meta: rope scaling = linear
|
||
llm_load_print_meta: freq_base_train = 1000000.0
|
||
llm_load_print_meta: freq_scale_train = 1
|
||
llm_load_print_meta: n_ctx_orig_yarn = 40960
|
||
llm_load_print_meta: rope_finetuned = unknown
|
||
llm_load_print_meta: ssm_d_conv = 0
|
||
llm_load_print_meta: ssm_d_inner = 0
|
||
llm_load_print_meta: ssm_d_state = 0
|
||
llm_load_print_meta: ssm_dt_rank = 0
|
||
llm_load_print_meta: model type = ?B
|
||
llm_load_print_meta: model ftype = Q5_K - Medium
|
||
llm_load_print_meta: model params = 30.532 B
|
||
llm_load_print_meta: model size = 20.228 GiB (5.691 BPW)
|
||
llm_load_print_meta: repeating layers = 19.791 GiB (5.684 BPW, 29.910 B parameters)
|
||
llm_load_print_meta: general.name = Qwen3-30B-A3B
|
||
llm_load_print_meta: BOS token = 11 ','
|
||
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
|
||
llm_load_print_meta: PAD token = 151654 '<|vision_pad|>'
|
||
llm_load_print_meta: LF token = 148848 'ÄĬ'
|
||
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
|
||
llm_load_print_meta: max token length = 256
|
||
llm_load_print_meta: n_ff_exp = 768
|
||
llm_load_tensors: ggml ctx size = 0.25 MiB
|
||
llm_load_tensors: CPU buffer size = 20713.44 MiB
|
||
...................................................................................................
|
||
llama_new_context_with_model: n_ctx = 4096
|
||
llama_new_context_with_model: n_batch = 2048
|
||
llama_new_context_with_model: n_ubatch = 512
|
||
llama_new_context_with_model: flash_attn = 0
|
||
llama_new_context_with_model: mla_attn = 0
|
||
llama_new_context_with_model: attn_max_b = 0
|
||
llama_new_context_with_model: fused_moe = 0
|
||
llama_new_context_with_model: ser = -1, 0
|
||
llama_new_context_with_model: freq_base = 1000000.0
|
||
llama_new_context_with_model: freq_scale = 1
|
||
llama_kv_cache_init: CPU KV buffer size = 384.00 MiB
|
||
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
|
||
llama_new_context_with_model: CPU output buffer size = 1.16 MiB
|
||
llama_new_context_with_model: CPU compute buffer size = 304.75 MiB
|
||
llama_new_context_with_model: graph nodes = 2165
|
||
llama_new_context_with_model: graph splits = 1
|
||
INFO [ init] initializing slots | tid="21556" timestamp=1746439379 n_slots=1
|
||
INFO [ init] new slot | tid="21556" timestamp=1746439379 id_slot=0 n_ctx_slot=4096
|
||
INFO [ main] model loaded | tid="21556" timestamp=1746439379
|
||
INFO [ main] chat template | tid="21556" timestamp=1746439379 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
|
||
INFO [ main] HTTP server listening | tid="21556" timestamp=1746439379 hostname="127.0.0.1" port="8080" n_threads_http="15"
|
||
INFO [ update_slots] all slots are idle | tid="21556" timestamp=1746439379
|
||
INFO [ log_server_request] request | tid="16816" timestamp=1746439384 remote_addr="127.0.0.1" remote_port=57484 status=200 method="GET" path="/" params={}
|
||
INFO [ log_server_request] request | tid="15152" timestamp=1746439384 remote_addr="127.0.0.1" remote_port=61232 status=200 method="GET" path="/completion.js" params={}
|
||
INFO [ log_server_request] request | tid="19108" timestamp=1746439384 remote_addr="127.0.0.1" remote_port=61590 status=200 method="GET" path="/json-schema-to-grammar.mjs" params={}
|
||
INFO [ log_server_request] request | tid="16816" timestamp=1746439384 remote_addr="127.0.0.1" remote_port=57484 status=200 method="GET" path="/index.js" params={}
|
||
INFO [ log_server_request] request | tid="16816" timestamp=1746439384 remote_addr="127.0.0.1" remote_port=57484 status=404 method="GET" path="/favicon.ico" params={}
|
||
INFO [ launch_slot_with_task] slot is processing task | tid="21556" timestamp=1746439391 id_slot=0 id_task=0
|
||
INFO [ update_slots] kv cache rm [p0, end) | tid="21556" timestamp=1746439391 id_slot=0 id_task=0 p0=0
|
||
INFO [ print_timings] prompt eval time = 1253.52 ms / 50 tokens ( 25.07 ms per token, 39.89 tokens per second) | tid="21556" timestamp=1746439402 id_slot=0 id_task=0 t_prompt_processing=1253.524 n_prompt_tokens_processed=50 t_token=25.070479999999996 n_tokens_second=39.88754902179775
|
||
INFO [ print_timings] generation eval time = 10483.45 ms / 120 runs ( 87.36 ms per token, 11.45 tokens per second) | tid="21556" timestamp=1746439402 id_slot=0 id_task=0 t_token_generation=10483.451 n_decoded=120 t_token=87.36209166666666 n_tokens_second=11.44661237983561
|
||
INFO [ print_timings] total time = 11736.97 ms | tid="21556" timestamp=1746439402 id_slot=0 id_task=0 t_prompt_processing=1253.524 t_token_generation=10483.451 t_total=11736.974999999999
|
||
INFO [ update_slots] slot released | tid="21556" timestamp=1746439402 id_slot=0 id_task=0 n_ctx=4096 n_past=169 n_system_tokens=0 n_cache_tokens=169 truncated=false
|
||
INFO [ update_slots] all slots are idle | tid="21556" timestamp=1746439402
|
||
INFO [ log_server_request] request | tid="17584" timestamp=1746439402 remote_addr="127.0.0.1" remote_port=64288 status=200 method="POST" path="/completion" params={}
|
||
INFO [ update_slots] all slots are idle | tid="21556" timestamp=1746439402
|
||
INFO [ launch_slot_with_task] slot is processing task | tid="21556" timestamp=1746439409 id_slot=0 id_task=122
|
||
INFO [ update_slots] kv cache rm [p0, end) | tid="21556" timestamp=1746439409 id_slot=0 id_task=122 p0=49
|
||
PS C:\neuro\ik_llama.cpp\build\bin\Release>
|
||
|
||
---
|
||
|
||
👤 **intulint** commented the **2025-05-05** at **10:14:49**:<br>
|
||
|
||
Even the benchmark crashes during generation. I don't know the cause, but the crash seems tied to the generation step itself.
|
||
|
||
PS C:\neuro\ik_llama.cpp\build\bin\Release> .\llama-sweep-bench.exe -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf -c 4096 -t 7
|
||
llama_model_loader: loaded meta data with 35 key-value pairs and 579 tensors from F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3-30B-A3B
llama_model_loader: - kv 3: general.basename str = Qwen3-30B-A3B
llama_model_loader: - kv 4: general.quantized_by str = Unsloth
llama_model_loader: - kv 5: general.size_label str = 30B-A3B
llama_model_loader: - kv 6: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 7: qwen3moe.block_count u32 = 48
llama_model_loader: - kv 8: qwen3moe.context_length u32 = 40960
llama_model_loader: - kv 9: qwen3moe.embedding_length u32 = 2048
llama_model_loader: - kv 10: qwen3moe.feed_forward_length u32 = 6144
llama_model_loader: - kv 11: qwen3moe.attention.head_count u32 = 32
llama_model_loader: - kv 12: qwen3moe.attention.head_count_kv u32 = 4
llama_model_loader: - kv 13: qwen3moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 14: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 15: qwen3moe.expert_used_count u32 = 8
llama_model_loader: - kv 16: qwen3moe.attention.key_length u32 = 128
llama_model_loader: - kv 17: qwen3moe.attention.value_length u32 = 128
llama_model_loader: - kv 18: qwen3moe.expert_count u32 = 128
llama_model_loader: - kv 19: qwen3moe.expert_feed_forward_length u32 = 768
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 151654
llama_model_loader: - kv 27: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 28: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - kv 30: general.file_type u32 = 17
llama_model_loader: - kv 31: quantize.imatrix.file str = Qwen3-30B-A3B-GGUF/imatrix_unsloth.dat
llama_model_loader: - kv 32: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-30B-A3B.txt
llama_model_loader: - kv 33: quantize.imatrix.entries_count i32 = 384
llama_model_loader: - kv 34: quantize.imatrix.chunks_count i32 = 32
llama_model_loader: - type f32: 241 tensors
llama_model_loader: - type q5_K: 289 tensors
llama_model_loader: - type q6_K: 49 tensors
llm_load_vocab: special tokens cache size = 26
llm_load_vocab: token to piece cache size = 0.9311 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen3moe
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 151936
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 40960
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 6144
llm_load_print_meta: n_expert = 128
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 40960
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 30.532 B
llm_load_print_meta: model size = 20.228 GiB (5.691 BPW)
llm_load_print_meta: repeating layers = 19.791 GiB (5.684 BPW, 29.910 B parameters)
llm_load_print_meta: general.name = Qwen3-30B-A3B
llm_load_print_meta: BOS token = 11 ','
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
llm_load_print_meta: PAD token = 151654 '<|vision_pad|>'
llm_load_print_meta: LF token = 148848 'ÄĬ'
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
llm_load_print_meta: n_ff_exp = 768
llm_load_tensors: ggml ctx size = 0.25 MiB
llm_load_tensors: CPU buffer size = 20713.44 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.58 MiB
llama_new_context_with_model: CPU compute buffer size = 304.75 MiB
llama_new_context_with_model: graph nodes = 2165
llama_new_context_with_model: graph splits = 1

main: n_kv_max = 4096, n_batch = 2048, n_ubatch = 512, flash_attn = 0, n_gpu_layers = -1, n_threads = 7, n_threads_batch = 7

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 10.780 | 47.49 | 8.250 | 15.51 |

PS C:\neuro\ik_llama.cpp\build\bin\Release>

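The KV-cache sizes printed in the log above can be checked by hand. The sketch below is only an illustration, not code from ik_llama.cpp. It assumes the cache holds, for every layer and every context position, one K row of `n_embd_k_gqa` values and one V row of `n_embd_v_gqa` values stored as f16 (2 bytes each). The helper name `kv_cache_mib` is hypothetical, and all numbers are copied from the log. Under that assumption the result matches the reported `K (f16): 192.00 MiB, V (f16): 192.00 MiB`, so the 384 MiB total is consistent with `-c 4096`.

```python
# Values copied from the log above (llm_load_print_meta / llama_new_context_with_model).
n_layer = 48          # n_layer
n_ctx = 4096          # n_ctx (from -c 4096)
n_embd_k_gqa = 512    # n_embd_k_gqa
n_embd_v_gqa = 512    # n_embd_v_gqa
bytes_per_f16 = 2     # the log reports the K and V caches as f16


def kv_cache_mib(n_layer: int, n_ctx: int, n_embd_gqa: int, bytes_per_elem: int) -> float:
    """Size in MiB of one cache (K or V), assuming one row per layer per position."""
    return n_layer * n_ctx * n_embd_gqa * bytes_per_elem / (1024 * 1024)


k_mib = kv_cache_mib(n_layer, n_ctx, n_embd_k_gqa, bytes_per_f16)
v_mib = kv_cache_mib(n_layer, n_ctx, n_embd_v_gqa, bytes_per_f16)
print(f"K: {k_mib:.2f} MiB, V: {v_mib:.2f} MiB, total: {k_mib + v_mib:.2f} MiB")
```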
---

👤 **ikawrakow** commented the **2025-05-05** at **10:22:33**:<br>

Can you try running with `-t 8`?

If that works, try also adding `-fa -rtr -fmoe`.

---

👤 **intulint** commented the **2025-05-05** at **10:42:45**:<br>

Using 8 threads makes no difference.
With `-fa -rtr -fmoe` it finally works, but I noticed that generation stops for half a second every time before a comma is written. This is the first time I have seen this.
In the llama.cpp avx2 release, generation is much faster.

PS C:\neuro\ik_llama.cpp\build\bin\Release> ./llama-server.exe -t 8 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf
INFO [ main] build info | tid="11244" timestamp=1746440931 build=3667 commit="e3fec173"
INFO [ main] system info | tid="11244" timestamp=1746440931 n_threads=8 n_threads_batch=-1 total_threads=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "
llama_model_loader: loaded meta data with 35 key-value pairs and 579 tensors from F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3-30B-A3B
llama_model_loader: - kv 3: general.basename str = Qwen3-30B-A3B
llama_model_loader: - kv 4: general.quantized_by str = Unsloth
llama_model_loader: - kv 5: general.size_label str = 30B-A3B
llama_model_loader: - kv 6: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 7: qwen3moe.block_count u32 = 48
llama_model_loader: - kv 8: qwen3moe.context_length u32 = 40960
llama_model_loader: - kv 9: qwen3moe.embedding_length u32 = 2048
llama_model_loader: - kv 10: qwen3moe.feed_forward_length u32 = 6144
llama_model_loader: - kv 11: qwen3moe.attention.head_count u32 = 32
llama_model_loader: - kv 12: qwen3moe.attention.head_count_kv u32 = 4
llama_model_loader: - kv 13: qwen3moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 14: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 15: qwen3moe.expert_used_count u32 = 8
llama_model_loader: - kv 16: qwen3moe.attention.key_length u32 = 128
llama_model_loader: - kv 17: qwen3moe.attention.value_length u32 = 128
llama_model_loader: - kv 18: qwen3moe.expert_count u32 = 128
llama_model_loader: - kv 19: qwen3moe.expert_feed_forward_length u32 = 768
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 151654
llama_model_loader: - kv 27: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 28: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - kv 30: general.file_type u32 = 17
llama_model_loader: - kv 31: quantize.imatrix.file str = Qwen3-30B-A3B-GGUF/imatrix_unsloth.dat
llama_model_loader: - kv 32: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-30B-A3B.txt
llama_model_loader: - kv 33: quantize.imatrix.entries_count i32 = 384
llama_model_loader: - kv 34: quantize.imatrix.chunks_count i32 = 32
llama_model_loader: - type f32: 241 tensors
llama_model_loader: - type q5_K: 289 tensors
llama_model_loader: - type q6_K: 49 tensors
llm_load_vocab: special tokens cache size = 26
llm_load_vocab: token to piece cache size = 0.9311 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen3moe
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 151936
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 40960
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 6144
llm_load_print_meta: n_expert = 128
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 40960
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 30.532 B
llm_load_print_meta: model size = 20.228 GiB (5.691 BPW)
llm_load_print_meta: repeating layers = 19.791 GiB (5.684 BPW, 29.910 B parameters)
llm_load_print_meta: general.name = Qwen3-30B-A3B
llm_load_print_meta: BOS token = 11 ','
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
llm_load_print_meta: PAD token = 151654 '<|vision_pad|>'
llm_load_print_meta: LF token = 148848 'ÄĬ'
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
llm_load_print_meta: n_ff_exp = 768
llm_load_tensors: ggml ctx size = 0.25 MiB
llm_load_tensors: CPU buffer size = 20713.44 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: CPU output buffer size = 1.16 MiB
llama_new_context_with_model: CPU compute buffer size = 304.75 MiB
llama_new_context_with_model: graph nodes = 2165
llama_new_context_with_model: graph splits = 1
INFO [ init] initializing slots | tid="11244" timestamp=1746440937 n_slots=1
INFO [ init] new slot | tid="11244" timestamp=1746440937 id_slot=0 n_ctx_slot=4096
INFO [ main] model loaded | tid="11244" timestamp=1746440937
INFO [ main] chat template | tid="11244" timestamp=1746440937 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
INFO [ main] HTTP server listening | tid="11244" timestamp=1746440937 hostname="127.0.0.1" port="8080" n_threads_http="15"
INFO [ update_slots] all slots are idle | tid="11244" timestamp=1746440937
INFO [ launch_slot_with_task] slot is processing task | tid="11244" timestamp=1746440956 id_slot=0 id_task=0
INFO [ update_slots] kv cache rm [p0, end) | tid="11244" timestamp=1746440956 id_slot=0 id_task=0 p0=0
PS C:\neuro\ik_llama.cpp\build\bin\Release>

--------------------------------------------------------

PS C:\neuro\ik_llama.cpp\build\bin\Release> ./llama-server.exe -t 8 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf -fa -rtr -fmoe
INFO [ main] build info | tid="12376" timestamp=1746441162 build=3667 commit="e3fec173"
INFO [ main] system info | tid="12376" timestamp=1746441162 n_threads=8 n_threads_batch=-1 total_threads=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "
llama_model_loader: loaded meta data with 35 key-value pairs and 579 tensors from F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3-30B-A3B
llama_model_loader: - kv 3: general.basename str = Qwen3-30B-A3B
llama_model_loader: - kv 4: general.quantized_by str = Unsloth
llama_model_loader: - kv 5: general.size_label str = 30B-A3B
llama_model_loader: - kv 6: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 7: qwen3moe.block_count u32 = 48
llama_model_loader: - kv 8: qwen3moe.context_length u32 = 40960
llama_model_loader: - kv 9: qwen3moe.embedding_length u32 = 2048
llama_model_loader: - kv 10: qwen3moe.feed_forward_length u32 = 6144
llama_model_loader: - kv 11: qwen3moe.attention.head_count u32 = 32
llama_model_loader: - kv 12: qwen3moe.attention.head_count_kv u32 = 4
llama_model_loader: - kv 13: qwen3moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 14: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 15: qwen3moe.expert_used_count u32 = 8
llama_model_loader: - kv 16: qwen3moe.attention.key_length u32 = 128
llama_model_loader: - kv 17: qwen3moe.attention.value_length u32 = 128
llama_model_loader: - kv 18: qwen3moe.expert_count u32 = 128
llama_model_loader: - kv 19: qwen3moe.expert_feed_forward_length u32 = 768
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 151654
llama_model_loader: - kv 27: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 28: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - kv 30: general.file_type u32 = 17
llama_model_loader: - kv 31: quantize.imatrix.file str = Qwen3-30B-A3B-GGUF/imatrix_unsloth.dat
llama_model_loader: - kv 32: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-30B-A3B.txt
llama_model_loader: - kv 33: quantize.imatrix.entries_count i32 = 384
llama_model_loader: - kv 34: quantize.imatrix.chunks_count i32 = 32
llama_model_loader: - type f32: 241 tensors
llama_model_loader: - type q5_K: 289 tensors
llama_model_loader: - type q6_K: 49 tensors
llm_load_vocab: special tokens cache size = 26
llm_load_vocab: token to piece cache size = 0.9311 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen3moe
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 151936
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 40960
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 6144
llm_load_print_meta: n_expert = 128
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 40960
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 30.532 B
llm_load_print_meta: model size = 20.228 GiB (5.691 BPW)
llm_load_print_meta: repeating layers = 19.791 GiB (5.684 BPW, 29.910 B parameters)
llm_load_print_meta: general.name = Qwen3-30B-A3B
llm_load_print_meta: BOS token = 11 ','
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
llm_load_print_meta: PAD token = 151654 '<|vision_pad|>'
llm_load_print_meta: LF token = 148848 'ÄĬ'
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
llm_load_print_meta: n_ff_exp = 768
llm_load_tensors: ggml ctx size = 0.25 MiB
llm_load_tensors: CPU buffer size = 20713.44 MiB
...................................................................................................
============ Repacked 337 tensors
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: CPU output buffer size = 1.16 MiB
llama_new_context_with_model: CPU compute buffer size = 300.75 MiB
llama_new_context_with_model: graph nodes = 1878
llama_new_context_with_model: graph splits = 1
INFO [ init] initializing slots | tid="12376" timestamp=1746441190 n_slots=1
INFO [ init] new slot | tid="12376" timestamp=1746441190 id_slot=0 n_ctx_slot=4096
INFO [ main] model loaded | tid="12376" timestamp=1746441190
INFO [ main] chat template | tid="12376" timestamp=1746441190 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
INFO [ main] HTTP server listening | tid="12376" timestamp=1746441190 hostname="127.0.0.1" port="8080" n_threads_http="15"
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441190
INFO [ launch_slot_with_task] slot is processing task | tid="12376" timestamp=1746441214 id_slot=0 id_task=0
INFO [ update_slots] kv cache rm [p0, end) | tid="12376" timestamp=1746441214 id_slot=0 id_task=0 p0=0
INFO [ print_timings] prompt eval time = 767.18 ms / 51 tokens ( 15.04 ms per token, 66.48 tokens per second) | tid="12376" timestamp=1746441236 id_slot=0 id_task=0 t_prompt_processing=767.178 n_prompt_tokens_processed=51 t_token=15.04270588235294 n_tokens_second=66.47740159389348
INFO [ print_timings] generation eval time = 21654.80 ms / 288 runs ( 75.19 ms per token, 13.30 tokens per second) | tid="12376" timestamp=1746441236 id_slot=0 id_task=0 t_token_generation=21654.802 n_decoded=288 t_token=75.19028472222222 n_tokens_second=13.299590548091828
INFO [ print_timings] total time = 22421.98 ms | tid="12376" timestamp=1746441236 id_slot=0 id_task=0 t_prompt_processing=767.178 t_token_generation=21654.802 t_total=22421.98
INFO [ update_slots] slot released | tid="12376" timestamp=1746441236 id_slot=0 id_task=0 n_ctx=4096 n_past=338 n_system_tokens=0 n_cache_tokens=338 truncated=false
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441236
INFO [ log_server_request] request | tid="21628" timestamp=1746441236 remote_addr="127.0.0.1" remote_port=65237 status=200 method="POST" path="/completion" params={}
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441236
INFO [ launch_slot_with_task] slot is processing task | tid="12376" timestamp=1746441247 id_slot=0 id_task=290
INFO [ update_slots] kv cache rm [p0, end) | tid="12376" timestamp=1746441247 id_slot=0 id_task=290 p0=50
INFO [ print_timings] prompt eval time = 4001.53 ms / 296 tokens ( 13.52 ms per token, 73.97 tokens per second) | tid="12376" timestamp=1746441271 id_slot=0 id_task=290 t_prompt_processing=4001.527 n_prompt_tokens_processed=296 t_token=13.518672297297297 n_tokens_second=73.9717612801313
INFO [ print_timings] generation eval time = 19925.00 ms / 245 runs ( 81.33 ms per token, 12.30 tokens per second) | tid="12376" timestamp=1746441271 id_slot=0 id_task=290 t_token_generation=19924.999 n_decoded=245 t_token=81.32652653061224 n_tokens_second=12.296111031172448
INFO [ print_timings] total time = 23926.53 ms | tid="12376" timestamp=1746441271 id_slot=0 id_task=290 t_prompt_processing=4001.527 t_token_generation=19924.999 t_total=23926.525999999998
INFO [ update_slots] slot released | tid="12376" timestamp=1746441271 id_slot=0 id_task=290 n_ctx=4096 n_past=590 n_system_tokens=0 n_cache_tokens=590 truncated=false
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441271
INFO [ log_server_request] request | tid="21948" timestamp=1746441271 remote_addr="127.0.0.1" remote_port=50253 status=200 method="POST" path="/completion" params={}
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441271
INFO [ launch_slot_with_task] slot is processing task | tid="12376" timestamp=1746441283 id_slot=0 id_task=537
INFO [ update_slots] kv cache rm [p0, end) | tid="12376" timestamp=1746441283 id_slot=0 id_task=537 p0=3
INFO [ print_timings] prompt eval time = 7425.26 ms / 523 tokens ( 14.20 ms per token, 70.44 tokens per second) | tid="12376" timestamp=1746441292 id_slot=0 id_task=537 t_prompt_processing=7425.256 n_prompt_tokens_processed=523 t_token=14.197430210325049 n_tokens_second=70.43528196199566
INFO [ print_timings] generation eval time = 1970.69 ms / 24 runs ( 82.11 ms per token, 12.18 tokens per second) | tid="12376" timestamp=1746441292 id_slot=0 id_task=537 t_token_generation=1970.687 n_decoded=24 t_token=82.11195833333333 n_tokens_second=12.178494098758453
INFO [ print_timings] total time = 9395.94 ms | tid="12376" timestamp=1746441292 id_slot=0 id_task=537 t_prompt_processing=7425.256 t_token_generation=1970.687 t_total=9395.943
INFO [ update_slots] slot released | tid="12376" timestamp=1746441292 id_slot=0 id_task=537 n_ctx=4096 n_past=549 n_system_tokens=0 n_cache_tokens=549 truncated=false
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441292
INFO [ log_server_request] request | tid="14164" timestamp=1746441292 remote_addr="127.0.0.1" remote_port=55394 status=200 method="POST" path="/completion" params={}
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441292
INFO [ log_server_request] request | tid="20768" timestamp=1746441292 remote_addr="127.0.0.1" remote_port=64794 status=200 method="POST" path="/tokenize" params={}
INFO [ log_server_request] request | tid="18372" timestamp=1746441301 remote_addr="127.0.0.1" remote_port=51189 status=404 method="GET" path="/models" params={}
INFO [ log_server_request] request | tid="18372" timestamp=1746441303 remote_addr="127.0.0.1" remote_port=51189 status=404 method="GET" path="/models" params={}
INFO [ launch_slot_with_task] slot is processing task | tid="12376" timestamp=1746441304 id_slot=0 id_task=563
INFO [ update_slots] kv cache rm [p0, end) | tid="12376" timestamp=1746441304 id_slot=0 id_task=563 p0=0
INFO [ print_timings] prompt eval time = 6708.66 ms / 512 tokens ( 13.10 ms per token, 76.32 tokens per second) | tid="12376" timestamp=1746441368 id_slot=0 id_task=563 t_prompt_processing=6708.662 n_prompt_tokens_processed=512 t_token=13.10285546875 n_tokens_second=76.3192421976245
INFO [ print_timings] generation eval time = 56613.50 ms / 647 runs ( 87.50 ms per token, 11.43 tokens per second) | tid="12376" timestamp=1746441368 id_slot=0 id_task=563 t_token_generation=56613.499 n_decoded=647 t_token=87.50154404945904 n_tokens_second=11.428369760364042
INFO [ print_timings] total time = 63322.16 ms | tid="12376" timestamp=1746441368 id_slot=0 id_task=563 t_prompt_processing=6708.662 t_token_generation=56613.499 t_total=63322.16100000001
INFO [ update_slots] slot released | tid="12376" timestamp=1746441368 id_slot=0 id_task=563 n_ctx=4096 n_past=1158 n_system_tokens=0 n_cache_tokens=0 truncated=false
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441368
INFO [ log_server_request] request | tid="18372" timestamp=1746441368 remote_addr="127.0.0.1" remote_port=51189 status=200 method="POST" path="/chat/completions" params={}
INFO [ update_slots] all slots are idle | tid="12376" timestamp=1746441368

---------------------------------------------------

PS C:\neuro\llama-avx2> ./llama-server.exe -t 8 -c 4096 -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf
build: 5273 (8ae5ebcf) with MSVC 19.43.34808.0 for x64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16

system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 15
main: loading model
srv load_model: loading model 'F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf'
llama_model_loader: loaded meta data with 35 key-value pairs and 579 tensors from F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest))
|
||
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
|
||
llama_model_loader: - kv 0: general.architecture str = qwen3moe
|
||
llama_model_loader: - kv 1: general.type str = model
|
||
llama_model_loader: - kv 2: general.name str = Qwen3-30B-A3B
|
||
llama_model_loader: - kv 3: general.basename str = Qwen3-30B-A3B
|
||
llama_model_loader: - kv 4: general.quantized_by str = Unsloth
|
||
llama_model_loader: - kv 5: general.size_label str = 30B-A3B
|
||
llama_model_loader: - kv 6: general.repo_url str = https://huggingface.co/unsloth
|
||
llama_model_loader: - kv 7: qwen3moe.block_count u32 = 48
|
||
llama_model_loader: - kv 8: qwen3moe.context_length u32 = 40960
|
||
llama_model_loader: - kv 9: qwen3moe.embedding_length u32 = 2048
|
||
llama_model_loader: - kv 10: qwen3moe.feed_forward_length u32 = 6144
|
||
llama_model_loader: - kv 11: qwen3moe.attention.head_count u32 = 32
|
||
llama_model_loader: - kv 12: qwen3moe.attention.head_count_kv u32 = 4
|
||
llama_model_loader: - kv 13: qwen3moe.rope.freq_base f32 = 1000000.000000
|
||
llama_model_loader: - kv 14: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
|
||
llama_model_loader: - kv 15: qwen3moe.expert_used_count u32 = 8
|
||
llama_model_loader: - kv 16: qwen3moe.attention.key_length u32 = 128
|
||
llama_model_loader: - kv 17: qwen3moe.attention.value_length u32 = 128
|
||
llama_model_loader: - kv 18: qwen3moe.expert_count u32 = 128
|
||
llama_model_loader: - kv 19: qwen3moe.expert_feed_forward_length u32 = 768
|
||
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
|
||
llama_model_loader: - kv 21: tokenizer.ggml.pre str = qwen2
|
||
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
|
||
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
|
||
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
|
||
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 151645
|
||
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 151654
|
||
llama_model_loader: - kv 27: tokenizer.ggml.add_bos_token bool = false
|
||
llama_model_loader: - kv 28: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
|
||
llama_model_loader: - kv 29: general.quantization_version u32 = 2
|
||
llama_model_loader: - kv 30: general.file_type u32 = 17
|
||
llama_model_loader: - kv 31: quantize.imatrix.file str = Qwen3-30B-A3B-GGUF/imatrix_unsloth.dat
|
||
llama_model_loader: - kv 32: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-30B-A3B.txt
|
||
llama_model_loader: - kv 33: quantize.imatrix.entries_count i32 = 384
|
||
llama_model_loader: - kv 34: quantize.imatrix.chunks_count i32 = 32
|
||
llama_model_loader: - type f32: 241 tensors
|
||
llama_model_loader: - type q5_K: 289 tensors
|
||
llama_model_loader: - type q6_K: 49 tensors
|
||
print_info: file format = GGUF V3 (latest)
|
||
print_info: file type = Q5_K - Medium
|
||
print_info: file size = 20.23 GiB (5.69 BPW)
|
||
load: special tokens cache size = 26
|
||
load: token to piece cache size = 0.9311 MB
|
||
print_info: arch = qwen3moe
|
||
print_info: vocab_only = 0
|
||
print_info: n_ctx_train = 40960
|
||
print_info: n_embd = 2048
|
||
print_info: n_layer = 48
|
||
print_info: n_head = 32
|
||
print_info: n_head_kv = 4
|
||
print_info: n_rot = 128
|
||
print_info: n_swa = 0
|
||
print_info: n_swa_pattern = 1
|
||
print_info: n_embd_head_k = 128
|
||
print_info: n_embd_head_v = 128
|
||
print_info: n_gqa = 8
|
||
print_info: n_embd_k_gqa = 512
|
||
print_info: n_embd_v_gqa = 512
|
||
print_info: f_norm_eps = 0.0e+00
|
||
print_info: f_norm_rms_eps = 1.0e-06
|
||
print_info: f_clamp_kqv = 0.0e+00
|
||
print_info: f_max_alibi_bias = 0.0e+00
|
||
print_info: f_logit_scale = 0.0e+00
|
||
print_info: f_attn_scale = 0.0e+00
|
||
print_info: n_ff = 6144
|
||
print_info: n_expert = 128
|
||
print_info: n_expert_used = 8
|
||
print_info: causal attn = 1
|
||
print_info: pooling type = 0
|
||
print_info: rope type = 2
|
||
print_info: rope scaling = linear
|
||
print_info: freq_base_train = 1000000.0
|
||
print_info: freq_scale_train = 1
|
||
print_info: n_ctx_orig_yarn = 40960
|
||
print_info: rope_finetuned = unknown
|
||
print_info: ssm_d_conv = 0
|
||
print_info: ssm_d_inner = 0
|
||
print_info: ssm_d_state = 0
|
||
print_info: ssm_dt_rank = 0
|
||
print_info: ssm_dt_b_c_rms = 0
|
||
print_info: model type = 30B.A3B
|
||
print_info: model params = 30.53 B
|
||
print_info: general.name = Qwen3-30B-A3B
|
||
print_info: n_ff_exp = 768
|
||
print_info: vocab type = BPE
|
||
print_info: n_vocab = 151936
|
||
print_info: n_merges = 151387
|
||
print_info: BOS token = 11 ','
|
||
print_info: EOS token = 151645 '<|im_end|>'
|
||
print_info: EOT token = 151645 '<|im_end|>'
|
||
print_info: PAD token = 151654 '<|vision_pad|>'
|
||
print_info: LF token = 198 'Ċ'
|
||
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
|
||
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
|
||
print_info: FIM MID token = 151660 '<|fim_middle|>'
|
||
print_info: FIM PAD token = 151662 '<|fim_pad|>'
|
||
print_info: FIM REP token = 151663 '<|repo_name|>'
|
||
print_info: FIM SEP token = 151664 '<|file_sep|>'
|
||
print_info: EOG token = 151643 '<|endoftext|>'
|
||
print_info: EOG token = 151645 '<|im_end|>'
|
||
print_info: EOG token = 151662 '<|fim_pad|>'
|
||
print_info: EOG token = 151663 '<|repo_name|>'
|
||
print_info: EOG token = 151664 '<|file_sep|>'
|
||
print_info: max token length = 256
|
||
load_tensors: loading model tensors, this can take a while... (mmap = true)
|
||
load_tensors: offloading 0 repeating layers to GPU
|
||
load_tensors: offloaded 0/49 layers to GPU
|
||
load_tensors: CPU_Mapped model buffer size = 20713.44 MiB
|
||
...................................................................................................
|
||
llama_context: constructing llama_context
|
||
llama_context: n_seq_max = 1
|
||
llama_context: n_ctx = 4096
|
||
llama_context: n_ctx_per_seq = 4096
|
||
llama_context: n_batch = 2048
|
||
llama_context: n_ubatch = 512
|
||
llama_context: causal_attn = 1
|
||
llama_context: flash_attn = 0
|
||
llama_context: freq_base = 1000000.0
|
||
llama_context: freq_scale = 1
|
||
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
|
||
llama_context: CPU output buffer size = 0.58 MiB
|
||
llama_kv_cache_unified: kv_size = 4096, type_k = 'f16', type_v = 'f16', n_layer = 48, can_shift = 1, padding = 32
|
||
llama_kv_cache_unified: CPU KV buffer size = 384.00 MiB
|
||
llama_kv_cache_unified: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
|
||
llama_context: CPU compute buffer size = 300.75 MiB
|
||
llama_context: graph nodes = 3126
|
||
llama_context: graph splits = 1
|
||
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
|
||
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
|
||
srv log_server_r: request: GET / 127.0.0.1 503
|
||
srv log_server_r: request: GET / 127.0.0.1 503
|
||
srv init: initializing slots, n_slots = 1
|
||
slot init: id 0 | task -1 | new slot n_ctx_slot = 4096
|
||
main: model loaded
|
||
main: chat template, chat_template: {%- if tools %}
|
||
{{- '<|im_start|>system\n' }}
|
||
{%- if messages[0].role == 'system' %}
|
||
{{- messages[0].content + '\n\n' }}
|
||
{%- endif %}
|
||
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
|
||
{%- for tool in tools %}
|
||
{{- "\n" }}
|
||
{{- tool | tojson }}
|
||
{%- endfor %}
|
||
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
|
||
{%- else %}
|
||
{%- if messages[0].role == 'system' %}
|
||
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
|
||
{%- endif %}
|
||
{%- endif %}
|
||
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
|
||
{%- for forward_message in messages %}
|
||
{%- set index = (messages|length - 1) - loop.index0 %}
|
||
{%- set message = messages[index] %}
|
||
{%- set tool_start = '<tool_response>' %}
|
||
{%- set tool_start_length = tool_start|length %}
|
||
{%- set start_of_message = message.content[:tool_start_length] %}
|
||
{%- set tool_end = '</tool_response>' %}
|
||
{%- set tool_end_length = tool_end|length %}
|
||
{%- set start_pos = (message.content|length) - tool_end_length %}
|
||
{%- if start_pos < 0 %}
|
||
{%- set start_pos = 0 %}
|
||
{%- endif %}
|
||
{%- set end_of_message = message.content[start_pos:] %}
|
||
{%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
|
||
{%- set ns.multi_step_tool = false %}
|
||
{%- set ns.last_query_index = index %}
|
||
{%- endif %}
|
||
{%- endfor %}
|
||
{%- for message in messages %}
|
||
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
|
||
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
|
||
{%- elif message.role == "assistant" %}
|
||
{%- set content = message.content %}
|
||
{%- set reasoning_content = '' %}
|
||
{%- if message.reasoning_content is defined and message.reasoning_content is not none %}
|
||
{%- set reasoning_content = message.reasoning_content %}
|
||
{%- else %}
|
||
{%- if '</think>' in message.content %}
|
||
{%- set content = (message.content.split('</think>')|last).lstrip('\n') %}
|
||
{%- set reasoning_content = (message.content.split('</think>')|first).rstrip('\n') %}
|
||
{%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\n') %}
|
||
{%- endif %}
|
||
{%- endif %}
|
||
{%- if loop.index0 > ns.last_query_index %}
|
||
{%- if loop.last or (not loop.last and reasoning_content) %}
|
||
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
|
||
{%- else %}
|
||
{{- '<|im_start|>' + message.role + '\n' + content }}
|
||
{%- endif %}
|
||
{%- else %}
|
||
{{- '<|im_start|>' + message.role + '\n' + content }}
|
||
{%- endif %}
|
||
{%- if message.tool_calls %}
|
||
{%- for tool_call in message.tool_calls %}
|
||
{%- if (loop.first and content) or (not loop.first) %}
|
||
{{- '\n' }}
|
||
{%- endif %}
|
||
{%- if tool_call.function %}
|
||
{%- set tool_call = tool_call.function %}
|
||
{%- endif %}
|
||
{{- '<tool_call>\n{"name": "' }}
|
||
{{- tool_call.name }}
|
||
{{- '", "arguments": ' }}
|
||
{%- if tool_call.arguments is string %}
|
||
{{- tool_call.arguments }}
|
||
{%- else %}
|
||
{{- tool_call.arguments | tojson }}
|
||
{%- endif %}
|
||
{{- '}\n</tool_call>' }}
|
||
{%- endfor %}
|
||
{%- endif %}
|
||
{{- '<|im_end|>\n' }}
|
||
{%- elif message.role == "tool" %}
|
||
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
|
||
{{- '<|im_start|>user' }}
|
||
{%- endif %}
|
||
{{- '\n<tool_response>\n' }}
|
||
{{- message.content }}
|
||
{{- '\n</tool_response>' }}
|
||
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
|
||
{{- '<|im_end|>\n' }}
|
||
{%- endif %}
|
||
{%- endif %}
|
||
{%- endfor %}
|
||
{%- if add_generation_prompt %}
|
||
{{- '<|im_start|>assistant\n' }}
|
||
{%- if enable_thinking is defined and enable_thinking is false %}
|
||
{{- '<think>\n\n</think>\n\n' }}
|
||
{%- endif %}
|
||
{%- endif %}, example_format: '<|im_start|>system
|
||
You are a helpful assistant<|im_end|>
|
||
<|im_start|>user
|
||
Hello<|im_end|>
|
||
<|im_start|>assistant
|
||
Hi there<|im_end|>
|
||
<|im_start|>user
|
||
How are you?<|im_end|>
|
||
<|im_start|>assistant
|
||
'
|
||
main: server is listening on http://127.0.0.1:8080 - starting the main loop
|
||
srv update_slots: all slots are idle
|
||
srv log_server_r: request: GET / 127.0.0.1 200
|
||
srv params_from_: Chat format: Content-only
|
||
slot launch_slot_: id 0 | task 0 | processing task
|
||
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 20
|
||
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
|
||
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 20, n_tokens = 20, progress = 1.000000
|
||
slot update_slots: id 0 | task 0 | prompt done, n_past = 20, n_tokens = 20
|
||
slot release: id 0 | task 0 | stop processing: n_past = 67, truncated = 0
|
||
slot print_timing: id 0 | task 0 |
|
||
prompt eval time = 713.89 ms / 20 tokens ( 35.69 ms per token, 28.02 tokens per second)
|
||
eval time = 3163.91 ms / 48 tokens ( 65.91 ms per token, 15.17 tokens per second)
|
||
total time = 3877.80 ms / 68 tokens
|
||
srv update_slots: all slots are idle
|
||
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
|
||
|
||
---
|
||
|
||
👤 **ikawrakow** commented the **2025-05-05** at **11:00:59**:<br>

So, with `-rtr -fa -fmoe` it works, but TG is slow (slower than `llama.cpp`). How much slower?
What about prompt processing, or when you have a few thousand tokens in the KV cache?
Is the `llama.cpp` build done with MSVC or with GCC/clang?

Without these flags it does not work. If you try `-rtr -fmoe` and `-fa -fmoe` separately, this will help me pinpoint the issue.

---

👤 **intulint** commented the **2025-05-05** at **11:05:55**:<br>

The speeds are in my message above. It is long, of course, but I tried to give all the information.

---

👤 **intulint** commented the **2025-05-05** at **11:15:26**:<br>

`-fa -fmoe`: works, but it also pauses before displaying commas. The speed is also low.

INFO [ print_timings] prompt eval time = 9586.72 ms / 512 tokens ( 18.72 ms per token, 53.41 tokens per second) | tid="16952" timestamp=1746443401 id_slot=0 id_task=354 t_prompt_processing=9586.721 n_prompt_tokens_processed=512 t_token=18.724064453125 n_tokens_second=53.407207740790625
INFO [ print_timings] generation eval time = 40935.66 ms / 426 runs ( 96.09 ms per token, 10.41 tokens per second) | tid="16952" timestamp=1746443401 id_slot=0 id_task=354 t_token_generation=40935.658 n_decoded=426 t_token=96.09309389671363 n_tokens_second=10.406575118445634

`-rtr -fmoe`: crashes.

---

👤 **ikawrakow** commented the **2025-05-05** at **11:15:51**:<br>

Ah, OK. I see
* `ik_llama.cpp`: PP = 76.3 t/s (512 tokens), TG = 11.4 t/s (647 tokens)
* `llama.cpp`: PP = 28.02 t/s (20 tokens), TG = 15.17 t/s (48 tokens)

Correct? I think it would be more fair to compare for the same (or at least similar) number of tokens generated and same number of tokens in the prompt.

---

👤 **intulint** commented the **2025-05-05** at **11:35:12**:<br>

llama.cpp, ~1000 prompt tokens, ~500 generated:

prompt eval time = 35744.63 ms / 1053 tokens ( 33.95 ms per token, 29.46 tokens per second)
eval time = 33454.47 ms / 426 tokens ( 78.53 ms per token, 12.73 tokens per second)

ik_llama.cpp `-fa -fmoe`, ~1000 prompt tokens, ~500 generated:

INFO [ print_timings] prompt eval time = 20147.56 ms / 1057 tokens ( 19.06 ms per token, 52.46 tokens per second) | tid="5624" timestamp=1746444960 id_slot=0 id_task=0 t_prompt_processing=20147.559 n_prompt_tokens_processed=1057 t_token=19.06107757805109 n_tokens_second=52.46293111736265
INFO [ print_timings] generation eval time = 40472.90 ms / 422 runs ( 95.91 ms per token, 10.43 tokens per second) | tid="5624" timestamp=1746444960 id_slot=0 id_task=0 t_token_generation=40472.905 n_decoded=422 t_token=95.90735781990522 n_tokens_second=10.426728696642853
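
As a rough cross-check of the two like-for-like runs above: throughput is the token count divided by the elapsed time in seconds, so the ratio between the two servers can be recomputed directly from the logged values. This is only a sketch; the helper name `tokens_per_second` and the `runs` layout are illustrative assumptions and not part of either project, while the numbers are copied from the logs in this comment.

```python
# Throughput recomputed from the timings logged above.
# `tokens_per_second` and the `runs` layout are hypothetical, for illustration only.

def tokens_per_second(elapsed_ms: float, n_tokens: int) -> float:
    """Convert an elapsed time in milliseconds and a token count to tokens/s."""
    return n_tokens / (elapsed_ms / 1000.0)

# (elapsed ms, tokens) copied from the two runs in this comment
runs = {
    "llama.cpp": {"pp": (35744.63, 1053), "tg": (33454.47, 426)},
    "ik_llama.cpp -fa -fmoe": {"pp": (20147.56, 1057), "tg": (40472.90, 422)},
}

speeds = {
    name: {phase: tokens_per_second(*sample) for phase, sample in phases.items()}
    for name, phases in runs.items()
}

for name, s in speeds.items():
    print(f"{name:24s} PP = {s['pp']:6.2f} t/s   TG = {s['tg']:6.2f} t/s")

# A ratio above 1 means ik_llama.cpp is faster for that phase.
pp_ratio = speeds["ik_llama.cpp -fa -fmoe"]["pp"] / speeds["llama.cpp"]["pp"]
tg_ratio = speeds["ik_llama.cpp -fa -fmoe"]["tg"] / speeds["llama.cpp"]["tg"]
print(f"PP ratio = {pp_ratio:.2f}, TG ratio = {tg_ratio:.2f}")
```

On these numbers prompt processing is roughly 1.8x faster with `ik_llama.cpp`, while token generation comes out at roughly 0.8x, i.e. slower, on this Windows build.
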
|
||
|
||
---
|
||
|
||
👤 **ikawrakow** commented the **2025-05-05** at **11:41:03**:<br>

OK, thanks. I'll look into the failure without flash attention.

> -fa -rtr -fmoe Finally it works, but I noticed that every time before writing a comma the generation stops for half a second.

Sorry for asking, but in what language is your conversation? I'm asking because a pause before a comma may indicate a performance issue in the token id -> utf-8 conversion code. I haven't looked at that since I forked `llama.cpp` last June, and they may have improved since then.

---

👤 **intulint** commented the **2025-05-05** at **11:43:33**:<br>

Good question. I somehow didn't pay attention to which language the pauses in generation occur in. Usually Russian, but also English. I'll check now. We need generation in English, right? Or is it important that the entire context is in one language?

---

👤 **ikawrakow** commented the **2025-05-05** at **11:46:02**:<br>

> Or is it important that the entire context is in one language?

I don't know. Just looking for clues what could be slowing it down.

---

👤 **intulint** commented the **2025-05-05** at **11:54:19**:<br>

I ran it in English only and looked more closely: a pause in generation appears just before or just after a comma is displayed. It lasts a noticeable fraction of a second, and then generation continues. It usually happens in places like "Okay, the", "So, if", "than B, the".

---

👤 **intulint** commented the **2025-05-05** at **11:56:28**:<br>

To avoid confusion, I checked in 2 frontends. I noticed pauses only on commas.

---

👤 **ikawrakow** commented the **2025-05-05** at **11:57:24**:<br>

Interesting. I don't observe such effects on my Linux box. Are the sampling parameters exactly the same?

---

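👤 **intulint** commented the **2025-05-05** at **12:01:40**:<br>

In the server's native frontend the sampling settings are the defaults, as far as I understand. I only changed the max tokens when measuring the speed. It didn't affect the pauses.

![Image](https://github.com/user-attachments/assets/1f5e82b9-4a2d-4e2c-8d4b-a5e3e5dc1d4f)

![Image](https://github.com/user-attachments/assets/42b04bdd-94b5-475e-97e6-4aa25a7fe0f5)

---
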
👤 **intulint** commented the **2025-05-05** at **12:16:22**:<br>

Maybe it's the compiler version? I don't know much about this, but as I understand it, a recent one was used for the build. I remember warnings during the build about type conversions and possible loss of data.

---

👤 **ikawrakow** commented the **2025-05-05** at **12:17:11**:<br>

For reference, here is what I get on my vanilla AVX2 Linux box using 8 threads with the command
```
./bin/llama-sweep-bench -m Qwen_Qwen3-30B-A3B-Q5_K_M.gguf -c 4096 -t 8 -fa -ctk q8_0 -ctv q8_0 -rtr -fmoe
```

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.081 | 166.16 | 5.223 | 24.51 |
| 512 | 128 | 512 | 3.331 | 153.69 | 5.502 | 23.26 |
| 512 | 128 | 1024 | 3.606 | 141.97 | 5.740 | 22.30 |
| 512 | 128 | 1536 | 3.873 | 132.20 | 5.984 | 21.39 |
| 512 | 128 | 2048 | 4.154 | 123.25 | 6.212 | 20.61 |
| 512 | 128 | 2560 | 4.419 | 115.87 | 6.443 | 19.87 |
| 512 | 128 | 3072 | 4.691 | 109.15 | 6.685 | 19.15 |
| 512 | 128 | 3584 | 4.959 | 103.26 | 6.906 | 18.54 |
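
A note for reading this and the other `llama-sweep-bench` tables in this thread: each row appears to process a `PP`-token prompt chunk and generate `TG` tokens with `N_KV` tokens already in the KV cache, and the speed columns are the token count divided by the time. The sketch below rechecks two rows of the table above under that reading; the `Row` type is a hypothetical helper, and small mismatches are expected because the times are rounded to milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One llama-sweep-bench row (hypothetical container, values copied from the table)."""
    pp: int      # prompt tokens processed in this step
    tg: int      # tokens generated in this step
    n_kv: int    # tokens already in the KV cache
    t_pp: float  # prompt processing time, seconds
    s_pp: float  # reported prompt processing speed, t/s
    t_tg: float  # generation time, seconds
    s_tg: float  # reported generation speed, t/s


rows = [
    Row(512, 128, 0, 3.081, 166.16, 5.223, 24.51),
    Row(512, 128, 3584, 4.959, 103.26, 6.906, 18.54),
]

for r in rows:
    s_pp = r.pp / r.t_pp  # recomputed PP speed
    s_tg = r.tg / r.t_tg  # recomputed TG speed
    print(f"N_KV={r.n_kv:5d}  S_PP={s_pp:7.2f} (table {r.s_pp})  "
          f"S_TG={s_tg:6.2f} (table {r.s_tg})")

# Relative TG slowdown between an empty cache and 3584 cached tokens
tg_drop = 1.0 - rows[1].s_tg / rows[0].s_tg
print(f"TG slowdown over the sweep: {tg_drop:.1%}")
```
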

The model is [this one from Bartowski](https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-GGUF/blob/main/Qwen_Qwen3-30B-A3B-Q5_K_M.gguf)

The CPU has a Zen3 core, so I'm not expecting it to be faster than a reasonably up-to-date AVX2 capable CPU.

In my case it also works without issues with just `-c 4096 -t 8`.

So, something goes seriously wrong with the Windows build.

Not sure how to debug. I don't have access to a Windows box.

---

👤 **intulint** commented the **2025-05-05** at **12:23:26**:<br>

Got it. I'll try to figure out how and by how much to downgrade the compiler, maybe that will help. If not, I don't know what to do next, I'll run it with llama.cpp.

---

👤 **ikawrakow** commented the **2025-05-05** at **12:31:36**:<br>

You can try building with `GCC or clang`. I cannot give you instructions how one does that as it is a long time since I last did that, so I have forgotten. But IIRC, the GCC build was running ~40% faster than the MSVC build. It wasn't an LLM, but it did involve algorithms with heavy number crunching. It must have been around 2017-2018, so don't know if MSVC has improved since then.

---

👤 **intulint** commented the **2025-05-05** at **12:33:50**:<br>

> Is the llama.cpp build done with MSVC or with GCC/clang?

I have written a script that downloads the latest official releases; I have never compiled such large projects myself before.

By the way, yes, we found the parameters under which it starts.

`PS C:\neuro\ik_llama.cpp\build\bin\Release> .\llama-sweep-bench.exe -m F:\llm\Qwen3-30B-A3B-Q5_K_M.gguf -c 4096 -t 8 -fa -fmoe`

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 9.384 | 54.56 | 8.596 | 14.89 |
| 512 | 128 | 512 | 10.704 | 47.83 | 8.700 | 14.71 |
| 512 | 128 | 1024 | 10.833 | 47.26 | 8.572 | 14.93 |
| 512 | 128 | 1536 | 11.697 | 43.77 | 8.849 | 14.47 |
| 512 | 128 | 2048 | 12.257 | 41.77 | 9.372 | 13.66 |
| 512 | 128 | 2560 | 13.290 | 38.53 | 9.859 | 12.98 |
| 512 | 128 | 3072 | 14.514 | 35.28 | 11.724 | 10.92 |
| 512 | 128 | 3584 | 14.406 | 35.54 | 10.795 | 11.86 |

---

👤 **intulint** commented the **2025-05-05** at **12:35:11**:<br>

Got it, I'll try it in the evening if I figure it out.

---

👤 **ikawrakow** commented the **2025-05-05** at **12:46:18**:<br>

You didn't say what your CPU was, so here is another reference point from me on a more recent CPU (Ryzen-7950X). Again using 8 threads to be comparable to yours, same command as above:

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 1.874 | 273.19 | 5.253 | 24.37 |
| 512 | 128 | 512 | 1.993 | 256.92 | 5.414 | 23.64 |
| 512 | 128 | 1024 | 2.131 | 240.24 | 5.523 | 23.17 |
| 512 | 128 | 1536 | 2.273 | 225.30 | 5.620 | 22.77 |
| 512 | 128 | 2048 | 2.417 | 211.83 | 5.721 | 22.37 |
| 512 | 128 | 2560 | 2.549 | 200.86 | 5.821 | 21.99 |
| 512 | 128 | 3072 | 2.688 | 190.46 | 5.925 | 21.60 |
| 512 | 128 | 3584 | 2.828 | 181.02 | 6.013 | 21.29 |

In comparison, mainline `llama.cpp` on the same computer (just pulled and rebuilt):

### With flash attention

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 6.668 | 76.79 | 5.408 | 23.67 |
| 512 | 128 | 512 | 8.692 | 58.91 | 6.007 | 21.31 |
| 512 | 128 | 1024 | 10.831 | 47.27 | 6.781 | 18.88 |
| 512 | 128 | 1536 | 12.907 | 39.67 | 7.603 | 16.84 |
| 512 | 128 | 2048 | 14.947 | 34.26 | 8.544 | 14.98 |
| 512 | 128 | 2560 | 16.958 | 30.19 | 9.603 | 13.33 |
| 512 | 128 | 3072 | 19.009 | 26.93 | 10.614 | 12.06 |
| 512 | 128 | 3584 | 21.115 | 24.25 | 11.577 | 11.06 |

### Without flash attention

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 6.246 | 81.98 | 5.522 | 23.18 |
| 512 | 128 | 512 | 6.696 | 76.46 | 5.781 | 22.14 |
| 512 | 128 | 1024 | 7.157 | 71.54 | 6.009 | 21.30 |
| 512 | 128 | 1536 | 7.639 | 67.02 | 6.207 | 20.62 |
| 512 | 128 | 2048 | 8.089 | 63.30 | 6.468 | 19.79 |
| 512 | 128 | 2560 | 8.577 | 59.70 | 6.708 | 19.08 |
| 512 | 128 | 3072 | 9.010 | 56.82 | 7.012 | 18.25 |
| 512 | 128 | 3584 | 9.498 | 53.91 | 7.144 | 17.92 |
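
To put the three tables above side by side, the sketch below divides the `ik_llama.cpp` speeds by the mainline `llama.cpp` speeds at the first and last `N_KV` values. The numbers are copied from the tables; the dictionary keys and variable names are illustrative labels only, not names used by either project.

```python
# (S_PP, S_TG) in t/s at N_KV = 0 and N_KV = 3584, copied from the three tables above.
# The keys are illustrative labels chosen for this sketch.
results = {
    "ik_llama.cpp": {0: (273.19, 24.37), 3584: (181.02, 21.29)},
    "llama.cpp, flash attention": {0: (76.79, 23.67), 3584: (24.25, 11.06)},
    "llama.cpp, no flash attention": {0: (81.98, 23.18), 3584: (53.91, 17.92)},
}

baseline = "ik_llama.cpp"
speedups = {}  # (mainline config, N_KV) -> (PP speedup, TG speedup) of ik_llama.cpp

for name, by_kv in results.items():
    if name == baseline:
        continue
    for n_kv, (s_pp, s_tg) in by_kv.items():
        ref_pp, ref_tg = results[baseline][n_kv]
        speedups[(name, n_kv)] = (ref_pp / s_pp, ref_tg / s_tg)
        pp_x, tg_x = speedups[(name, n_kv)]
        print(f"vs {name:30s} N_KV={n_kv:4d}  PP {pp_x:4.2f}x  TG {tg_x:4.2f}x")
```

On these numbers the prompt-processing advantage is about 3.3x to 3.6x with an empty cache and grows to about 7.5x against the flash-attention build at `N_KV = 3584`. Token generation is within a few percent at `N_KV = 0` and diverges as the cache fills.
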
|
||
|
||
---
|
||
|
||
👤 **intulint** commented the **2025-05-05** at **12:59:45**:<br>

Ah, indeed. This is a system built around an old server processor, a 1660v4, with 4 memory channels and 32 GB of RAM in total. The generation speeds are quite good, since the memory delivers somewhere around 55 GB/s. Of course, this is not comparable with modern processors.
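
A back-of-envelope way to relate the ~55 GB/s figure to generation speed: if token generation is limited by memory bandwidth, the ceiling is roughly the bandwidth divided by the bytes of weights read per token. The sketch below is only an estimate under stated assumptions that were not measured in this thread: about 3e9 active parameters per token (read off the "A3B" in the model name) and a uniform 5.69 bits per weight (the BPW value printed in the load log earlier in the thread).

```python
# Rough upper bound on token generation speed when memory bandwidth is the limit.
# Assumptions (not measured in this thread):
#   - ~3e9 active parameters per generated token, read off the "A3B" in the model name
#   - a uniform 5.69 bits per weight, as printed in the load log ("5.69 BPW")
ACTIVE_PARAMS = 3.0e9          # assumed active parameters per token
BITS_PER_WEIGHT = 5.69         # average bits per weight from the load log
BANDWIDTH_BYTES_PER_S = 55e9   # ~55 GB/s quoted above

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8.0
tg_upper_bound = BANDWIDTH_BYTES_PER_S / bytes_per_token

print(f"weights read per token: {bytes_per_token / 1e9:.2f} GB")
print(f"bandwidth-bound TG ceiling: {tg_upper_bound:.1f} t/s")
```

The roughly 15 t/s measured in this thread sits below that ceiling, which is what one would expect, since the estimate ignores KV cache reads, activations, and any compute limits.
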
|
||
|
||
---
|
||
|
||
👤 **saood06** commented the **2025-05-05** at **22:30:50**:<br>

> You can try building with `GCC or clang`. I cannot give you instructions how one does that as it is a long time since I last did that, so I have forgotten.

The easiest way I found to compile this on Windows with a non-MSVC compiler was https://github.com/skeeto/w64devkit, but I don't use it because I can't compile with CUDA there (and my Nvidia GPU is the only advantage of my Windows machine). From what I remember, it also wasn't any faster on my machine, even for CPU only.

---

👤 **alex1284B** commented the **2025-05-14** at **16:37:33**:<br>

I think I have a similar problem: Qwen3 does not produce valid output after two lines of tokens. I tried different quants (IQ_K, Q6), with the same problem. But Qwen2.5 is fine. Base llama.cpp also works fine. Linux, CPU only.
I'm not sure, but the sampler order is different from base llama.cpp:

`CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature`

vs

`sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist`

`ik_llama.cpp$ ./build/bin/llama-cli --color -m /home/ollama/models/gguf/Qwen3-30B-A3B-Q6_K_L.gguf --threads 12 --temp 0.6 --min-p 0 --top-k 20 --top-p 0.95 -p "<|im_start|>user\nA drinks machine offers three selections - Tea, Coffee or Random but the machine has been wired up wrongly so that each button does not give what it claims. If each drink costs 50p, how much minimum money do you have to put into the machine to work out which button gives which selection ?<|im_end|>\n<|im_start|>assistant\n"
|
||
Log start
|
||
main: build = 3693 (0435b68e)
|
||
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
|
||
main: seed = 1747238169
|
||
llama_model_loader: loaded meta data with 41 key-value pairs and 579 tensors from /home/ollama/models/gguf/Qwen3-30B-A3B-Q6_K_L.gguf (version GGUF V3 (latest))
|
||
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
|
||
llama_model_loader: - kv 0: general.architecture str = qwen3moe
|
||
llama_model_loader: - kv 1: general.type str = model
|
||
llama_model_loader: - kv 2: general.name str = Qwen3 30B A3B
|
||
llama_model_loader: - kv 3: general.basename str = Qwen3
|
||
llama_model_loader: - kv 4: general.size_label str = 30B-A3B
|
||
llama_model_loader: - kv 5: general.license str = apache-2.0
|
||
llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/Qwen3-30B...
|
||
llama_model_loader: - kv 7: general.base_model.count u32 = 1
|
||
llama_model_loader: - kv 8: general.base_model.0.name str = Qwen3 30B A3B Base
|
||
llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
|
||
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-30B...
|
||
llama_model_loader: - kv 11: general.tags arr[str,1] = ["text-generation"]
|
||
llama_model_loader: - kv 12: qwen3moe.block_count u32 = 48
|
||
llama_model_loader: - kv 13: qwen3moe.context_length u32 = 32768
|
||
llama_model_loader: - kv 14: qwen3moe.embedding_length u32 = 2048
|
||
llama_model_loader: - kv 15: qwen3moe.feed_forward_length u32 = 6144
|
||
llama_model_loader: - kv 16: qwen3moe.attention.head_count u32 = 32
|
||
llama_model_loader: - kv 17: qwen3moe.attention.head_count_kv u32 = 4
|
||
llama_model_loader: - kv 18: qwen3moe.rope.freq_base f32 = 1000000,000000
|
||
llama_model_loader: - kv 19: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0,000001
|
||
llama_model_loader: - kv 20: qwen3moe.expert_used_count u32 = 8
|
||
llama_model_loader: - kv 21: qwen3moe.attention.key_length u32 = 128
|
||
llama_model_loader: - kv 22: qwen3moe.attention.value_length u32 = 128
|
||
llama_model_loader: - kv 23: qwen3moe.expert_count u32 = 128
|
||
llama_model_loader: - kv 24: qwen3moe.expert_feed_forward_length u32 = 768
|
||
llama_model_loader: - kv 25: tokenizer.ggml.model str = gpt2
|
||
llama_model_loader: - kv 26: tokenizer.ggml.pre str = qwen2
|
||
llama_model_loader: - kv 27: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
|
||
llama_model_loader: - kv 28: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
|
||
llama_model_loader: - kv 29: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
|
||
llama_model_loader: - kv 30: tokenizer.ggml.eos_token_id u32 = 151645
|
||
llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 151643
|
||
llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 151643
|
||
llama_model_loader: - kv 33: tokenizer.ggml.add_bos_token bool = false
|
||
llama_model_loader: - kv 34: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
|
||
llama_model_loader: - kv 35: general.quantization_version u32 = 2
|
||
llama_model_loader: - kv 36: general.file_type u32 = 18
|
||
llama_model_loader: - kv 37: quantize.imatrix.file str = /models_out/Qwen3-30B-A3B-GGUF/Qwen_Q...
|
||
llama_model_loader: - kv 38: quantize.imatrix.dataset str = /training_data/calibration_datav3.txt
|
||
llama_model_loader: - kv 39: quantize.imatrix.entries_count i32 = 384
|
||
llama_model_loader: - kv 40: quantize.imatrix.chunks_count i32 = 209
|
||
llama_model_loader: - type f32: 241 tensors
|
||
llama_model_loader: - type q8_0: 50 tensors
|
||
llama_model_loader: - type q6_K: 288 tensors
|
||
llm_load_vocab: special tokens cache size = 26
|
||
llm_load_vocab: token to piece cache size = 0,9311 MB
|
||
llm_load_print_meta: format = GGUF V3 (latest)
|
||
llm_load_print_meta: arch = qwen3moe
|
||
llm_load_print_meta: vocab type = BPE
|
||
llm_load_print_meta: n_vocab = 151936
|
||
llm_load_print_meta: n_merges = 151387
|
||
llm_load_print_meta: vocab_only = 0
|
||
llm_load_print_meta: n_ctx_train = 32768
|
||
llm_load_print_meta: n_embd = 2048
|
||
llm_load_print_meta: n_layer = 48
|
||
llm_load_print_meta: n_head = 32
|
||
llm_load_print_meta: n_head_kv = 4
|
||
llm_load_print_meta: n_rot = 128
|
||
llm_load_print_meta: n_swa = 0
|
||
llm_load_print_meta: n_swa_pattern = 1
|
||
llm_load_print_meta: n_embd_head_k = 128
|
||
llm_load_print_meta: n_embd_head_v = 128
|
||
llm_load_print_meta: n_gqa = 8
|
||
llm_load_print_meta: n_embd_k_gqa = 512
|
||
llm_load_print_meta: n_embd_v_gqa = 512
|
||
llm_load_print_meta: f_norm_eps = 0,0e+00
|
||
llm_load_print_meta: f_norm_rms_eps = 1,0e-06
|
||
llm_load_print_meta: f_clamp_kqv = 0,0e+00
|
||
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
|
||
llm_load_print_meta: f_logit_scale = 0,0e+00
|
||
llm_load_print_meta: n_ff = 6144
|
||
llm_load_print_meta: n_expert = 128
|
||
llm_load_print_meta: n_expert_used = 8
|
||
llm_load_print_meta: causal attn = 1
|
||
llm_load_print_meta: pooling type = 0
|
||
llm_load_print_meta: rope type = 2
|
||
llm_load_print_meta: rope scaling = linear
|
||
llm_load_print_meta: freq_base_train = 1000000,0
|
||
llm_load_print_meta: freq_scale_train = 1
|
||
llm_load_print_meta: n_ctx_orig_yarn = 32768
|
||
llm_load_print_meta: rope_finetuned = unknown
|
||
llm_load_print_meta: ssm_d_conv = 0
|
||
llm_load_print_meta: ssm_d_inner = 0
|
||
llm_load_print_meta: ssm_d_state = 0
|
||
llm_load_print_meta: ssm_dt_rank = 0
|
||
llm_load_print_meta: model type = ?B
|
||
llm_load_print_meta: model ftype = Q6_K
|
||
llm_load_print_meta: model params = 30,532 B
|
||
llm_load_print_meta: model size = 23,515 GiB (6,616 BPW)
|
||
llm_load_print_meta: repeating layers = 22,900 GiB (6,577 BPW, 29,910 B parameters)
|
||
llm_load_print_meta: general.name = Qwen3 30B A3B
|
||
llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
|
||
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
|
||
llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
|
||
llm_load_print_meta: LF token = 148848 'ÄĬ'
|
||
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
|
||
llm_load_print_meta: max token length = 256
|
||
llm_load_print_meta: n_ff_exp = 768
|
||
llm_load_tensors: ggml ctx size = 0,25 MiB
|
||
llm_load_tensors: CPU buffer size = 24079,77 MiB
|
||
....................................................................................................
llama_new_context_with_model: n_ctx = 32768
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 3072,00 MiB
llama_new_context_with_model: KV self size = 3072,00 MiB, K (f16): 1536,00 MiB, V (f16): 1536,00 MiB
llama_new_context_with_model: CPU output buffer size = 0,58 MiB
llama_new_context_with_model: CPU compute buffer size = 2136,01 MiB
llama_new_context_with_model: graph nodes = 2165
llama_new_context_with_model: graph splits = 1

system_info: n_threads = 12 / 24 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
repeat_last_n = 64, repeat_penalty = 1,000, frequency_penalty = 0,000, presence_penalty = 0,000
top_k = 20, tfs_z = 1,000, top_p = 0,950, min_p = 0,000, typical_p = 1,000, temp = 0,600
mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 32768, n_batch = 2048, n_predict = -1, n_keep = 0


user
A drinks machine offers three selections - Tea, Coffee or Random but the machine has been wired up wrongly so that each button does not give what it claims. If each drink costs 50p, how much minimum money do you have to put into the machine to work out which button gives which selection ?
assistant
<think>
Okay, so there's this drinks machine with three buttons: Tea, Coffee, and Random. But the problem is, each button is wired up incorrectly. That means if you press Tea, it's not going to give Tea; same with Coffee. And Random is also not giving a random selection. So, the challenge is to figure out how much money you need to put in to determine which button actually gives which drink. Each drink costs 50p, so we need to find the minimum amount required.


First, let me try to understand the problem better. The machine has three buttons, each labeled incorrectly. So, the Tea button doesn't give Tea, Coffee doesn't give Coffee, and Random doesn't give a random selection. Wait, but what does. So… well... okay. [. So. So,. Wait, well? Wait,. Wait, but then. Wait,. Wait, but now. Wait, So,,. Wait, Hey,. But... So, let me.\? So. So,. So, Also,ab. But the.\ the.\. Let of, Wait,. Hmm. Let is the \,... So,. Let is probably. So, let. Let, let, actually., and go.,.\,.\,,.\, But, wait,… So,\n). So,... etc. So, \ If,… but… but I'm, \ is the the same thing., \, \ The is the the you.,., \ But, \ So, \, \, \. So, \,.\a. So,!\n't sure, \, \ But, \,.\ but the same. the question. So, \ is the problem. So, \ I'm,.\


But,.\, \ So, \ you can you can you have to figure out of, \ the same, \, \, \, \ but there's a lot. So, \, \ Let you get the problem. So, \ I think that's, \, \ but I think that is the problem. So,.\


But, \. So, \ the question. you are you can you can you know,, \, \ but I'm. But, \,.\ The problem is the answer the problem. But, \ I'm the answer the previous. but I'm. But, so you need to be careful, but I am I'm a problem. But, the problem. I have to see, I'm, I'm a bit, I know, but the actual, but, that. Then, but, but I'm a lot. So, but, but I don. So, but, I don! So, but I'm not, but, but, but, the number. But, but I can it's a bit, let me, I don… let me. So, but I can you need to see, in your answer. Let, but, but I need to the, but, but, I don, so, but, but I think it's a bit, but I think I have to be, I'm just that's not, and! So, in my, I have to be you know, but I can you need to solve this is a bit of \ what's the


Okay! It to be, I have to see, etc, I'm, you are you, andab, and, etc, I'll, I'm not. So, and \ I can you are you, I need to the other than, but I can you, I know, I need to make. But, I don, I think of. But, but, I have to make it's, you, I can I can I'm not, the the to me have you, but, I don… I think, I don… but, I am the which, I have to see, I'm going to be it's, I'm a person, I've been, no, I think. For, but I'm. If, I'm. I'm, the all, that, I'm just to be, I think I don. I don the the same, I will, but I am it's a new. But, but, I'm, or, but, but, but, but, but, but, but. But, but I don, I have been confused. So, but, no, in this is the same, I don? But, I think, but, I think, but, you can't, I want to you. That, but this is, but I can you, I mean, I need to. So, but, but the same, I'm, I’m, but, I can you, I'm on the, I'm just, I can I know, I'm in the, I have to me you, but, but, but I'm not. I don… but, I need to be. I need to know, the question, I think, but, but, but I have to say, I'm not to the only, but, no, I think, I'm going to think, but that, you, and and I'm, but, I have you! It, I think, that. I can you \ I was a) the question, I need, is, or, I have the. The problem.


The thing, but, it have to be, I was a lot, but, I know how is the way, but, but, I have to see, I’m not, I think. But, and! Let, I have you! I will be it's. It, but, I, and, I want to be, I don, I'm. I'm, I need to the problem. It, that.


I need to have you… I have to make, but, and. I need to. So, but, but, if, I'm going to be, but, I have, or, I think about, that, but, I have to get, but, I'm, that, but, and
, and! I'm, I need to be, I just, I need to the, but, that, but, but, that, I don, I think, but, I don! I'm, in this is a very, the, what is that, you. I'm not. But, I was, I think that's a lot, the, that, I'm going to be the, but, it, I need to say, I'm, and. So, but, I'm, but, I have to be, I am, but, is a problem, I need. I’m in the problem, that, you! I think, I'm, I am, but it, I'm not, I think, if I, in the, in the, that, and, but the, but, I can't. But, I, I'm trying


llama_print_timings: load time = 1206,27 ms
llama_print_timings: sample time = 49,64 ms / 1459 runs ( 0,03 ms per token, 29392,21 tokens per second)
llama_print_timings: prompt eval time = 337,36 ms / 69 tokens ( 4,89 ms per token, 204,53 tokens per second)
llama_print_timings: eval time = 60951,79 ms / 1458 runs ( 41,81 ms per token, 23,92 tokens per second)
llama_print_timings: total time = 61937,29 ms / 1527 tokens`

---

👤 **ikawrakow** commented the **2025-05-14** at **16:57:33**:<br>

@alex1284B

I tried your prompt and I see that it does not work. But if you add `-fa -fmoe`, then it works. Please create a separate issue for this. Thanks.
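
For readers following along, here is a minimal sketch of what an invocation with those two flags might look like. Only `-fa` and `-fmoe` come from the comment above. The binary path, the model file name, and the `-c` / `-t` values are assumptions chosen to mirror the `n_ctx` and `n_threads` values in the log, not the reporter's actual command line.

```shell
#!/bin/sh
# Assumed locations (hypothetical); adjust to your own build and model file.
# On Windows the binary would be llama-cli.exe.
BIN="./llama-cli"
MODEL="./Qwen3-30B-A3B-Q6_K.gguf"

# -fa and -fmoe are the two flags named in the comment above.
# -c 32768 and -t 12 mirror n_ctx and n_threads from the log (assumed, not confirmed).
CMD="$BIN -m $MODEL -c 32768 -t 12 -fa -fmoe"
echo "$CMD"

# Only launch if the binary and the model are actually present.
if [ -x "$BIN" ] && [ -f "$MODEL" ]; then
    $CMD
fi
```

With both flags set, the `flash_attn` and `fused_moe` lines in the startup log should read `1` instead of the `0` shown in the log above, which is a quick way to check that the options took effect.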


---

👤 **alex1284B** commented the **2025-05-14** at **17:23:47**:<br>

Thank you, I probably missed these options for starting. My bad.

---

👤 **ikawrakow** commented the **2025-05-25** at **07:10:25**:<br>

Closed via #420