mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 17:20:01 +00:00
* Add RPC backend in device list to override tensors. * rpc : prevent crashes on invalid input (#9040) Add more checks which prevent RPC server from crashing if invalid input is received from client # Conflicts: # ggml/src/ggml-rpc.cpp * rpc : print error message when failed to connect endpoint (#9042) * Fix RPC error * Add vulkan, sycl to rpc backend * add thread in rpc cpu backend * add cache folder and other improvement in rpc * add header file * support for models with non-512 aligned tensors * rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943) RPC_CMD_SET_TENSOR always returns an empty response and we send this 4 times per token. We can improve TG speed if we don't wait for this empty response. The performance impact of this change depends on the network latency. # Conflicts: # ggml/src/ggml-rpc.cpp * fix(rpc): Improve input validation and error handling (#13069) * fix(rpc): Improve input validation and error handling The `rpc-server` was vulnerable to Denial of Service attacks via several RPC commands (`SET_TENSOR`, `GRAPH_COMPUTE`, etc.). Malformed messages could trigger failed assertions (e.g., invalid `ggml_type`) or out-of-bounds reads/writes leading to `GGML_ABORT` calls, crashing the server process. This PR introduces robust input validation and replaces `abort()` calls with graceful error handling: - **Type Validation:** `deserialize_tensor` now checks if the `tensor->type` is within the valid `GGML_TYPE_COUNT` range *before* calling `ggml_new_tensor_4d`. Returns `nullptr` on invalid type. - **Bounds Checks:** Replaced `GGML_ABORT` in `set_tensor`, `set_tensor_hash`, and `get_tensor` handlers with error logging and returning `false` when data/offset parameters are out of buffer bounds. - **Size Checks:** Added safe arithmetic checks (for overflow) in `graph_compute` when calculating required message sizes based on client-provided `n_nodes` and `n_tensors`. Returns early if the reported sizes conflict with the actual message size or would lead to overflow. - **Error Propagation:** - `create_node` now checks for `nullptr` return values from `deserialize_tensor` and its recursive calls, propagating `nullptr` upwards on failure. Uses `find` instead of `at` for safer map access. - `copy_tensor` now checks for `nullptr` from `deserialize_tensor` and sets the response status to failure if deserialization or bounds checks fail. - `graph_compute` now checks for `nullptr` return from `create_node` and returns failure status correctly. The final return value now reflects the actual computation status. These changes improve the RPC server's resilience against malformed client requests, preventing crashes and ensuring errors are handled more gracefully. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * refactor(rpc): address pr comments removed comments and unnecessary returns Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * refactor(rpc): ambiguous nullptr from create_node rpc_server::create_node could previously return nullptr if the input ID was 0 (valid) or if an internal error (deserialization, recursion failure) occurred (invalid). This ambiguity made error handling difficult for the caller (`graph_compute`). This commit clarifies the meaning of nullptr: - `graph_compute` now checks if the input 'id' was non-zero when `create_node` returns nullptr, correctly identifying failures versus intentional null links. - `create_node` avoids recursive calls for zero IDs and propagates nullptr unambiguously on failure during recursion. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * refactor(rpc): initial zero check in create_node The caller (`graph_compute`) already checks `id != 0` when handling a `nullptr` return from `create_node`, correctly distinguishing intentional null links from actual errors. This makes the initial `if (id == 0)` check redundant. Also removes the log message when a tensor ID is not found in the provided map which was added in this branch. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * fix(rpc): Handle get_alloc_size failure in server Check the return value of `server.get_alloc_size` in the RPC server loop. If the call fails, return early to close the connection. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * refactor(rpc): input size validation in graph_compute Removes detailed, step-by-step size calculations and overflow checks in favor of simpler direct comparisons, assuming 64-bit overflow is unlikely. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * refactor(rpc): remove extra status code setting Removes the explicit setting of `response.result = GGML_STATUS_FAILED` when `create_node` returns `nullptr` within `graph_compute`. Primary signal is the `false` return value in case of failure. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * refactor(rpc): remove redundant check for tensor->type Breaks CI on ubuntu-cpu-make. Tensor type is uint32_t, thus the check is not needed. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> --------- Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> # Conflicts: # ggml/src/ggml-rpc.cpp * rpc : fix cache directory initialization (#13188) Signed-off-by: xiaofei <hbuxiaofei@gmail.com> # Conflicts: # examples/rpc/rpc-server.cpp * rpc : avoid uninitialized memory in serialize_tensor (#13210) Zero out the name and padding buffers. * fix merge error * Add hello command in RPC * bug fix * add rpc header * fix bug for missing rpc names * add tpc no delay for rpc * add back webui --------- Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> Signed-off-by: xiaofei <hbuxiaofei@gmail.com> Co-authored-by: firecoperana <firecoperana> Co-authored-by: Radoslav Gerganov <rgerganov@gmail.com> Co-authored-by: matt23456 <matt23456> Co-authored-by: Ville Vesilehto <ville@vesilehto.fi> Co-authored-by: xiaofei <hbuxiaofei@gmail.com> Co-authored-by: Justin Santa Barbara <justinsb@google.com>
244 lines
7.4 KiB
CMake
244 lines
7.4 KiB
CMake
cmake_minimum_required(VERSION 3.14) # for add_link_options and implicit target directories.
|
|
project("llama.cpp" C CXX)
|
|
include(CheckIncludeFileCXX)
|
|
|
|
#set(CMAKE_WARN_DEPRECATED YES)
|
|
set(CMAKE_WARN_UNUSED_CLI YES)
|
|
|
|
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
|
|
set(CMAKE_CXX_STANDARD 17)
|
|
set(CMAKE_CXX_STANDARD_REQUIRED true)
|
|
|
|
set(CMAKE_CUDA_USE_RESPONSE_FILE_FOR_INCLUDES 0)
|
|
set(CMAKE_CUDA_USE_RESPONSE_FILE_FOR_LIBRARIES 0)
|
|
set(CMAKE_CUDA_USE_RESPONSE_FILE_FOR_OBJECTS 0)
|
|
|
|
if (NOT XCODE AND NOT MSVC AND NOT CMAKE_BUILD_TYPE)
|
|
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
|
|
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release" "MinSizeRel" "RelWithDebInfo")
|
|
endif()
|
|
|
|
# Add path to modules
|
|
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")
|
|
|
|
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
|
|
|
|
if (CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
|
|
set(LLAMA_STANDALONE ON)
|
|
|
|
include(git-vars)
|
|
|
|
# configure project version
|
|
# TODO
|
|
else()
|
|
set(LLAMA_STANDALONE OFF)
|
|
endif()
|
|
|
|
if (EMSCRIPTEN)
|
|
set(BUILD_SHARED_LIBS_DEFAULT OFF)
|
|
|
|
option(LLAMA_WASM_SINGLE_FILE "llama: embed WASM inside the generated llama.js" ON)
|
|
else()
|
|
if (MINGW)
|
|
set(BUILD_SHARED_LIBS_DEFAULT OFF)
|
|
else()
|
|
set(BUILD_SHARED_LIBS_DEFAULT ON)
|
|
endif()
|
|
endif()
|
|
|
|
option(BUILD_SHARED_LIBS "build shared libraries" ${BUILD_SHARED_LIBS_DEFAULT})
|
|
|
|
if (WIN32)
|
|
add_compile_definitions(_CRT_SECURE_NO_WARNINGS)
|
|
endif()
|
|
|
|
# force MSVC compiler charset to utf-8
|
|
if (MSVC)
|
|
add_compile_options("$<$<COMPILE_LANGUAGE:C>:/utf-8>")
|
|
add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/utf-8>")
|
|
add_compile_options("$<$<COMPILE_LANGUAGE:C>:/bigobj>")
|
|
add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/bigobj>")
|
|
endif()
|
|
|
|
#
|
|
# option list
|
|
#
|
|
|
|
# debug
|
|
option(LLAMA_ALL_WARNINGS "llama: enable all compiler warnings" ON)
|
|
option(LLAMA_ALL_WARNINGS_3RD_PARTY "llama: enable all compiler warnings in 3rd party libs" OFF)
|
|
|
|
# build
|
|
option(LLAMA_FATAL_WARNINGS "llama: enable -Werror flag" OFF)
|
|
|
|
# sanitizers
|
|
option(LLAMA_SANITIZE_THREAD "llama: enable thread sanitizer" OFF)
|
|
option(LLAMA_SANITIZE_ADDRESS "llama: enable address sanitizer" OFF)
|
|
option(LLAMA_SANITIZE_UNDEFINED "llama: enable undefined sanitizer" OFF)
|
|
|
|
# extra artifacts
|
|
option(LLAMA_BUILD_TESTS "llama: build tests" ${LLAMA_STANDALONE})
|
|
option(LLAMA_BUILD_EXAMPLES "llama: build examples" ${LLAMA_STANDALONE})
|
|
option(LLAMA_BUILD_SERVER "llama: build server example" ${LLAMA_STANDALONE})
|
|
|
|
# 3rd party libs
|
|
option(LLAMA_CURL "llama: use libcurl to download model from an URL" OFF)
|
|
|
|
# Required for relocatable CMake package
|
|
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/build-info.cmake)
|
|
|
|
# override ggml options
|
|
set(GGML_SANITIZE_THREAD ${LLAMA_SANITIZE_THREAD})
|
|
set(GGML_SANITIZE_ADDRESS ${LLAMA_SANITIZE_ADDRESS})
|
|
set(GGML_SANITIZE_UNDEFINED ${LLAMA_SANITIZE_UNDEFINED})
|
|
set(GGML_ALL_WARNINGS ${LLAMA_ALL_WARNINGS})
|
|
set(GGML_FATAL_WARNINGS ${LLAMA_FATAL_WARNINGS})
|
|
|
|
# change the default for these ggml options
|
|
if (NOT DEFINED GGML_LLAMAFILE)
|
|
set(GGML_LLAMAFILE ON)
|
|
endif()
|
|
|
|
if (NOT DEFINED GGML_CUDA_USE_GRAPHS)
|
|
set(GGML_CUDA_USE_GRAPHS ON)
|
|
endif()
|
|
|
|
# transition helpers
|
|
function (llama_option_depr TYPE OLD NEW)
|
|
if (${OLD})
|
|
message(${TYPE} "${OLD} is deprecated and will be removed in the future.\nUse ${NEW} instead\n")
|
|
set(${NEW} ON PARENT_SCOPE)
|
|
endif()
|
|
endfunction()
|
|
|
|
llama_option_depr(FATAL_ERROR LLAMA_CUBLAS GGML_CUDA)
|
|
llama_option_depr(WARNING LLAMA_CUDA GGML_CUDA)
|
|
llama_option_depr(WARNING LLAMA_KOMPUTE GGML_KOMPUTE)
|
|
llama_option_depr(WARNING LLAMA_METAL GGML_METAL)
|
|
llama_option_depr(WARNING LLAMA_METAL_EMBED_LIBRARY GGML_METAL_EMBED_LIBRARY)
|
|
llama_option_depr(WARNING LLAMA_NATIVE GGML_NATIVE)
|
|
llama_option_depr(WARNING LLAMA_RPC GGML_RPC)
|
|
llama_option_depr(WARNING LLAMA_SYCL GGML_SYCL)
|
|
llama_option_depr(WARNING LLAMA_SYCL_F16 GGML_SYCL_F16)
|
|
llama_option_depr(WARNING LLAMA_CANN GGML_CANN)
|
|
|
|
if (NOT MSVC)
|
|
if (LLAMA_SANITIZE_THREAD)
|
|
message(STATUS "Using -fsanitize=thread")
|
|
|
|
add_compile_options(-fsanitize=thread)
|
|
link_libraries (-fsanitize=thread)
|
|
endif()
|
|
|
|
if (LLAMA_SANITIZE_ADDRESS)
|
|
message(STATUS "Using -fsanitize=address")
|
|
|
|
add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
|
|
link_libraries (-fsanitize=address)
|
|
endif()
|
|
|
|
if (LLAMA_SANITIZE_UNDEFINED)
|
|
message(STATUS "Using -fsanitize=undefined")
|
|
|
|
add_compile_options(-fsanitize=undefined)
|
|
link_libraries (-fsanitize=undefined)
|
|
endif()
|
|
endif()
|
|
|
|
#
|
|
# build the library
|
|
#
|
|
|
|
if (NOT TARGET ggml)
|
|
add_subdirectory(ggml)
|
|
# ... otherwise assume ggml is added by a parent CMakeLists.txt
|
|
endif()
|
|
|
|
#
|
|
# build the library
|
|
#
|
|
|
|
add_subdirectory(src)
|
|
|
|
#
|
|
# install
|
|
#
|
|
|
|
include(GNUInstallDirs)
|
|
include(CMakePackageConfigHelpers)
|
|
|
|
set(LLAMA_BUILD_NUMBER ${BUILD_NUMBER})
|
|
set(LLAMA_BUILD_COMMIT ${BUILD_COMMIT})
|
|
set(LLAMA_INSTALL_VERSION 0.0.${BUILD_NUMBER})
|
|
|
|
set(LLAMA_INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_INCLUDEDIR} CACHE PATH "Location of header files")
|
|
set(LLAMA_LIB_INSTALL_DIR ${CMAKE_INSTALL_LIBDIR} CACHE PATH "Location of library files")
|
|
set(LLAMA_BIN_INSTALL_DIR ${CMAKE_INSTALL_BINDIR} CACHE PATH "Location of binary files")
|
|
|
|
|
|
# At the moment some compile definitions are placed within the ggml/src
|
|
# directory but not exported on the `ggml` target. This could be improved by
|
|
# determining _precisely_ which defines are necessary for the llama-config
|
|
# package.
|
|
#
|
|
get_target_property(GGML_DIRECTORY ggml SOURCE_DIR)
|
|
get_directory_property(GGML_DIR_DEFINES DIRECTORY ${GGML_DIRECTORY} COMPILE_DEFINITIONS)
|
|
get_target_property(GGML_TARGET_DEFINES ggml COMPILE_DEFINITIONS)
|
|
set(GGML_TRANSIENT_DEFINES ${GGML_TARGET_DEFINES} ${GGML_DIR_DEFINES})
|
|
get_target_property(GGML_LINK_LIBRARIES ggml LINK_LIBRARIES)
|
|
|
|
set_target_properties(llama PROPERTIES PUBLIC_HEADER ${CMAKE_CURRENT_SOURCE_DIR}/include/llama.h)
|
|
install(TARGETS llama LIBRARY PUBLIC_HEADER)
|
|
|
|
configure_package_config_file(
|
|
${CMAKE_CURRENT_SOURCE_DIR}/cmake/llama-config.cmake.in
|
|
${CMAKE_CURRENT_BINARY_DIR}/llama-config.cmake
|
|
INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/llama
|
|
PATH_VARS LLAMA_INCLUDE_INSTALL_DIR
|
|
LLAMA_LIB_INSTALL_DIR
|
|
LLAMA_BIN_INSTALL_DIR )
|
|
|
|
write_basic_package_version_file(
|
|
${CMAKE_CURRENT_BINARY_DIR}/llama-version.cmake
|
|
VERSION ${LLAMA_INSTALL_VERSION}
|
|
COMPATIBILITY SameMajorVersion)
|
|
|
|
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/llama-config.cmake
|
|
${CMAKE_CURRENT_BINARY_DIR}/llama-version.cmake
|
|
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/llama)
|
|
|
|
install(
|
|
FILES convert_hf_to_gguf.py
|
|
PERMISSIONS
|
|
OWNER_READ
|
|
OWNER_WRITE
|
|
OWNER_EXECUTE
|
|
GROUP_READ
|
|
GROUP_EXECUTE
|
|
WORLD_READ
|
|
WORLD_EXECUTE
|
|
DESTINATION ${CMAKE_INSTALL_BINDIR})
|
|
|
|
configure_file(cmake/llama.pc.in
|
|
"${CMAKE_CURRENT_BINARY_DIR}/llama.pc"
|
|
@ONLY)
|
|
|
|
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/llama.pc"
|
|
DESTINATION lib/pkgconfig)
|
|
|
|
#
|
|
# programs, examples and tests
|
|
#
|
|
|
|
add_subdirectory(common)
|
|
|
|
if (LLAMA_BUILD_TESTS AND NOT CMAKE_JS_VERSION)
|
|
include(CTest)
|
|
add_subdirectory(tests)
|
|
endif ()
|
|
|
|
if (LLAMA_BUILD_EXAMPLES)
|
|
add_subdirectory(examples)
|
|
add_subdirectory(pocs)
|
|
endif()
|