ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Author	SHA1	Message	Date
Brian	ea1bb2b82b	github: move PR template to .github/ root (#7868 )	2024-06-11 17:43:41 +03:00
slaren	e4e6f9abea	fix CUDA CI by using a windows-2019 image (#7861 ) * try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019	2024-06-11 08:59:20 +03:00
slaren	27d373a411	ci : try win-2019 on server windows test (#7854 )	2024-06-10 15:18:41 +03:00
Nicolás Pérez	f6bbf78d23	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700 ) This commit adds pull_request_template.md and CONTRIBUTING.md . It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
Georgi Gerganov	8de006f83e	ggml : remove OpenCL (#7735 ) ggml-ci	2024-06-04 21:23:20 +03:00
Masaya, Kato	2c040f0269	ggml : use OpenMP as a thread pool (#7606 ) * ggml: Added OpenMP for multi-threads processing * ggml : Limit the number of threads used to avoid deadlock * update shared state n_threads in parallel region * clear numa affinity for main thread even with openmp * enable openmp by default * fix msvc build * disable openmp on macos * ci : disable openmp with thread sanitizer * Update ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-03 17:14:15 +02:00
Brian	0a83967e79	github: add contact links to issues and convert question into research [no ci] (#7612 )	2024-05-30 21:55:36 +10:00
Meng, Hengyu	991a5632cd	[SYCL] fix intel docker (#7630 ) * Update main-intel.Dockerfile * workaround for https://github.com/intel/oneapi-containers/issues/70 * reset intel docker in CI * add missed in server	2024-05-30 16:19:08 +10:00
Brian	d42f92b628	github: add refactor to issue template (#7561 ) * github: add refactor issue template [no ci] * Update 07-refactor.yml	2024-05-28 20:27:27 +10:00
Brian	11a3ec860b	github: add self sorted issue ticket forms (#7543 ) * github: add self sorted issue ticket forms [no ci] * github: consolidate BSD in bug issue ticket * github: remove contact from bug ticket template [no ci] * github: remove bios from os dropdown in bug report [no ci]	2024-05-27 10:54:30 +10:00
Brian	39c7118283	labeler: added Apple Metal detector (+Kompute) (#7529 ) * labeler: added Apple Metal detector [no ci] * labeler: add Kompute to detector [no ci]	2024-05-25 19:30:42 +10:00
Brian	7929f977ac	docker.yml: disable light-intel and server-intel test (#7515 ) * docker.yml: disable light-intel test * docker.yml: disable server-intel test	2024-05-24 23:47:56 +10:00
Brian	75e5b1388c	labeler.yml: add embedding label detector [no ci] (#7482 )	2024-05-23 17:40:43 +10:00
Georgi Gerganov	c37774e1ef	build : remove zig (#7471 )	2024-05-22 20:05:38 +03:00
Georgi Gerganov	31b2d6e05b	server : fix temperature + disable some tests (#7409 ) * server : fix temperature * server : disable tests relying on parallel determinism * ci : change server Debug -> RelWithDebInfo	2024-05-20 22:10:03 +10:00
slaren	cb9cf0fb9b	llama : remove MPI backend (#7395 )	2024-05-20 01:17:03 +02:00
Brian	a846498a4a	labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363 ) https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action Recommends the use of checkout action to use the correct repo context when applying settings for PR labels e.g. steps: - uses: actions/checkout@v4 # Uploads repository content to the runner with: repository: "owner/repositoryName" # The one of the available inputs, visit https://github.com/actions/checkout#readme to find more - uses: actions/labeler@v5 with: configuration-path: 'path/to/the/uploaded/configuration/file'	2024-05-19 20:51:03 +10:00
Georgi Gerganov	cc5796c0ec	ci : re-enable sanitizer runs (#7358 ) * Revert "ci : temporary disable sanitizer builds (#6128)" This reverts commit `4f6d1337ca`. * ci : trigger	2024-05-18 18:55:54 +03:00
Brian	85733c54b1	github-actions-labeler: initial commit (#7330 ) * github-actions-labeler: initial commit [no ci] * github actions: remove priority auto labeling [no ci]	2024-05-18 16:04:23 +10:00
Gavin Zhao	37d4f164ef	ROCm: use native CMake HIP support (#5966 ) Supersedes #4024 and #4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes: 1. The environment variable `HIPCXX` or CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably this shouldn't be `hipcc`, but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was controlled by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. Note that since native CMake HIP support is not yet available on Windows, on Windows we fall back to the old behavior. 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controlled by `GPU_TARGETS`. 3. Updated the Nix recipe to account for these new changes. 4. The GPU targets to build against in the Nix recipe are now consistent with the supported GPU targets in nixpkgs. 5. Added CI checks for HIP on both Linux and Windows. On Linux, we test both the new and old behavior. The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang. ~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`. 
Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support. Signed-off-by: Gavin Zhao <git@gzgz.dev>	2024-05-17 17:03:03 +02:00
Max Krasnyansky	ab97fe155a	ci: fix bin/Release path for windows-arm64 builds (#7317 ) Switch to Ninja Multi-Config CMake generator to resurrect bin/Release path that broke artifact packaging in CI.	2024-05-16 15:36:43 +10:00
Max Krasnyansky	5cc8a89c08	Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191 ) * logging: add proper checks for clang to avoid errors and warnings with VA_ARGS * build: add CMake Presets and toolchain files for Windows ARM64 * matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings * ci: add support for optimized Windows ARM64 builds with MSVC and LLVM * matmul-int8: fixed typos in q8_0_q8_0 matmuls Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * matmul-int8: remove unnecessary casts in q8_0_q8_0 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-16 12:47:36 +10:00
Radoslav Gerganov	af81b28dbf	ggml : add RPC backend (#6829 ) * ggml : add RPC backend The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc). * set TCP_NODELAY * add CI workflows * Address review comments * fix warning * implement llama_max_devices() for RPC * Address review comments * Address review comments * wrap sockfd into a struct * implement get_alignment and get_max_size * add get_device_memory * fix warning * win32 support * add README * readme : trim trailing whitespace * Address review comments * win32 fix * Address review comments * fix compile warnings on macos	2024-05-14 14:27:19 +03:00
Neo Zhang	1fa2d4319b	[SYCL] Add oneapi runtime dll files to win release package (#7241 ) * add oneapi running time dlls to release package * fix path * fix path * fix path * fix path * fix path --------- Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:04:29 +08:00
Neo Zhang	b079bf29ed	[SYCL] update CI with oneapi 2024.1 (#7235 ) Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:02:55 +08:00
Sigbjørn Skjæret	484922ff64	Disable benchmark on forked repo (#7034 ) * Disable benchmark on forked repo * only check owner on schedule event * check owner on push also * more readable as multi-line * ternary won't work * style++ * test++ * enable actions debug * test-- * remove debug * test++ * do debug where we can get logs * test-- * this is driving me crazy * correct github.event usage * remove test condition * correct github.event usage * test++ * test-- * event_name is pull_request_target * test++ * test-- * update ref checks	2024-05-05 13:38:55 +02:00
Brian	7bd1c13a56	convert.py : add python logging instead of print() (#6511 ) * convert.py: add python logging instead of print() * convert.py: verbose flag takes priority over dump flag log suppression * convert.py: named instance logging * convert.py: use explicit logger id string * convert.py: convert extra print() to named logger * convert.py: sys.stderr.write --> logger.error * .py: Convert all python scripts to use logging module requirements.txt: remove extra line * flake8: update flake8 ignore and exclude to match ci settings * gh-actions: add flake8-no-print to flake8 lint step * pre-commit: add flake8-no-print to flake8 and also update pre-commit version * convert-hf-to-gguf.py: print() to logger conversion * .py: logging basiconfig refactor to use conditional expression .py: removed commented out logging fixup! .py: logging basiconfig refactor to use conditional expression constant.py: logger.error then exit should be a raise exception instead * .py: Convert logger error and sys.exit() into a raise exception (for atypical error) gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar * verify-checksum-model.py: This is the result of the program, it should be printed to stdout. * compare-llama-bench.py: add blank line for readability during missing repo response * reader.py: read_gguf_file() use print() over logging * convert.py: warning goes to stderr and won't hurt the dump output * gguf-dump.py: dump_metadata() should print to stdout * convert-hf-to-gguf.py: print --> logger.debug or ValueError() * verify-checksum-models.py: use print() for printing table * .py: refactor logging.basicConfig() gguf-py/gguf/.py: use __name__ as logger name Since they will be imported and not run directly. 
python-lint.yml: use .flake8 file instead * constants.py: logger no longer required * convert-hf-to-gguf.py: add additional logging * convert-hf-to-gguf.py: print() --> logger * .py: fix flake8 warnings revert changes to convert-hf-to-gguf.py for get_name() * convert-hf-to-gguf-update.py: use triple quoted f-string instead * .py: accidentally corrected the wrong line *.py: add compilade warning suggestions and style fixes	2024-05-03 22:36:41 +03:00
slaren	82e7f68958	ci : exempt confirmed bugs from being tagged as stale (#7014 )	2024-05-01 08:13:59 +03:00
Olivier Chafik	b9688eda68	build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964 ) * readme: cmake . -B build && cmake --build build * build: fix typo Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * build: drop implicit . from cmake config command * build: remove another superfluous . * build: update MinGW cmake commands * Update README-sycl.md Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com> * build: reinstate --config Release as not the default w/ some generators + document how to build Debug * build: revert more --config Release * build: nit / remove -H from cmake example * build: reword debug instructions around single/multi config split --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2024-04-29 17:02:45 +01:00
Georgi Gerganov	820703bf9c	llama : fix BPE pre-tokenization (#6920 ) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * lint : fix * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * models : add phi-3, mpt, gpt-2, starcoder * 
tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>	2024-04-29 16:58:41 +03:00
Przemysław Pawełczyk	2307a7b21c	ci : add building in MSYS2 environments (Windows) (#6967 )	2024-04-29 15:59:47 +03:00
Pierrick Hymbert	6feab329fe	ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935 ) * ci: server: fix python env * ci: server: fix server tests after #6638 * ci: server: fix windows is not building PR branch	2024-04-27 17:50:48 +02:00
Pierrick Hymbert	e5ef23a472	ci: server: fix python installation (#6925 )	2024-04-26 12:27:25 +02:00
Pierrick Hymbert	bdd8ba3806	ci: server: fix python installation (#6922 )	2024-04-26 11:11:51 +02:00
Pierrick Hymbert	a1cc26069f	ci: server: fix python installation (#6918 )	2024-04-26 09:27:49 +02:00
Pierrick Hymbert	f8c07fa19a	ci: fix concurrency for pull_request_target (#6917 )	2024-04-26 09:26:59 +02:00
Pierrick Hymbert	563887f3b9	ci: fix job are cancelling each other (#6781 )	2024-04-22 13:22:54 +02:00
loonerin	ba48614c84	ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748 )	2024-04-19 19:03:35 +02:00
Jaemin Son	30ad7d5191	[bug fix] convert github repository_owner to lowercase (#6673 )	2024-04-14 13:12:36 +02:00
Georgi Gerganov	774c09566b	ci : disable Metal for macOS-latest-cmake-x64 (#6628 )	2024-04-12 11:15:05 +03:00
Hugo Roussel	82a4aa7a3a	ci: download artifacts to release directory (#6612 ) When action download-artifact was updated to v4, the default download path changed. This fixes binaries not being uploaded to releases.	2024-04-11 19:52:21 +02:00
Pierrick Hymbert	b65aae6a1d	eval-callback: Example how to use eval callback for debugging (#6576 ) * gguf-debug: Example how to use ggml callback for debugging * gguf-debug: no mutex, verify type, fix stride. * llama: cv eval: move cb eval field in common gpt_params * ggml_debug: use common gpt_params to pass cb eval. Fix get tensor SIGV random. * ggml_debug: ci: add tests * ggml_debug: EOL in CMakeLists.txt * ggml_debug: Remove unused param n_batch, no batching here * ggml_debug: fix trailing spaces * ggml_debug: fix trailing spaces * common: fix cb_eval and user data not initialized * ci: build revert label * ggml_debug: add main test label * doc: add a model: add a link to ggml-debug * ggml-debug: add to make toolchain * ggml-debug: tests add the main label * ggml-debug: ci add test curl label * common: allow the warmup to be disabled in llama_init_from_gpt_params * ci: add curl test * ggml-debug: better tensor type support * gitignore : ggml-debug * ggml-debug: printing also the sum of each tensor * ggml-debug: remove block size * eval-callback: renamed from ggml-debug * eval-callback: fix make toolchain --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-11 14:51:07 +02:00
Pierrick Hymbert	73e01781d4	ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495 ) * ci: bench: support sse and fix prompt processing time server: add tokens usage in stream mode * ci: bench: README.md EOL * ci: bench: remove total pp and tg as it is not accurate * ci: bench: fix case when there is no token generated * ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics * ci: bench: fix finish reason rate	2024-04-06 05:40:47 +02:00
Minsoo Cheong	016aa58b11	ci: exempt master branch workflows from getting cancelled (#6486 ) * ci: exempt master branch workflows from getting cancelled * apply to bench.yml	2024-04-04 18:30:53 +02:00
Ewout ter Hoeven	7d32d7d775	build CI: Name artifacts (#6482 ) Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP. It might be possible to further simplify the packing step (in future PRs).	2024-04-04 17:08:55 +02:00
Pierrick Hymbert	fd66566ee1	ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478 )	2024-04-04 16:59:04 +02:00
Pierrick Hymbert	4cf330fb1e	ci: bench: add more ftype, fix triggers and bot comment (#6466 ) * ci: bench: change trigger path to not spawn on each PR * ci: bench: add more file type for phi-2: q8_0 and f16. - do not show the comment by default * ci: bench: add seed parameter in k6 script * ci: bench: artefact name perf job * Add iteration in the commit status, reduce again the autocomment * ci: bench: add per slot metric in the commit status * Fix trailing spaces	2024-04-04 12:57:58 +03:00
Ewout ter Hoeven	e9cd978f7e	ci : update checkout, setup-python and upload-artifact to latest (#6456 ) * CI: Update actions/checkout to v4 * CI: Update actions/setup-python to v5 * CI: Update actions/upload-artifact to v4	2024-04-03 21:01:13 +03:00
Pierrick Hymbert	5f580568dd	ci: server: verify deps are coherent with the commit (#6409 ) * ci: server: verify deps are coherent with the commit * ci: server: change the ref to build as now it's a pull event target	2024-04-01 12:36:40 +02:00
Pierrick Hymbert	f0be4bd555	ci: bench: fix Resource not accessible by integration on PR event (#6393 )	2024-03-30 12:36:07 +02:00


181 Commits