Commit Graph

132 Commits

Author SHA1 Message Date
Jared Van Bortel
4742bda9a2 cmake : pass CPU architecture flags to nvcc (#5146) 2024-01-26 15:34:06 -05:00
Xuan Son Nguyen
8bb43a2380 devops : add intel oneapi dockerfile (#5068)
Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
2024-01-23 09:11:39 +02:00
XiaotaoChen
27a6a3d428 llava : MobileVLM support (#4954)
* MobileVLM native implementation

* delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake

* move android script to example/llava directory

* Fix the editor config checks

---------

Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-22 15:09:35 +02:00
Herman Semenov
0a093b1cbb cmake : add support for ccache (#5002)
* Added support ccache for speedup recompilation

* cmake : option to disable ccache

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-20 10:11:31 +02:00
Georgi Gerganov
8a4d0e3cf5 cmake : add ggml public headers (#5011) 2024-01-18 23:36:07 +02:00
ngc92
663c94457d pass cpu-architecture arguments only to host code (C;C++) (#4943) 2024-01-15 19:40:48 +01:00
Karthik Kumar Viswanathan
e77c6d2b23 llama : support WinXP build with MinGW 8.1.0 (#3419) 2024-01-14 10:41:44 +02:00
a-n-n-a-l-e-e
b52357162d cmake : check for openblas64 (#4134)
openblas v0.3.22 64-bit pkg-config file is named openblas64.pc
https://github.com/OpenMathLib/OpenBLAS/issues/3790
2024-01-05 18:04:40 +02:00
Georgi Gerganov
7e27e37f26 metal : switch back to default.metallib (ggml/681)
ggml-ci
2024-01-05 18:02:06 +02:00
Georgi Gerganov
64ec26ed76 metal : enable shader debugging (cmake option) (#4705)
* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (#4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

ggml-ci
2024-01-02 10:57:44 +02:00
slaren
556d3ccdb7 cuda : improve cuda pool efficiency using virtual memory (#4606)
* cuda : improve cuda pool efficiency using virtual memory

* fix mixtral

* fix cmake build

* check for vmm support, disable for hip

ggml-ci

* fix hip build

* clarify granularity

* move all caps to g_device_caps

* refactor error checking

* add cuda_pool_alloc, refactor most pool allocations

ggml-ci

* fix hip build

* CUBLAS_TF32_TENSOR_OP_MATH is not a macro

* more hip crap

* llama : fix msvc warnings

* ggml : fix msvc warnings

* minor

* minor

* cuda : fallback to CPU on host buffer alloc fail

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* ensure allocations are always aligned

* act_size -> actual_size

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-12-24 14:34:22 +01:00
Erik Garrison
824b8257cd cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)
* AMD ROCm: handle UMA memory VRAM expansions

This resolves #2797 by allowing ROCm AMD GPU users with a UMA to
dynamically expand the VRAM allocated to the GPU.

Without this, AMD ROCm users with shared CPU/GPU memory usually are
stuck with the BIOS-set (or fixed) framebuffer VRAM, making it
impossible to load more than 1-2 layers.

Note that the model is duplicated in RAM because it's loaded once for
the CPU and then copied into a second set of allocations that are
managed by the HIP UMA system. We can fix this later.

* clarify build process for ROCm on linux with cmake

* avoid using deprecated ROCm hipMallocHost

* keep simplifying the change required for UMA

* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
2023-12-21 21:45:32 +02:00
Bach Le
a957bfa577 Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) 2023-12-17 11:57:33 +01:00
Jared Van Bortel
4aad625587 build : detect host compiler and cuda compiler separately (#4414) 2023-12-13 12:10:10 -05:00
Jared Van Bortel
217f9ca1d4 build : target Windows 8 for standard mingw-w64 (#4405)
* build : target Windows 8 for standard mingw-w64

* make : fix missing console.o deps

This was causing a link error with `make all` on Windows.
2023-12-12 11:27:26 +02:00
Georgi Gerganov
8a8220f13a sync : ggml (new ops, tests, backend, etc.) (#4359)
* sync : ggml (part 1)

* sync : ggml (part 2, CUDA)

* sync : ggml (part 3, Metal)

* ggml : build fixes

ggml-ci

* cuda : restore lost changes

* cuda : restore lost changes (StableLM rope)

* cmake : enable separable compilation for CUDA

ggml-ci

* ggml-cuda : remove device side dequantize

* Revert "cmake : enable separable compilation for CUDA"

This reverts commit 09e35d04b1c4ca67f9685690160b35bc885a89ac.

* cuda : remove assert for rope

* tests : add test-backend-ops

* ggml : fix bug in ggml_concat

* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`

* ci : try to fix macOS

* ggml-backend : remove backend self-registration

* ci : disable Metal for macOS cmake build

ggml-ci

* metal : fix "supports family" call

* metal : fix assert

* metal : print resource path

ggml-ci

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-12-07 22:26:54 +02:00
Jared Van Bortel
55e92243c5 build : enable libstdc++ assertions for debug builds (#4275) 2023-12-01 20:18:35 +02:00
Li Tan
c7df772921 cmake : fix the metal file foder path (#4217) 2023-11-30 23:44:11 +02:00
bandoti
b854430d8d cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970)
* Split CPP generation from build-info query

* Remove blank lines

* Add BUILD_SHARED_LIBS option
2023-11-27 21:25:42 +02:00
Roger Meier
30c43147f4 build : support ppc64le build for make and CMake (#3963)
* build: support ppc64le build for make and CMake

* build: keep __POWER9_VECTOR__ ifdef and extend with __powerpc64__

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-17 18:11:23 +02:00
Michael Potter
98080efe1d Fix MacOS Sonoma model quantization (#4052)
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-14 12:34:41 -05:00
Eve
a04db7ec4e cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review sugguestions

* Build locally will detect CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05 10:03:09 +02:00
slaren
afbc267d09 cmake : disable LLAMA_NATIVE by default (#3906) 2023-11-02 14:10:33 +02:00
cebtenzzre
6c1e3443ed build : link against build info instead of compiling against it (#3879)
* cmake : fix build when .git does not exist

* cmake : simplify BUILD_INFO target

* cmake : add missing dependencies on BUILD_INFO

* build : link against build info instead of compiling against it

* zig : make build info a .cpp source instead of a header

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>

* cmake : revert change to CMP0115

---------

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02 08:50:16 +02:00
Georgi Gerganov
91be989d92 ggml : quantization refactoring (#3833)
* ggml : factor all quantization code in ggml-quants

ggml-ci

* ggml-quants : fix Zig and Swift builds + quantize tool

ggml-ci

* quantize : --pure option for disabling k-quant mixtures

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-10-29 18:32:28 +02:00
Georgi Gerganov
06d579b4ed cuda : improve text-generation and batched decoding performance (#3776)
* cuda : prints wip

* cuda : new cublas gemm branch for multi-batch quantized src0

* cuda : add F32 sgemm branch

* cuda : fine-tune >= VOLTA params + use MMQ only for small batches

* cuda : remove duplicated cuBLAS GEMM code

* cuda : add CUDA_USE_TENSOR_CORES and GGML_CUDA_FORCE_MMQ macros

* build : add compile option to force use of MMQ kernels
2023-10-27 17:01:23 +03:00
Georgi Gerganov
42adfb7c7c cuda : add batched cuBLAS GEMM for faster attention (#3749)
* cmake : add helper for faster CUDA builds

* batched : add NGL arg

* ggml : skip nops in compute_forward

* cuda : minor indentation

* cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)

* Apply suggestions from code review

These changes plus:

```c++
#define cublasGemmBatchedEx hipblasGemmBatchedEx
```

are needed to compile with ROCM. I haven't done performance testing, but it seems to work.

I couldn't figure out how to propose a change for lines outside what the pull changed, also this is the first time trying to create a multi-part review so please forgive me if I mess something up.

* cuda : add ROCm / hipBLAS cublasGemmBatchedEx define

* cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases

* cuda : reduce mallocs in cublasGemmBatchedEx branch

* cuda : add TODO for calling cublas from kernel + using mem pool

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-10-24 16:48:37 +03:00
Georgi Gerganov
7bf0bf6231 cmake : fix add_compile_options on macOS 2023-10-12 14:31:05 +03:00
Georgi Gerganov
78b3d9b796 sync : ggml (ggml-backend) (#3548)
* sync : ggml (ggml-backend)

ggml-ci

* zig : add ggml-backend to the build
2023-10-08 20:19:14 +03:00
Eve
482d162480 cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00
cebtenzzre
b404cd317d cmake : increase minimum version for add_link_options (#3444) 2023-10-02 22:38:43 +03:00
cebtenzzre
4f46a163c0 cmake : make CUDA flags more similar to the Makefile (#3420)
* cmake : fix misuse of cxx_flags

* cmake : make CUDA flags more similar to the Makefile

* cmake : fix MSVC build
2023-10-02 16:16:50 +03:00
bandoti
f74c19d38f cmake : fix transient definitions in find pkg (#3411) 2023-10-02 12:51:49 +03:00
Cebtenzzre
4cc4f84aea build : enable more non-default compiler warnings (#3200) 2023-09-28 17:41:44 -04:00
Jag Chadha
31e3b674ad build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (#3342) 2023-09-27 18:34:32 +03:00
DAN™
d9749e7cc3 cmake : fix build-info.h on MSVC (#3309) 2023-09-25 18:45:33 -04:00
Johannes Gäßler
94a0ea6e76 CUDA: enable peer access between devices (#2470) 2023-09-17 16:37:53 +02:00
Cebtenzzre
4e89732b50 check C++ code with -Wmissing-declarations (#3184) 2023-09-15 15:38:27 -04:00
Engininja2
d840da1bcd cmake : fix building shared libs for clang (rocm) on windows (#3176) 2023-09-15 15:24:30 +03:00
Andrei
61681fa802 cmake : fix llama.h location when built outside of root directory (#3179) 2023-09-15 11:07:40 +03:00
Andrei
7940ea477b cmake : compile ggml-rocm with -fpic when building shared library (#3158) 2023-09-14 20:38:16 +03:00
bandoti
deaffb425b cmake : add relocatable Llama package (#2960)
* Keep static libs and headers with install

* Add logic to generate Config package

* Use proper build info

* Add llama as import library

* Prefix target with package name

* Add example project using CMake package

* Update README

* Update README

* Remove trailing whitespace
2023-09-14 20:04:40 +03:00
Tristan Ross
fefd1c924f cmake : add a compiler flag check for FP16 format (#3086) 2023-09-13 16:08:52 +03:00
Johannes Gäßler
a1999004f0 CUDA: mul_mat_q RDNA2 tunings (#2910)
* CUDA: mul_mat_q RDNA2 tunings

* Update ggml-cuda.cu

Co-authored-by: Henri Vasserman <henv@hot.ee>

---------

Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-09-13 11:20:24 +02:00
Eric Sommerlade
c87a266056 arm64 support for windows (#3007)
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-09-12 21:54:20 -04:00
Jhen-Jie Hong
e6d674b4cd cmake : support build for iOS/tvOS (#3116)
* cmake : support build for iOS/tvOS

* ci : add iOS/tvOS build into macOS-latest-cmake

* ci : split ios/tvos jobs
2023-09-11 19:49:06 +08:00
Georgi Gerganov
ba1b0a362d sync : ggml (CUDA GLM RoPE + POSIX) (#3082)
ggml-ci
2023-09-08 17:58:07 +03:00
Przemysław Pawełczyk
193c737cb1 build : do not use _GNU_SOURCE gratuitously (#2035)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build llama.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions,
plus some stuff from BSD that is not specified in POSIX.1.

Well, that was true until NUMA support was added recently,
so enable GNU libc extensions for Linux builds to cover that.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
FTMs set by Makefile here or other FTMs depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* make : enable Darwin extensions for macOS to expose RLIMIT_MEMLOCK

* make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK

* make : use BSD-specific FTMs to enable alloca on BSDs

* make : fix OpenBSD build by exposing newer POSIX definitions

* cmake : follow recent FTM improvements from Makefile
2023-09-08 15:09:21 +03:00
Kunshang Ji
d2db7d97c6 enable CPU HBM (#2603)
* add cpu hbm support

* add memalign 0 byte check

* Update ggml.c

* Update llama.cpp

* ggml : allow ggml_init with 0 size

* retrigger ci

* fix code style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-08 03:46:56 +02:00
Cebtenzzre
bd7504dd6e fix some warnings from gcc and clang-tidy (#3038)
Co-authored-by: xaedes <xaedes@gmail.com>
2023-09-07 13:22:29 -04:00