iohub
a1392fc017
readme : add 3rd party collama reference to UI list ( #4840 )
Add a reference to a VSCode extension for llama.cpp to the UI list
2024-01-09 18:45:54 +02:00
Georgi Gerganov
ac412c4ec1
readme : add link to SOTA models
2024-01-08 20:25:17 +02:00
Lars Grammel
9e96d6076a
readme : add lgrammel/modelfusion JS/TS client for llama.cpp ( #4814 )
2024-01-07 22:24:11 +02:00
automaticcat
6394e47300
ggml : add ggml_cpu_has_avx_vnni() ( #4589 )
* feat: add avx_vnni based on intel documents
* ggml: add avx vnni based on intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for intel processors
* Update ggml.c
Fix indentation update
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-30 10:07:48 +02:00
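Editor's note: a minimal sketch of how the new probe can be called. `ggml_cpu_has_avx_vnni()` is the function named in the commit title; the surrounding program is a hypothetical caller, not code from the PR.

```cpp
// Illustrative use of the feature probe added in #4589.
#include <cstdio>
#include "ggml.h"

int main() {
    // Returns 1 when the CPU (and build) support AVX-VNNI, 0 otherwise.
    printf("AVX-VNNI: %s\n", ggml_cpu_has_avx_vnni() ? "yes" : "no");
    return 0;
}
```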
manikbhandari
d818687d5a
gpt2 : Add gpt2 architecture integration ( #4555 )
2023-12-28 15:03:57 +01:00
Paul Tsochantaris
fe79201273
Adding Emeltal reference to UI list ( #4629 )
2023-12-25 18:09:53 +02:00
Shintarou Okada
659fe6b867
llama : add PLaMo model ( #3557 )
* add plamo mock
* add tensor loading
* plamo convert
* update norm
* able to compile
* fix norm_rms_eps hparam
* runnable
* use inp_pos
* seems ok
* update kqv code
* remove develop code
* update README
* shuffle attn_q.weight and attn_output.weight for broadcasting
* remove plamo_llm_build_kqv and use llm_build_kqv
* fix style
* update
* llama : remove obsolete KQ_scale
* plamo : fix tensor names for correct GPU offload
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-24 15:35:49 +02:00
FantasyGmm
ce2c5517e6
cuda : fix jetson compile error ( #4560 )
* fix old jetson compile error
* Update Makefile
* update jetson detect and cuda version detect
* update cuda macro define
* update makefile and cuda, fix some issues
* Update README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update Makefile
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-22 17:11:12 +02:00
Michael Kesper
10c9ac210e
make : add LLAMA_HIP_UMA option ( #4587 )
NB: LLAMA_HIP_UMA=1 (or any value) adds -DGGML_HIP_UMA to MK_CPPFLAGS
2023-12-22 10:03:25 +02:00
Deins
55c7355ee0
readme : add zig bindings ( #4581 )
2023-12-22 08:49:54 +02:00
Erik Garrison
824b8257cd
cuda : ROCm AMD Unified Memory Architecture (UMA) handling ( #4449 )
* AMD ROCm: handle UMA memory VRAM expansions
This resolves #2797 by allowing ROCm AMD GPU users with a UMA to
dynamically expand the VRAM allocated to the GPU.
Without this, AMD ROCm users with shared CPU/GPU memory are usually
stuck with the BIOS-set (or fixed) framebuffer VRAM, making it
impossible to load more than 1-2 layers.
Note that the model is duplicated in RAM because it's loaded once for
the CPU and then copied into a second set of allocations that are
managed by the HIP UMA system. We can fix this later.
* clarify build process for ROCm on linux with cmake
* avoid using deprecated ROCm hipMallocHost
* keep simplifying the change required for UMA
* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
2023-12-21 21:45:32 +02:00
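Editor's note: together with the LLAMA_HIP_UMA make option above, the mechanism is a compile-time switch on the allocator. A hedged sketch assuming the GGML_HIP_UMA define from the commit messages; the wrapper name below is hypothetical, not the PR's actual code.

```cpp
// Hedged sketch of the UMA allocation path: with -DGGML_HIP_UMA (set by
// LLAMA_HIP_UMA=1), device buffers come from managed memory, so the GPU can
// draw on system RAM beyond the BIOS-fixed framebuffer carve-out.
// ggml_hip_alloc is a hypothetical wrapper for illustration.
#include <hip/hip_runtime.h>

static hipError_t ggml_hip_alloc(void ** ptr, size_t size) {
#ifdef GGML_HIP_UMA
    return hipMallocManaged(ptr, size); // unified memory, grows with RAM
#else
    return hipMalloc(ptr, size);        // dedicated VRAM only
#endif
}
```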
Georgi Gerganov
946602a70f
readme : update coding guidelines
2023-12-21 19:27:14 +02:00
Georgi Gerganov
a88ba9a360
readme : update hot topics
2023-12-17 20:16:23 +02:00
BarfingLemurs
db6c6b68b7
readme : update supported model list ( #4457 )
2023-12-14 09:38:49 +02:00
Georgi Gerganov
07dc8ed2c2
readme : update hot topics
2023-12-13 14:05:38 +02:00
Georgi Gerganov
86bfbd3afc
llama : per-layer KV cache + quantum K cache ( #4309 )
* per-layer KV
* remove unnecessary copies
* less code duplication, offload k and v separately
* llama : offload KV cache per-layer
* llama : offload K shift tensors
* llama : offload for rest of the model arches
* llama : enable offload debug temporarily
* llama : keep the KV related layers on the device
* llama : remove mirrors, perform Device -> Host when partial offload
* common : add command-line arg to disable KV cache offloading
* llama : update session save/load
* llama : support quantum K cache (#4312 )
* llama : support quantum K cache (wip)
* metal : add F32 -> Q8_0 copy kernel
* cuda : add F32 -> Q8_0 copy kernel
ggml-ci
* cuda : use mmv kernel for quantum cache ops
* llama : pass KV cache type through API
* llama : fix build
ggml-ci
* metal : add F32 -> Q4_0 copy kernel
* metal : add F32 -> Q4_1 copy kernel
* cuda : wip
* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
* llama-bench : support type_k/type_v
* metal : use mm kernel only for quantum KV cache
* cuda : add comment
* llama : remove memory_f16 and kv_f16 flags
---------
Co-authored-by: slaren <slarengh@gmail.com>
* readme : add API change notice
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-07 13:03:17 +02:00
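Editor's note: the user-visible end of this change is that the KV cache types become selectable through the API. A hedged sketch assuming the `type_k`/`type_v` context-parameter fields implied by the llama-bench bullet above; verify the field names against llama.h in your checkout.

```cpp
// Hedged sketch: requesting a quantized K cache after PR #4309.
#include "llama.h"

struct llama_context_params make_kv_params(void) {
    struct llama_context_params p = llama_context_default_params();
    p.type_k = GGML_TYPE_Q8_0; // quantized K cache (uses the F32 -> Q8_0 copy kernels)
    p.type_v = GGML_TYPE_F16;  // V cache stays in f16
    return p;
}
```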
vodkaslime
bc91cdbd87
readme : fix ( #4135 )
* fix: readme
* chore: resolve comments
2023-11-30 23:49:21 +02:00
Dawid Wysocki
8f1e6fbde7
readme : fix typo ( #4253 )
llama.cpp uses GitHub Actions, not GitLab Actions.
2023-11-30 23:43:32 +02:00
Peter Sugihara
d119cde4a5
readme : add FreeChat ( #4248 )
2023-11-29 09:16:34 +02:00
Kasumi
2cf38d14b2
readme : add Amica to UI list ( #4230 )
2023-11-27 19:39:42 +02:00
Georgi Gerganov
6f7d280455
readme : update hot topics
2023-11-26 20:42:51 +02:00
Georgi Gerganov
e5d642885c
readme : update hot topics
2023-11-25 12:02:13 +02:00
Aaryaman Vasishta
92eb4cdab4
readme : use PATH for Windows ROCm ( #4195 )
* Update README.md to use PATH for Windows ROCm
* Update README.md
2023-11-24 09:52:39 +02:00
Georgi Gerganov
a8e65a6b4c
readme : update hot topics
2023-11-23 13:51:22 +02:00
Aaryaman Vasishta
94da394760
readme : update ROCm Windows instructions ( #4122 )
* Update README.md
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-20 17:02:46 +02:00
Galunid
d200fc170a
stablelm : StableLM support ( #3586 )
* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers
2023-11-14 11:17:12 +01:00
Georgi Gerganov
5940637098
readme : update hot topics
2023-11-13 14:18:08 +02:00
Richard Kiss
a05fccf374
Fix some documentation typos/grammar mistakes ( #4032 )
* typos
* Update examples/parallel/README.md
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
---------
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-11-11 23:04:58 -07:00
Georgi Gerganov
534bbd5c14
readme : add notice about #3912
2023-11-02 20:44:12 +02:00
Ian Scrivener
21a26a6dea
readme : remove unsupported node.js library ( #3703 )
- https://github.com/Atome-FE/llama-node is quite out of date
- doesn't support recent/current llama.cpp functionality
2023-10-22 21:16:43 +03:00
Georgi Gerganov
ede7949722
sampling : refactor init to use llama_sampling_params ( #3696 )
* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661
ggml-ci
2023-10-20 21:07:23 +03:00
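Editor's note: a hedged sketch of what the refactored init looks like to a caller, based only on the bullets above. The struct and init function come from the commit title; the penalty field names follow the "rename penalty params" bullet and are assumptions.

```cpp
// Hedged sketch of the refactored sampling init from #3696.
#include "common/sampling.h"

struct llama_sampling_context * init_sampling(void) {
    llama_sampling_params sparams;       // defaults from the struct definition
    sparams.penalty_repeat  = 1.10f;     // repetition penalty
    sparams.penalty_freq    = 0.00f;     // frequency penalty
    sparams.penalty_present = 0.00f;     // presence penalty
    return llama_sampling_init(sparams); // one call applies all three penalties
}
```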
Georgi Gerganov
f9bbb76017
readme : update hot topics
2023-10-18 21:44:43 +03:00
BarfingLemurs
2404ccf7ab
readme : update hot-topics & models, detail windows release in usage ( #3615 )
* Update README.md
* move "Running on Windows" section below "Prepare data and run"
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 21:13:21 +03:00
ldwang
e49cde7ded
readme : add Aquila2 links ( #3610 )
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-17 18:52:33 +03:00
Ian Scrivener
3ee11e89e1
typo : it is --n-gpu-layers not --gpu-layers ( #3592 )
fixed a typo in the macOS Metal run documentation
2023-10-12 14:10:50 +03:00
Galunid
a637869df6
Add MPT model to supported models in README.md ( #3574 )
2023-10-10 19:02:49 -04:00
Xingchen Song(宋星辰)
8994c485e9
readme : add bloom ( #3570 )
2023-10-10 19:28:50 +03:00
BarfingLemurs
3226b5d74b
readme : update models, cuda + ppl instructions ( #3510 )
2023-10-06 22:13:36 +03:00
Georgi Gerganov
1ded9d4793
readme : add project status link
2023-10-04 16:50:44 +03:00
slaren
a18aa627fa
llama.cpp : add documentation about rope_freq_base and scale values ( #3401 )
* llama.cpp : add documentation about rope_freq_base and scale values
* add notice to hot topics
2023-09-29 18:42:32 +02:00
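Editor's note: for reference, the two documented values surface as context parameters (with matching CLI flags). A hedged sketch; `rope_freq_base` and `rope_freq_scale` are llama_context_params fields of that period, and the example values are illustrative, not recommendations.

```cpp
// Hedged sketch: overriding the RoPE frequency values the commit documents.
#include "llama.h"

struct llama_context_params rope_params(void) {
    struct llama_context_params p = llama_context_default_params();
    p.rope_freq_base  = 10000.0f; // base the model was trained with
    p.rope_freq_scale = 0.5f;     // linear scaling: 0.5 ~ doubles the usable context
    return p;
}
```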
BarfingLemurs
6706639c45
readme : update hot topics + model links ( #3399 )
2023-09-29 15:50:35 +03:00
Andrew Duffy
93527803e3
readme : add link to grammars app ( #3388 )
* Add link to grammars app per @ggerganov's suggestion
Adds a sentence in the Grammars section of the README pointing to the grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211
* Update README.md
2023-09-29 14:15:57 +03:00
Pierre Alexandre SCHEMBRI
6580c05d1c
readme : add Mistral AI release 0.1 ( #3362 )
2023-09-28 15:13:37 +03:00
BarfingLemurs
9d92d67428
readme : add some recent perplexity and bpw measurements to READMES, link for k-quants ( #3340 )
* Update README.md
* Update README.md with k-quants bpw measurements
2023-09-27 18:30:36 +03:00
2f38b454
be8fb3dc9b
docs : fix typo in CLBlast_DIR var. ( #3330 )
2023-09-25 20:24:52 +02:00
Lee Drake
1e8ebda8ce
Update README.md ( #3289 )
* Update README.md
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-09-21 21:00:24 +02:00
Georgi Gerganov
7eca40bf4b
readme : update hot topics
2023-09-20 20:48:22 +03:00
Johannes Gäßler
94a0ea6e76
CUDA: enable peer access between devices ( #2470 )
2023-09-17 16:37:53 +02:00
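Editor's note: peer access lets one GPU read another's VRAM directly over PCIe/NVLink instead of staging through host memory. An illustrative sketch using the standard CUDA runtime calls, not the PR's code.

```cpp
// Illustrative sketch: enable peer access between every pair of devices that
// supports it, so multi-GPU splits can copy device-to-device instead of
// bouncing through host RAM.
#include <cuda_runtime.h>

void enable_peer_access(int n_devices) {
    for (int i = 0; i < n_devices; ++i) {
        cudaSetDevice(i);
        for (int j = 0; j < n_devices; ++j) {
            if (i == j) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, i, j);
            if (can) {
                cudaDeviceEnablePeerAccess(j, 0); // flags must be 0
            }
        }
    }
}
```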
dylan
61cead9a5b
docker : add gpu image CI builds ( #3103 )
Enables the GPU-enabled container images to be built and pushed
alongside the CPU containers.
Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com>
2023-09-14 19:47:00 +03:00
Ikko Eltociear Ashimine
8db00f111b
readme : fix typo ( #3043 )
* readme : fix typo
acceleation -> acceleration
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-08 19:04:32 +03:00