### 🔀 [#69](https://github.com/ikawrakow/ik_llama.cpp/pull/69) - Allow bf16 kv-cache

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2024-09-29 |
| **Updated** | 2024-09-29 |

---

#### Description
On the CPU I get exactly the same PPL with and without FA when using `bf16` for the kv-cache. On CUDA, however, the `bf16` kv-cache result is about the same as the `fp16` kv-cache result on the CPU, so I must be missing a conversion somewhere. Either way, we can now run with a `bf16` kv-cache on all platforms supported here.