Files
ik_llama.cpp/github-data/pull_requests/510 - Update News section of readme.md
2025-07-23 13:31:53 +02:00

4.6 KiB

🔀 #510 - Update News section of readme

Author saood06
State Closed
Created 2025-06-09
Updated 2025-06-13

Description

@ikawrakow

Making this draft PR to get your feedback on this format before I add all the new ones (and add in all the missing links).

Do you see any way to condense the sections that are currently one PR per line? (Maybe subsections of Performance improvements?)

And if any of them can be removed as they are no longer relevant (especially if MLA-2 is deprecated)


💬 Conversation

👤 ikawrakow commented the 2025-06-09 at 13:20:55:

Yes, you can split it like this


👤 saood06 commented the 2025-06-11 at 04:54:07:

@ikawrakow

I have added in all the new PR's (skipping a few trivial ones).

I still need to add the PR links for the old stuff, but this still feels too long and organization (ordering, categorization, omission/inclusion) feels like it could still be improved.

Any thoughts?


👤 ikawrakow submitted a review the 2025-06-11 at 05:41:57: 💬 COMMENTED


👤 ikawrakow commented during a code review the 2025-06-11 at 05:41:57 on README.md:

And not GLM-4, LlaMA-4, Qwen3/Qwen3-MoE ?


👤 ikawrakow commented during a code review the 2025-06-11 at 05:43:24 on README.md:

I would count the trellis quants also here. They partially implemented a long time ago, but the PRs to add CPU and Metal support are quite recent.


👤 ikawrakow commented during a code review the 2025-06-11 at 05:44:57 on README.md:

Duplicate


👤 ikawrakow submitted a review the 2025-06-11 at 05:45:58: 💬 COMMENTED


👤 saood06 submitted a review the 2025-06-11 at 05:49:53: 💬 COMMENTED


👤 saood06 commented during a code review the 2025-06-11 at 05:49:53 on README.md:

Not sure what you mean, all three you mentioned are included alongside their respective PRs. (Qwen 3 is just listed as Qwen3 and not Qwen3/Qwen3-MoE)


👤 ikawrakow submitted a review the 2025-06-11 at 05:52:58: 💬 COMMENTED


👤 ikawrakow commented during a code review the 2025-06-11 at 05:52:58 on README.md:

Oh, sorry, short attention span. Didn't reed the whole line. It seems I need LLM support when reviewing.


👤 saood06 submitted a review the 2025-06-11 at 05:54:05: 💬 COMMENTED


👤 saood06 submitted a review the 2025-06-11 at 05:55:02: 💬 COMMENTED


👤 saood06 commented during a code review the 2025-06-11 at 05:55:02 on README.md:

It really isn't entirely your fault. I don't like this being one block but if I split it into multiple lines it takes too much space.


👤 saood06 submitted a review the 2025-06-11 at 06:18:03: 💬 COMMENTED


👤 saood06 commented during a code review the 2025-06-11 at 06:18:03 on README.md:

Fixed.


👤 saood06 submitted a review the 2025-06-11 at 06:19:12: 💬 COMMENTED


👤 ikawrakow submitted a review the 2025-06-11 at 06:55:52: 💬 COMMENTED


👤 ikawrakow commented during a code review the 2025-06-11 at 06:55:52 on README.md:

Sure.

One thing that bothers me is that many people appear to be thinking that they need ik_llama.cpp-specific quants to use ik_llama.cpp. Or that they need to do something additional in order to be able to use llama.cpp GGUFs with ik_llama.cpp. At least this is the impression I get from the comments people make here. I think it would be useful to point out that they can grab any GGUF and just use it the way it is will ik_llama.cpp.


👤 saood06 submitted a review the 2025-06-11 at 07:12:17: 💬 COMMENTED


👤 saood06 commented the 2025-06-12 at 16:57:29:

Will you finish it, or are you waiting for me to finish it?

I was waiting for a response on what to do to help clarify that people can use existing GGUFs (assuming model support exists here). I just added the missing PR links and am doing the IQK quants section now.

Overall although I think this is an improvement and will be shorter than the old approach as time goes on, I still think it still has the same problem of it will just keep getting longer (and may already be too long).


👤 ikawrakow submitted a review the 2025-06-13 at 04:56:31: APPROVED