ik_llama.cpp/504 - Add DRY and fix the server to use other new samplers..md at 30381fc1fc6a302f9de0487b1e719f4efcc06a00 - ik_llama.cpp

ikawrakow/ik_llama.cpp

Fork 0

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Files

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00

7.8 KiB

Raw Blame History

🐛 #504 - Add DRY and fix the server to use other new samplers.

Author	`Ph0rk0z`
State	✅ Open
Created	2025-06-07
Updated	2025-06-13

Description

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

So with some vibe coding I added what should be a working dry implementation. Nothing has exploded. Also the server was never modified to use the new samplers so they did nothing unless you were using the main llama.cpp executable without a front end.

I didn't update docs as the other PR seems to do that (but not the server, lol). Let me know if this is any good or not. Much lighter than porting the new sampler arch.

There's also a spot in the header where sampler order array was never updated? Does it have to be?

💬 Conversation

👤 saood06 commented the 2025-06-08 at 05:11:07:

I see you also used the Z-algorithm and the implementation looks the same but you stripped the comments explaining it and where it came from. Any reason for that choice?

👤 saood06 commented the 2025-06-08 at 10:55:44:

Nope. If you find a mistake or a place that needs a comment/credit/etc please point it out.

I did point it out though?

The whole code of llama_sample_dry_impl looks like the exact same as llama_sampler_dry_apply from the DRY PR of mainline link to the file in question from that PR here https://github.com/ggml-org/llama.cpp/pull/9702/files#diff-ccfd27e7598c9965070306d4c6baf3cb4bf844211d1d37d7c52b0d03c8624507

But the difference is it is lacking pretty much all of comments (which contain attributions alongside a lot of helpful info) that are contained in the mainline PR.

All I care is that it works and we have dry.

That may be what you care about, but attribution and credit even when not required (I am not sure it is here, but IANAL) is a nice thing to give, and it looks especially bad considering it really does look like you copy and pasted the code and then removed the attributions and comments.

I am not saying that is what you did (I can't know, so I won't assume), but it definitely does look that way considering the code is identical and that is not a good look.

👤 ikawrakow commented the 2025-06-08 at 11:28:21:

I agree with @saood06. Let's not remove the credits and comments.

👤 Ph0rk0z commented the 2025-06-08 at 11:46:41:

It went through a LLM but you're working up some scenario where I actively went through and took them out. I'll put them back best I can.

👤 saood06 commented the 2025-06-08 at 11:57:42:

It went through a LLM but you're working up some scenario where I actively went through and took them out.

I did originally ask more politely "Any reason for that choice?" and you didn't offer an explanation so I wanted to make it clear what it looks like happened, and I even stated "I can't know, so I won't assume" and I was going to even reference you stating you did vibe coding, but the point was that it looks identical to that like that, and that does impact how people perceive it.

Even if you didn't actively take them out (which I believe you when you say you didn't), you did submit a PR where they were stripped out.

👤 saood06 commented the 2025-06-08 at 12:28:52:

It doesn't match their code 1:1 copy pasted.. putting the comments in sort of reveals that. Parts of it do. Its an amalgamation of the PR which was built from k.cpp which itself is probably based on pew and textgen webui code.

Enough of it did where me looking at both side by side made me feel the need to say something. I never stated it was a copy and paste, but unless you look closely it is hard to tell that it isn't.

Thank you for putting in the work to make this PR I do appreciate it, sorry that didn't come across in my earlier comments, but I still stand by what I said (but maybe I should have included the thank you earlier).

👤 saood06 submitted a review the 2025-06-08 at 12:30:47: 💬 COMMENTED

👤 saood06 commented during a code review the 2025-06-08 at 12:30:47 on src/llama-sampling.cpp:

Is this correct? And even if it is why subtract one then add it?

👤 saood06 submitted a review the 2025-06-08 at 12:31:14: 💬 COMMENTED

👤 saood06 commented during a code review the 2025-06-08 at 12:31:14 on src/llama-sampling.cpp:

You accidentally duplicated this when pasting in the comment.

👤 Ph0rk0z submitted a review the 2025-06-08 at 12:43:31: 💬 COMMENTED

👤 Ph0rk0z commented the 2025-06-08 at 12:53:16:

That's fine, I hope that more people test it than just us. Remember that dry removes/breaks up engrams not single word repetition. I'll pull changes from here back in and keep rolling with it. Also another reminder that anyone using XTC or n sigma on server was not having it apply. The parameters weren't there.

Need to figure out if new samplers all belong here in sampling.h too

    std::vector<llama_sampler_type> samplers_sequence = {
        llama_sampler_type::TOP_K,
        llama_sampler_type::TFS_Z,
        llama_sampler_type::TYPICAL_P,
        llama_sampler_type::TOP_P,
        llama_sampler_type::MIN_P,
        llama_sampler_type::TEMPERATURE
    };

👤 saood06 submitted a review the 2025-06-08 at 12:58:55: 💬 COMMENTED

👤 saood06 commented during a code review the 2025-06-08 at 12:58:55 on src/llama-sampling.cpp:

Yes, but that still doesn't answer my question of is it correct? It doesn't look equivalent to the reference implementation to me.

👤 saood06 submitted a review the 2025-06-08 at 13:04:40: 💬 COMMENTED

👤 saood06 submitted a review the 2025-06-08 at 13:06:56: 💬 COMMENTED

👤 saood06 commented during a code review the 2025-06-08 at 13:06:56 on src/llama-sampling.h:

The reference uses a ring_buffer for this and not a vector. You added an implementation for a ring_buffer but never used it

👤 saood06 commented the 2025-06-08 at 13:09:57:

It doesn't match their code 1:1 copy pasted

From my experience porting code from mainline it is usually easier to do that and then fix incompatibilities and any other issues than to do what you did. It also makes reviewing it easier.

👤 Ph0rk0z commented the 2025-06-08 at 13:11:22:

Yea, in this case it is much much too different. I took several cracks at that and failed each time.

👤 saood06 commented the 2025-06-08 at 13:20:16:

I haven't built or ran the code yet, don't have time to test it tonight.

I did leave some more comments though just from reading the code, I don't think it is worth testing anyway until they are resolved.

👤 saood06 commented the 2025-06-09 at 10:31:04:

why does it show comments as pending in gh?

That is odd.

If you want I can try to port the ring buffer if you say it offers better efficiency, but I am testing it as it is right now.

I'll approve or request changes based on that.

👤 Ph0rk0z commented the 2025-06-09 at 10:38:59:

I tried with the RB and it caused more problems. Unless there's some big slowdowns, its probably not worth it. Another "trick" directly from pew was to set high top_K like (i.e 100) and place it before DRY to speed everything up. I've been doing that on mainline since I heard about it. Here I already did DRY on/off and the t/s was the same. Probably the thing to look out for.

7.8 KiB Raw Blame History

🐛 #504 - Add DRY and fix the server to use other new samplers.

Description

💬 Conversation

7.8 KiB

Raw Blame History