Files
ik_llama.cpp/ggml/src/kompute-shaders/op_diagmask.comp
Kawrakow 0ceeb11721 Merge mainline llama.cpp (#3)
* Merging mainline - WIP

* Merging mainline - WIP

AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower as it is so often
the case with llama.cpp/ggml after some "improvements" have been made.

* Merging mainline - fix Metal

* Remove check

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-07-27 07:55:01 +02:00

31 lines
726 B
Plaintext

#version 450
#include "common.comp"
layout(local_size_x = 1) in;
layout(binding = 0) buffer restrict readonly tensorIn { float in_[]; };
layout(binding = 1) buffer restrict writeonly tensorOut { float out_[]; };
layout(push_constant) uniform PushConstants {
uint inOff;
uint outOff;
uint n_past;
int ne00;
int ne01;
} pcs;
void main() {
const uint i02 = gl_WorkGroupID.z;
const uint i01 = gl_WorkGroupID.y;
const uint i00 = gl_WorkGroupID.x;
const uint index = i02*pcs.ne01*pcs.ne00 + i01*pcs.ne00 + i00;
if (i00 > pcs.n_past + i01) {
out_[index + pcs.outOff] = uintBitsToFloat(0xFF800000);
} else {
out_[index + pcs.outOff] = in_[index + pcs.inOff];
}
}