composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 09:08:35 +00:00

Files

Anthony Chang 63fd5da637 Single-kernel GEMM + layernorm (#263)

* dump lds content in appropriate precision type

* add squared add reduction op; allows sq sum

* initial stub from regular gemm impl

* layernorm example code & host verification

* initial layernorm implementation

* tidy up

* make C0 precision type consistent with C

* clang-tidy and additional comments

* tighten up example code

* account for extra flops/bytes from normalization

* clang-format

* c0 bias/beta/gamma now have their own precision type

* AccElemOp for gemm outputs prior to feeding to layernorm

* update workgroup mapping

* rename kernel template param to reflect its dual use

* use LDS mem pool for reduction workspace

* change cshuffle precision type to f16; clean up

* clang-format

* correct naming

* explicit cast

* fully implemented gemm + bias + activation + add + norm

* activation in correct order

* reflect reduction API's recent change

* amend

* clean up; add comment

* keep up with recent changes in reduction API

* format

* resolve merge conflicts

Co-authored-by: Chao Liu <chao.liu2@amd.com>

2022-07-01 01:38:00 -05:00

reduction_functions_threadwise.hpp

Single-kernel GEMM + layernorm (#263)

2022-07-01 01:38:00 -05:00

threadwise_contraction_dl.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_gemm_dlops_v3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_set.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v3r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v3r3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v4r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v5r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r2.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v7.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer.hpp

Standalone sweep once softmax kernel w/ ckProfiler (#295)

2022-06-30 12:08:50 -05:00