composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-07 08:15:04 +00:00


Rostyslav Geyyer b076a02ad2 Optimize bf16 conversion (#664)

* Add TypeConvert class and start refactoring

* Refactor TypeConvert as a struct

* Get back to template functions type_convert

* Add a type_convert_bf16_rtn, set rtz as default

* Clean up

* Add UnaryConvertPrecision struct for high-precision workloads

* Format

* Update type_convert to UnaryConvert on threadwise level

* Update UnaryConvertPrecision

* Format

* Fix chmod

* Add a flag to pick conversion method

* Format

* Remove the added flag

* Merge elementwise op with type conversion

* Move type_convert to elemwise op, update the op

* Update type_convert_precision -> bf16_convert_rtn

* Clean up

* Update comments

* Update the CK_WORKAROUND_DENORM_FIX flag handling

* Update the unneeded op to work but warn user

* Remove the message

* Use a PassThrough instead of ConvertBF16RTN to calculate reference

* Format

* Add missing include

2023-05-04 10:25:47 -05:00
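The commit above keeps round-toward-zero (rtz) as the default bf16 conversion and adds a round-to-nearest (rtn) variant. As a rough illustration of the difference, here is a minimal standalone sketch of the two float-to-bf16 rounding modes. It is not CK's type_convert or bf16_convert_rtn code; the function names are hypothetical, and it assumes rtn means round-to-nearest with ties to even.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits without undefined behavior (hypothetical helper).
inline std::uint32_t float_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof(u));
    return u;
}

inline float bits_float(std::uint32_t u) {
    float x;
    std::memcpy(&x, &u, sizeof(x));
    return x;
}

// Round toward zero: bf16 is the upper 16 bits of an IEEE-754 float
// (sign, 8-bit exponent, top 7 mantissa bits), so truncation drops magnitude.
// Caveat: a NaN whose payload sits only in the low 16 bits truncates to
// infinity; this sketch does not guard against that.
inline std::uint16_t float_to_bf16_rtz(float x) {
    return static_cast<std::uint16_t>(float_bits(x) >> 16);
}

// Round to nearest, ties to even (assumed semantics for "rtn").
inline std::uint16_t float_to_bf16_rtn(float x) {
    std::uint32_t u = float_bits(x);
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0) {
        // NaN: truncate, but set a mantissa bit so the result stays NaN.
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    }
    std::uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives truncation
    u += 0x7FFFu + lsb;                  // bias so exact ties round to even
    return static_cast<std::uint16_t>(u >> 16);
}

// Widening bf16 -> float is exact: place the 16 bits in the upper half.
inline float bf16_to_float(std::uint16_t b) {
    return bits_float(static_cast<std::uint32_t>(b) << 16);
}
```

For example, 1.005859375f (bit pattern 0x3F80C000) truncates to 0x3F80 (1.0) under rtz but rounds to 0x3F81 (1.0078125) under rtn. Truncation is a single shift, which is why it is cheap; rtn costs an add and a NaN check but halves the worst-case rounding error.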

reduction_functions_threadwise.hpp

Single-kernel GEMM + layernorm (#263)

2022-07-01 01:38:00 -05:00

threadwise_contraction_dl.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_gemm_dlops_v3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_set.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v3r1.hpp

Optimize bf16 conversion (#664)

2023-05-04 10:25:47 -05:00

threadwise_tensor_slice_transfer_v3r3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v4r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v5r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r1.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r2.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v6r3.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer_v7.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

threadwise_tensor_slice_transfer.hpp

Generate output using Doxygen / Breathe (#598)

2023-03-06 11:39:16 -06:00

threadwise_welford.hpp

Batchnorm-forward implemented using Welford method to calculate variance (#403)

2022-10-27 18:52:54 -06:00
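The threadwise_welford.hpp entry refers to Welford's method for computing variance. Welford's method updates a running mean and a running sum of squared deviations one sample at a time, which avoids the cancellation error of computing E[x^2] - E[x]^2. The sketch below is a generic C++ illustration, not CK's threadwise_welford.hpp. The names WelfordState, welford_update, welford_merge and welford_variance are hypothetical, and the merge step uses the pairwise combination formula of Chan et al., assumed here as a stand-in for however partial results are combined across threads.

```cpp
#include <cassert>
#include <cstddef>

// Running state for Welford's online mean/variance algorithm (hypothetical type).
struct WelfordState {
    std::size_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the current mean
};

// Fold one sample into the running state.
inline void welford_update(WelfordState& s, double x) {
    s.count += 1;
    double delta = x - s.mean;
    s.mean += delta / static_cast<double>(s.count);
    s.m2 += delta * (x - s.mean);  // second factor uses the updated mean
}

// Combine two partial states, e.g. from different threads (Chan et al. formula).
inline WelfordState welford_merge(const WelfordState& a, const WelfordState& b) {
    if (a.count == 0) return b;
    if (b.count == 0) return a;
    WelfordState r;
    r.count = a.count + b.count;
    double n = static_cast<double>(r.count);
    double na = static_cast<double>(a.count);
    double nb = static_cast<double>(b.count);
    double delta = b.mean - a.mean;
    r.mean = a.mean + delta * nb / n;
    r.m2 = a.m2 + b.m2 + delta * delta * na * nb / n;
    return r;
}

// Population (biased) variance: divide by count rather than count - 1.
inline double welford_variance(const WelfordState& s) {
    return s.count > 0 ? s.m2 / static_cast<double>(s.count) : 0.0;
}
```

For example, updating one state with {2, 4, 4, 4}, another with {5, 5, 7, 9}, and merging them gives count 8, mean 5 and population variance 4, the same result as feeding all eight samples into a single state.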