composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 19:28:33 +00:00

Author	SHA1	Message	Date
Illia Silin	aa0b05156f	Replace the obsolete offload-arch flags with GPU_TARGETS and fix a bug. (#437) * replace obsolete offload-arch flags with GPU_TARGETS * fix a build error for client app * replace commma with semicolon in GPU_TARGETS	2022-09-22 09:32:25 -05:00
rocking5566	4eba345f6e	Group norm (#417) * Add groupnorm example by layernorm 1. Reference is not ready 2. shape of gamma and beta need to be fix * Let shape of gamma and beta can be same as x * Modify test, instance and client example * [What] Fix bug of layernorm for greater than 2 dimension. [Why] We need to get upper length from merge transform instead of embed transform. * Add reference for groupnorm * Fuse sigmoid after groupnorm * [What] Rename original layernorm into layernorm2d [Why] Prepare to add groupnorm using layernorm5d * clang-format * Add groupnorm test * Refine error message * Add groupnorm ckProfiler * Test groupnorm kernel from device_instance * update example * upadte profiler * Fix test naming * Fix argc number * Move descriptor and sweeponce to argument for quick debugging Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-09-19 22:30:46 -05:00
Shaojie WANG	27858374ac	Conv bwd data multiple d (#404) * init commit of convnd bwd data * begin compiling example * have a first version that produce a right result * refine device level launch kernel code * add more instances in example and get right results * clang-format * format example file * add more instances * fix instances * adding conv_bwd_data multile_d * adding conv_bwd_data multile_d * adding conv_bwd multiple d * adding conv_bwd multiple d * adding conv_bwd multiple d * refactor * refactor * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * refactor * update conv fwd's bias impl * refactor * reorg file * clean up cmake * clean * clean * clean Co-authored-by: Chao Liu <lc.roy86@gmail.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-09-19 11:25:28 -05:00
Adam Osewski	3da5c19e62	Softmax client example (#396) * Update Softmax device operation interface. * Update ckProfiler. * Update Softmax UT. * Update example. * Client example. * Clang format Co-authored-by: Adam Osewski <aosewski@amd.com>	2022-09-06 12:22:48 -05:00
rocking5566	e1a3fff675	layernorm external api (#379) * Add layernorm client example * [What] Add default make install dir to gitignore [Why] client example need to make install	2022-08-24 18:43:43 -05:00
Qianfeng	53ea4713af	Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320 ) * Implement multiple-reduction in one kernel (kernels, device ops, examples) * Add generic elementwise kernel and device interface * Add generator for normal-distributed data initialization * Add host refer implementation of batchnorm-forward and batchnorm-infer * Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels * Remove un-needed including in batchnorm example * Renaming generic_elementwise to elementiwise in kernel and device classes/functions * Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise * Change in exampe 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise * Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise * Add DeviceElementwiseBase and use it in device_normalize_instance.cpp * Removing and renaming files * Update to synchronize gemm_layernorm client example to the generic element-wise device op API * Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming * Merge two static member functions in device_elementwise.hpp * Remove unary_elementwise_1d kernel and device	2022-08-15 10:11:02 -05:00
Chao Liu	500fa99512	Clean up conv example, Instances, profiler and test (#324 ) * convnd_fwd fp16 example * update example * update example * update instance * updating refernce conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshhold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * upate contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test	2022-07-29 18:19:25 -05:00
Daming Feng	c5620ed0ca	minor fix in gemm client example (#328)	2022-07-13 10:54:38 -05:00
Chao Liu	4fe9c393b8	N-D Tensor Contraction example, instance, and client example (#270) * adding contraction * add contraction example * update examle * update example * format * update readme * clean header * clean header * contraction with multiple D * rename * fix naming issue; add instances for contraction+bilinear * change assumed virtual layout of contraction; add client example * update example * update * contraction+scale * use type_convert * rename	2022-07-07 14:31:11 -05:00
Chao Liu	0dcb3496cf	Improve external interface for GEMM and GEMM+add+add+fastgelu (#311) * interface for GEMM and GEMM+add+add+fastgelu * rename namespace * instance factory * fix build * fix build; add GEMM client example * clean	2022-06-30 22:11:00 -05:00
rocking5566	12235112a1	external api for gemm + layernorm (#285 ) * Extract base class for elementwise * Refactor interface of DeviceGemmReduce. Do not use tuple in interface * [What] Rename d into reduce in gemm + reduction related code [Why] Prepare to add d term for add * Unify base class of gemm + reduce and gemm + bias + add + reduce * 1. Rename gemm_bias_add_reduce for external api 2. Refine cmake * Add normalize device operation * [What] Reorder the argument [Why] Because d0 is also the input of c. * Add type string * Add example of gemm_bias_add_layernorm via external api * Refactor example code * clang-format * Fix compile error * clang-format * Add external api for gemm_add_add_layernorm and normalize * Add client example * clang-format	2022-06-27 14:25:10 -05:00
Chao Liu	aebd211c36	External Interface (#304) * add client example * clean * clean * reorg * clean up profiler * reorg * clea * fix profiler * function for getinstances * update client example * update client example * update client example * update * update example * update Jenkins file * update cmake * update Jenkins	2022-06-26 19:39:02 -05:00

12 Commits