JD
69d47bf04f
Initial Setup for CI ( #86 )
...
* add docker file and make default target buildable
* add Jenkinsfile
* remove empty env block
* fix package stage
* remove render group from docker run
* clean up Jenkins file
* add cppcheck as dev dependency
* update cmake file
* Add profiler build stage
* add hip_version config file for reduction operator
* correct jenkins var name
* Build release instead of debug
* clean up
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 2778e99758 ]
2022-02-18 21:44:11 -06:00
ltqin
0d55b15355
NHWC conv2d: fwd bfp16/int8, device-level tuning and host API ( #73 )
...
* add fwd bf16 conv
* change tuning parameters
* add int8 for conv fwd
* remove comments
* change tuning parameters for int8
* change init int8 example
* add test for conv2d fwd
* change device operation file location after merging develop
* fwd int8 use reference
* test_conv_fwd use reference
* add brackets for if statement
* rename fwd example name
* remove StaticBufferOfVectorTypeV2
* tweak example
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 880fbee957 ]
2022-02-11 20:06:40 -06:00
zjing14
53c79a56c6
Add small tile size for fp16/fp32 and NN layout ( #80 )
...
* add DeviceGemmSplitKXdl
* add file device_gemm_splitk_xdl.hpp
* set c matrix zero
* using atomic
* add all tuning parameters to f32 mkkn
* change grid size to 720
* add tuning parameters for NT
* add tuning parameters for TN
* add tuning parameters for TT
* add m=96 tuning parameters
* add lost config
* debug
* fix sweep
* add failed tuning params
* fixed sweep logic
* clean
* add padding to M/N for irregular tile sizes
* clean code
* add element wise operation
* fixed MPerBlock=96
* remove macro for split-k switch
* add test
* add new line at the end of device_gemm_xdl_instance.hpp
* remove step hack
* separate split-k instance files
* add tuning parameters
* change desired grid size to parameters
* remove slice length
* add desiredgridsize parameter to ckProfiler
* add missing file device_gemm_xdl_splitk_instance.hpp
* change desired grid size to kbatch
* format
* format
* clean up
* add selection of device_instances
* clean code
* clean code
* add small tile size in fp16 nn
* test for rocm 4.5
* merge develop
* clean
* clean
* clean
* remove no-use code
* add padding switch to device_gemm_xdl
* add padding switch for ksplit fp32
* clean
* clean
* add files
* rename
* Update profiler.cpp
* format
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: ltqin <letao.qin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 20a672d0b8 ]
2022-02-11 15:49:06 -06:00
zjing14
4795d9803d
Batched GEMM for fp16 ( #79 )
...
* prepare host for batched_gemm
* init commit of batched kernels
* fixed
* refine transform with freeze
* m/n padding
* fixed a bug; clean
* add small tiles
* clean
* clean code
* clean code
* add nt, tn, tt layout
* add missing file
* use StaticBufferTupleOfVector instead
* add reference_batched_gemm
* fixed a macro
[ROCm/composable_kernel commit: b53e9d08ed ]
2022-02-11 09:36:52 -06:00
rocking5566
01020d0db4
Support alpha beta scaling for GEMM ( #78 )
...
* [What] Add 2d version of bias, prepare to implement alpha / beta scaling
* Add alpha / beta functor
* Refine parameter of example
* [What] Use real type instead of template
[Why] Prevent implicit cast
* Rename parameter for general operator
* Remove redundant comment
* Fix compile error
Co-authored-by: rocking <chunylai@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 6f928a0876 ]
2022-02-11 00:48:41 -06:00
Anthony Chang
8f6cc9df99
fix build breaks ( #81 )
...
- device_gemm_xdl_c_shuffle function signature matches split-k
- retire host_driver since it is no longer maintained
- linter error (unused variable)
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 904cbe2a8f ]
2022-02-10 23:52:19 -06:00
Chao Liu
fb387c0e82
GEMM+Bias+ReLU+Add ( #76 )
...
* tweak conv for odd C
* update script
* clean up elementwise op
* fix build
* clean up
* added example for gemm+bias+relu+add
* added example for gemm+bias+relu
* add profiler for gemm_s_shuffle; re-org files
* add profiler
* fix build
* clean up
* clean up
* clean up
* fix build
[ROCm/composable_kernel commit: 823657ed12 ]
2022-02-06 22:32:47 -06:00
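The gemm+bias+relu+add fusion in the entry above applies a per-element epilogue during the C shuffle: out = relu(acc + bias) + residual. A sketch of that functor, with an illustrative name rather than CK's actual element-wise operator:

```cpp
#include <cassert>

// Fused epilogue for gemm+bias+relu+add: out = relu(acc + bias) + residual.
// Illustrative functor, not CK's actual element-wise operator definition.
struct AddReluAdd
{
    float operator()(float acc, float bias, float residual) const
    {
        const float t = acc + bias;
        return (t > 0.0f ? t : 0.0f) + residual;
    }
};
```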
ltqin
b5cd5f7005
References for conv2d fwd bias relu and add ( #75 )
...
* add reference
* clean up
* add reference for conv
* rename
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 690c75a7eb ]
2022-02-03 22:29:58 -06:00
zjing14
569b084436
Replace LLVM intrinsics with clang builtins ( #65 )
...
* test mfma builtins
* add fp16 builtins
* add int8 builtins
* add bf16 builtins
* simplify host conv forward
* clean
* clean
[ROCm/composable_kernel commit: 6d92959ad3 ]
2022-02-02 23:13:09 -06:00
ltqin
998217be22
add split-k GEMM ( #59 )
...
* add DeviceGemmSplitKXdl
* add file device_gemm_splitk_xdl.hpp
* set c matrix zero
* using atomic
* add all tuning parameters to f32 mkkn
* change grid size to 720
* add tuning parameters for NT
* add tuning parameters for TN
* add tuning parameters for TT
* add m=96 tuning parameters
* add lost config
* add element wise operation
* fixed MPerBlock=96
* remove macro for split-k switch
* add test
* add new line at the end of device_gemm_xdl_instance.hpp
* remove step hack
* separate split-k instance files
* add tuning parameters
* change desired grid size to parameters
* remove slice length
* add desiredgridsize parameter to ckProfiler
* add missing file device_gemm_xdl_splitk_instance.hpp
* change desired grid size to kbatch
* format
* format
* clean up
* add selection of device_instances
* clean code
* fix build issue
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
Co-authored-by: Jing Zhang <jizhan@amd.com >
[ROCm/composable_kernel commit: 4be7f0198e ]
2022-02-02 22:47:27 -06:00
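The split-K scheme in the entry above ("set c matrix zero", "using atomic") partitions the K loop into kbatch chunks that each accumulate a partial product into a zeroed C. A host-side sketch of the decomposition, with illustrative names rather than the DeviceGemmSplitKXdl API:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side sketch of split-K GEMM: K is split into kbatch chunks, C is
// zeroed first, and each chunk accumulates its partial product into C
// (the device version performs this accumulation with atomics).
void gemm_splitk(const std::vector<float>& a, // M x K, row-major
                 const std::vector<float>& b, // K x N, row-major
                 std::vector<float>& c,       // M x N, row-major
                 int M, int N, int K, int kbatch)
{
    std::fill(c.begin(), c.end(), 0.0f); // "set c matrix zero"
    const int kchunk = (K + kbatch - 1) / kbatch;
    for(int kb = 0; kb < kbatch; ++kb) // one partial GEMM per K chunk
        for(int m = 0; m < M; ++m)
            for(int n = 0; n < N; ++n)
                for(int k = kb * kchunk; k < std::min(K, (kb + 1) * kchunk); ++k)
                    c[m * N + n] += a[m * K + k] * b[k * N + n]; // atomic add on device
}
```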
rocking5566
79706b02df
Do not hardcode the function parameter, use template instead. ( #72 )
...
* Do not hardcode the function parameter, use template instead.
* [What] Remove AThreadTransferSrcResetCoordinateAfterRun and BThreadTransferSrcResetCoordinateAfterRun in host API
[Why] "C_Shuffle" version is supposed to be similar to the vanilla one
* Fix typo
* Let DeviceGemmXdl_C_Shuffle use kernel_gemm_xdlops_v3r1
[ROCm/composable_kernel commit: ca47a6cfe2 ]
2022-01-24 22:44:13 -06:00
rocking5566
268965e555
Add gemm_shuffle host api ( #71 )
...
* [What]
1. Add DeviceGemmXdl_C_Shuffle
2. Revise example of gemm_xdl
[Why] Prepare to add shuffle version of D = alpha * (A * B) + beta * C
[How] Imitate DeviceGemmXdl and device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp
[ROCm/composable_kernel commit: 4d40b1974e ]
2022-01-21 00:31:17 -06:00
Chao Liu
f4e18d16ad
Fix building issue for examples ( #66 )
...
* fix build issue
[ROCm/composable_kernel commit: 6260ced2f3 ]
2022-01-17 23:49:04 -06:00
Chao Liu
886680ae94
Fusion Conv+Bias+ReLU(+Add) ( #62 )
...
* fix relu
* clean up
* clean up
* adding 1x1 conv
* adding 1x1 conv
* added 1x1 conv
* refactor
* refactor
* refactor
* added profiler for conv+bias+relu+add
* clean up
* adding conv+bias+relu
* adding conv+bias+relu
* added conv+bias+relu
* Update README.md
* update cpu verification
* adding c shuffle
* update static_tensor for dealing with invalid element
* adding c shuffle
* debugging
* fix bug
* convert to fp16 before shuffle
* shuffle more than one M/NRepeat
* clean up
* remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1
* clean up
* remove coordinate step hack from all gridwise gemm xdl
* clean up coordinate step hack
* clean up coordinate step hack
* ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst
* adding output shuffle in conv+bias+relu+add
* update
* added conv+bias+relu+add with c shuffle
* added conv+bias+relu+add with c shuffle
* fix forward_sweep bugs in threadwise copy
* clean up
* refactor
* clean up
* clean up
* added conv_c_shuffle+bias_relu
* clean up
* added conv+bias+relu+atomic_add
* clean up
* clean up
* clean up
* clean up
* clean up
* clean up
* misc fixes; add 1x1 specialization
* clean up
* delete unused device op
* clean up
* add support for odd C value
[ROCm/composable_kernel commit: acbd7bd7c5 ]
2021-12-26 07:43:42 -07:00
Chao Liu
370a49bb29
manually apply bug fix changes in pr #63 ( #64 )
...
* Bug in BlockwiseGemmXdlops_k0mk1_k0nk1_m0n0m1n1m2m3m4n2_v1::MakeCGridDescriptor_M0_N0_M1_N1_M2_M3_M4_N2()
* Bug in ThreadwiseTensorSliceTransfer_v1r3 logic for calculating "forward_sweep"
[ROCm/composable_kernel commit: a4f24233e5 ]
2021-12-12 18:05:51 -06:00
Chao Liu
1feebc85ac
fix ReLU formula ( #61 )
...
* fix relu
* clean up
* clean up
[ROCm/composable_kernel commit: fd3d907a80 ]
2021-12-04 16:05:29 -06:00
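For reference, the (corrected) ReLU is simply y = max(x, 0); sketched as an element-wise functor in the style CK's operators use, with an illustrative name:

```cpp
#include <algorithm>
#include <cassert>

// Element-wise ReLU, y = max(x, 0) -- the formula the entry above fixes.
// Functor style mirrors CK's element-wise operators; the name is illustrative.
struct Relu
{
    float operator()(float x) const { return std::max(x, 0.0f); }
};
```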
Chao Liu
4a141b5a04
GEMM/Conv+BiasAdd+ReLU+Add ( #55 )
...
* gemm+activation
* move C pointwise operation into threadwise copy
* add pointwise operation to A/B matrix
* update ckProfiler
* adding bias add
* adding bias add
* adding bias add
* added bias add; worked around compiler issues
* clean up
* clean up
* Update README.md
* Update README.md
* Update README.md
* clean up
* add conv_xdl example
* adding conv_xdl_bias_relu_add example
* add conv+bias+relu+add, but has register spill issue
* tweak
* tweak
* refactor
* Update README.md
update readme for example/2_gemm_xdl_bias_relu_add
* clean up
* Update README.md
update readme for example/3_conv_xdl
* Update README.md
[ROCm/composable_kernel commit: 41cdd3801a ]
2021-12-02 20:07:37 -06:00
Jing Zhang
fa2a2f6c9b
renaming/comments
...
[ROCm/composable_kernel commit: d7a0a3f94c ]
2021-12-02 23:37:57 +00:00
Jing Zhang
0bc1ceba3c
add static_buffer_v2 zero out
...
[ROCm/composable_kernel commit: 2cbb897638 ]
2021-12-02 05:54:19 +00:00
Jing Zhang
14ad77d429
fixed c_buffer alloc
...
[ROCm/composable_kernel commit: d798c9b8c6 ]
2021-12-02 05:03:03 +00:00
Chao Liu
46ec6b3ef0
fix layout naming convention ( #56 )
...
[ROCm/composable_kernel commit: 4041850f11 ]
2021-11-30 09:10:55 -06:00
Chao Liu
7f14c82cd7
added test for magic number division ( #58 )
...
[ROCm/composable_kernel commit: 237d4ca03f ]
2021-11-30 09:09:28 -06:00
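Magic-number division, tested in the entry above, replaces runtime integer division (expensive in GPU index math) with a multiply and a shift. A deliberately narrow sketch that is easy to prove correct; CK's actual MagicDivision helper chooses a per-divisor magic/shift pair valid for full 32-bit dividends:

```cpp
#include <cassert>
#include <cstdint>

// Magic-number division: n / d becomes (n * magic) >> 32.
// With magic = ceil(2^32 / d), the result is exact whenever both n and d
// fit in 16 bits -- a narrow variant kept simple on purpose; CK's real
// MagicDivision handles full 32-bit dividends.
struct MagicDivider
{
    std::uint64_t magic;

    explicit MagicDivider(std::uint32_t d) // requires 0 < d < 65536
        : magic((0xFFFFFFFFull + d) / d)   // = ceil(2^32 / d)
    {
    }

    std::uint32_t divide(std::uint32_t n) const // requires n < 65536
    {
        return static_cast<std::uint32_t>((n * magic) >> 32);
    }
};
```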
zjing14
220bc28498
add args for packed gemm ( #54 )
...
[ROCm/composable_kernel commit: 567f5e9c5f ]
2021-11-24 12:33:55 -06:00
Chao Liu
9bf8189530
Use __builtin_memcpy to implement bit_cast and to access vectors from pointers to scalars ( #53 )
...
* reworking vector_type
* use __builtin_memcpy for bit_cast and vector access of scalar pointer
* clean up
[ROCm/composable_kernel commit: 64350affc5 ]
2021-11-18 09:11:15 -06:00
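The entry above swaps pointer reinterpretation for __builtin_memcpy; the same idiom in portable form (std::memcpy lowers to the builtin under clang/gcc). A sketch of the idiom, not CK's exact helper:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// memcpy-based bit_cast: reinterpret an object's bits without the undefined
// behavior of casting through pointers. Sketch of the idiom the commit
// adopts, not CK's exact helper.
template <typename Dst, typename Src>
Dst bit_cast(const Src& src)
{
    static_assert(sizeof(Dst) == sizeof(Src), "sizes must match");
    Dst dst;
    std::memcpy(&dst, &src, sizeof(Dst));
    return dst;
}
```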
zjing14
40117fe4ef
v5r1 fusion kernels for inference ( #49 )
...
* init
* refactor for 1x1
* rename e0_e1
* add e1 with bugs
* debug
* fixed
* fixed e1
* add timer
* improve threadwise gemm with dot2
* add e2
* tuning
* separate c2
* add nhwc
* restore nchwc
* clean
* opt
* fixed; tuning
* add BGlobalMoveSliceWindowStepHacks{}
* tuning
* repeat running
* adjust
* merge v5r1 nchwc
* add adaptors
* split k0 k1 in c_thread_grid
* split h and w
* remove v5r1 nhwc
* clean for pr
* remove host_conv_add
* clean code
* clean
* add dynamic support
* static mode
* test static
* add conv+add fusion
* fixed validation
* naming fix
* use activ_enum
* make static
* refactor conv_add for InMem::add
* add bias
* add conv_out
* add configurable makeddesc
* add maxpool fusion
* add maxpool host for validation
* enable static desc
* conv-only use v5r1_add
* test
* test
* for binary dumps
* fixed incorrect results due to typo
* clean
* debugging maxpool
* workaround with offset trick
* clean code
* modularize ops of fusion
* add gridwise_gemm_v3
* create separate fusion function
* enable dynamic mode of conv and conv+resize_add
* add dynamic mode of maxpool
* add pass by point
* add activ_type as arguments
* merge develop
* clean
* reset config to old default
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 970fa3e92e ]
2021-11-18 08:34:07 -06:00
zjing14
041b8a226d
Fixed bfp16 host_conv_fwd ( #52 )
...
* fixed bfloat16 issues
* refactor type_convert
* fixed host_convolution_forward for ushort
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: a651ea4f7a ]
2021-11-18 08:10:56 -06:00
zjing14
1e7102575b
fixed multiple definition issue of bfp16/fp32 conversion function when building ckProfiler ( #51 )
...
* fixed bfloat16 issues
* refactor type_convert
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 0a66c54e95 ]
2021-11-16 15:44:17 -06:00
Jing Zhang
ea6fa92eea
updated bfloat16_to_float
...
[ROCm/composable_kernel commit: 89e1ebd4d5 ]
2021-11-16 18:01:25 +00:00
zjing14
456f5306df
Add bfp16/int8 support into XDL GEMM operator ( #50 )
...
* init StaticBufferV2
* clean
* adopt old output stage for staticBufferV2
* clean
* remove hack
* clean
* clean
* add parameters
* clean code
* move c_buffer alloc into blockwise gemm
* add adaptors for m/n_thread_data_on_grid
* tweak gemm
* adjust blockwise_gemm_xdlops
* tweak
* update conv
* update script
* adding bwd 1x1
* update script
* adding 1x1 bwd
* debugging bwd 1x1 failure
* update script
* update script
* test
* test v100
* add bf16_1k
* clang-format
* clean
* add bfp16 for gfx908
* add verification
* clean up
* clean code
* restore bf16
* clean
* add bfp16 support into gemm_driver
* apply new generator to other drivers
* add int8 support
* clean
* clean
* clean
* clean
Co-authored-by: Chao Liu <chao.liu2@amd.com >
Co-authored-by: Chao Liu <lc.roy86@gmail.com >
Co-authored-by: root <root@hayabusa6111.amd.com >
[ROCm/composable_kernel commit: 3737bb039a ]
2021-11-15 10:24:39 -06:00
Chao Liu
8791d26e52
FP16 data in-register transpose ( #41 )
...
* start fixing 16bit data packing
* adding StaticTensor
* adding StaticTensor
* adding StaticTensor
* add missing constexpr
* adding static tensor
* adding static tensor
* adding transpose
* add inline asm for transpose 2x2 of half_t
* add general transpose_vectors(), but it has unnecessary register initialization using v_mov
* fix unnecessary register initialization in transpose_vectors by using more pass-by-reference
* add hardcoded logic for NHWC wrw
* improve asm for v_pack
* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
* tweak
* reorganize file
[ROCm/composable_kernel commit: b491ebf384 ]
2021-11-15 10:05:58 -06:00
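The 2x2 half_t transpose the entry above implements with v_pack inline asm can be pictured on 16-bit lanes packed two per 32-bit register: (x0,x1) and (y0,y1) become (x0,y0) and (x1,y1). A plain-C++ sketch of the bit manipulation (the real code is GCN inline assembly):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// 2x2 transpose of 16-bit lanes packed two-per-32-bit-register:
// a = [a_lo, a_hi], b = [b_lo, b_hi]  ->  [a_lo, b_lo], [a_hi, b_hi].
// Scalar sketch of what the v_pack-based inline asm does for half_t pairs.
std::pair<std::uint32_t, std::uint32_t> transpose_2x2(std::uint32_t a, std::uint32_t b)
{
    const std::uint32_t lo = (a & 0x0000FFFFu) | (b << 16);     // [a_lo, b_lo]
    const std::uint32_t hi = (a >> 16) | (b & 0xFFFF0000u);     // [a_hi, b_hi]
    return {lo, hi};
}
```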
Chao Liu
2f5ccb68f5
ckProfiler and device-level XDL GEMM operator ( #48 )
...
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instance
* Create README.md
Add build instruction for ckProfiler
* Create README.md
Add Readme for gemm_xdl example
* Update README.md
Remove build instruction from top most folder
* Update README.md
* clean up
[ROCm/composable_kernel commit: e823d518cb ]
2021-11-14 11:28:32 -06:00
ltqin
1f6ca26819
[Bug Fix] GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4 loop issue ( #44 )
...
* change method of computing kpad
* remove unused variable: batchlen
* change KPerBlock to K0PerBlock
* fix bug for k0 == k0perblock
* fix bug for get k0 index
* use math::integer_divide_ceil
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 6014185ac6 ]
2021-10-27 09:39:18 -05:00
Chao Liu
d3dc0bcbe3
Merge pull request #46 from ROCmSoftwarePlatform/miopen_downstream_all
...
update ck from miopen ck_upstream
[ROCm/composable_kernel commit: 3e9113707f ]
2021-10-27 09:07:38 -05:00
ltqin
1ae6a49f43
Merge branch 'develop' into miopen_downstream_all
...
[ROCm/composable_kernel commit: 211dae8229 ]
2021-10-27 13:34:19 +08:00
Jun Liu
0c17233608
[Composable Kernel] update develop branch code to ck_upstream
...
Merge pull request #1236 from ROCmSoftwarePlatform/develop
[ROCm/composable_kernel commit: 5890e30076 ]
2021-10-25 19:49:17 -07:00
Chao Liu
3fcbcb776c
fix bug in gridwise gemm xdlops v2r3 ( #45 )
...
[ROCm/composable_kernel commit: d5297abae9 ]
2021-10-21 16:42:24 -05:00
Chao Liu
02fc6ba269
bug fix ( #39 )
...
[ROCm/composable_kernel commit: c3018794b4 ]
2021-10-19 18:43:10 -05:00
ltqin
0d74bff825
add nchw atomic, nhwc and nhwc atomic methods for backward weight ( #30 )
...
* add new algorithm from v4r4r2
* fix program once issue
* add split-k function
* redefine code
* add a matrix unmerge
* add b matrix unmerge k0
* transfer a and b to gridwise gemm
* nhwc init
* no hacks and vector load
* add hacks
* modify some parameter
* fix tuning parameters for fp32
* fix tuning parameters for fp16
* start change gridwise k split
* init ok
* remove a/b matrix k0mk1 desc in grid
* rewrite grid size calculation
* add kbatch to CalculateBottomIndex
* remove some unused functions
* add clear data function before call kernel
* out hacks
* in hacks
* rename device convolution file and function name
* modify kBatch value
* fix some tuning code
* start from v4r4 nhwc
* nhwc atomic is able to run
* just for fp32
* enable nchw atomic
* tweak
* tweak
* re-arrange gridwise gemm hot loop for wrw
* add wrw v4r5
* v4r4r5 fp16
* v4r4r4 fp16
* v4r4r2 fp16
* V4R4R4XDLNHWC fp16
* V4R4R2XDLATOMICNCHW fp16
* adjust for fp16
* input gridsize
* change kbatch to gridsize
* testing wrw
* clean up
* k_batch to gridsize
* fix bug
* wrw v4r4r4 kbatch change to grid size
* wrw v4r4r2 kbatch change to grid size
* after merge, change gridwise gemm v2r4
* change MakeCBlockClusterAdaptor
* other method use new gridwise gemm
* clean up
* change pad method to make_right_pad_transform
* kbatch out from transform function
* clean up and fix bug
* fix bug
* use function type to reduce template parameters
* use auto instead of defining function type
* clean up
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
Co-authored-by: Jing Zhang <jizhan@amd.com >
[ROCm/composable_kernel commit: fd49ff8080 ]
2021-10-19 18:42:34 -05:00
Qianfeng
afe31f1e41
[MIOpen Downstream] Fix Reduction Kernel ( #34 )
...
* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel
* Fix with regard to implementing GetZeroVal() in both kernel and host
* Avoid converting to compType from dstDataType before writing the output value
* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator
* Add CONSTANT decorator for descriptor read buffer
* Use get_thread_local_1d_id() for thread local Id
* Rename GetZeroVal() to GetReductionZeroVal() in the kernels
* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp
* Occasional tiny simplification and update in the kernel files
* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
* Update to remove OpenCL tidy checking failures
* Update for better readability
* Remove unused codes and not-needed template parameters in the kernel wrappers
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: b2dc55f82c ]
2021-10-06 14:43:17 -05:00
Chao Liu
720cf3d6b2
Tweak GEMM kernel ( #38 )
...
* add parameters
* tweak gemm
* tweak
* update conv
* update script
* adding bwd 1x1
* update script
* adding 1x1 bwd
* debugging bwd 1x1 failure
* update script
* update script
* test
* test v100
* clean up
[ROCm/composable_kernel commit: b3e8d57d51 ]
2021-10-06 11:12:36 -05:00
zjing14
8159394bfa
Add VectorType support into StaticBuffer ( #27 )
...
* init StaticBufferV2
* clean
* adopt old output stage for staticBufferV2
* clean
* remove hack
* clean
* clean
* clean code
* move c_buffer alloc into blockwise gemm
* add adaptors for m/n_thread_data_on_grid
* adjust blockwise_gemm_xdlops
* reorder ops in GEMM hot loop
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 846f462bd4 ]
2021-10-06 10:13:52 -05:00
Qianfeng
d1c185cde7
[Enhancements] Several bugfixes and refactoring of dynamic generic reduction ( #1156 )
...
* Squashed 'src/composable_kernel/' content from commit aa8c98119
git-subtree-dir: src/composable_kernel
git-subtree-split: aa8c981198
* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
* Squashed 'src/composable_kernel/' changes from aa8c98119..1d8dbe3c5
1d8dbe3c5 Update develop (#5 ) (#6 )
8ce0728ae Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
f017e3448 refactor
9eb35eec8 refactor
041c48a06 rename
git-subtree-dir: src/composable_kernel
git-subtree-split: 1d8dbe3c57
* fix
* refactor
* remove online compilation from CK
* refactor
* fix
* add ctest
* tidy
* add tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* add c-style pointer cast
* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
* fix clang warning suppression
* tidy
* suppress cppcheck
* fix enum issue
* revert changes to hip build
* fix kernel filename
* update CK build script
* rename
* rename
* make inner product compatible on gfx900
* Update src/include/miopen/solver/ck_utility_common.hpp
Co-authored-by: JD <Jehandad.Khan@amd.com >
* compiler parameter use stream
* use int instead of index_t in kernel wrapper
* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
* refactor
* refactor
* change cmakelist
* change ck common utility
* fix
* Squashed 'src/composable_kernel/' changes from 1d8dbe3c5..887df7b12
887df7b12 Merge pull request #16 from ROCmSoftwarePlatform/develop
7e6b9fb7a Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration
833701f40 Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration
e25c4c2f1 refactor
27048b771 refactor
65e834905 DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
b3759bf6a use int instead of index_t in kernel wrapper
04ed8ddf4 compiler parameter use stream
9f40048d1 make inner product compatible on gfx900
f7df8c7ee rename
1e312fef1 rename
c9869a5ac update CK build script
c825eb6b1 fix kernel filename
594b1cf91 fix enum issue
286475c6b tidy
a7c943aba fix clang warning suppression
d49e0ddcb vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
314b9d78e add c-style pointer cast
d4b35bd09 tidy
cb2edf210 tidy
4771cfa34 tidy
eb7f9f35b tidy
b14b5d337 tidy
9c589af82 tidy
e8def0e77 tidy
9e2c3c776 tidy
51ab4abaf add tidy
cba13cb6b fix
5ed1b840a remove online compilation from CK
5856acc10 refactor
7221bedc9 Merge commit '437cc595c6e206dfebb118985b5171bbc1e29eab' into composable_kernel_init_integration_v3
0bb6c85c2 Merge pull request #7 from ROCmSoftwarePlatform/master
a0b9a203a Update develop (#5 )
898807d60 add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
git-subtree-dir: src/composable_kernel
git-subtree-split: 887df7b129
* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel
* Fix with regard to implementing GetZeroVal() in both kernel and host
* Avoid converting to compType from dstDataType before writing the output value
* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator
* Add CONSTANT decorator for descriptor read buffer
* Use get_thread_local_1d_id() for thread local Id
* Rename GetZeroVal() to GetReductionZeroVal() in the kernels
* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp
* Occasional tiny simplification and update in the kernel files
* Update in src/reducetensor.cpp for consistent IDs passing to the kernel
* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
* Update to remove OpenCL tidy checking failures
* Small updates in src/reducetensor.cpp
* Update for better readability
* Remove unused codes and not-needed template parameters in the kernel wrappers
Co-authored-by: Chao Liu <chao.liu2@amd.com >
Co-authored-by: JD <Jehandad.Khan@amd.com >
[ROCm/composable_kernel commit: dfb80c4e39 ]
2021-09-29 08:12:11 -07:00
Jun Liu
47bf3c6a3d
Merge pull request #1165 from ROCmSoftwarePlatform/develop
...
Merge develop into CK_upstream (Please don't squash when merging)
[ROCm/composable_kernel commit: 8557901d02 ]
2021-09-21 15:52:12 -07:00
Chao Liu
c420cdb1a4
Merge pull request #31 from ROCmSoftwarePlatform/miopen_downstream-dynamic_reduction_pr
...
[MIOpen Downstream] Dynamic Reduction PR
[ROCm/composable_kernel commit: f305bebdc3 ]
2021-09-21 11:59:23 -05:00
Chao Liu
2a43644437
Merge remote-tracking branch 'origin/develop' into miopen_downstream-dynamic_reduction_pr
...
[ROCm/composable_kernel commit: b725e3fc84 ]
2021-09-21 11:55:26 -05:00
Chao Liu
78178aecec
Merge remote-tracking branch 'origin/develop' into CK_upstream
...
[ROCm/composable_kernel commit: df0d68106e ]
2021-09-20 20:44:01 -05:00
Chao Liu
f1d7806427
Add a version of Merge transform that use integerdivision and mod ( #25 )
...
* add Merg_v3_division_mod
* refactor
[ROCm/composable_kernel commit: f3acd2510b ]
2021-09-05 12:57:57 -05:00
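The Merge transform variant above maps a merged (linear) index back to the original N-d coordinate by repeated integer division and modulo, as an alternative to the carry-based coordinate update. A sketch with illustrative naming, not CK's Merge_v3_division_mod itself:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Backward mapping of a Merge transform via integer division and modulo:
// a merged (linear) index is unpacked into the original coordinate, with
// the last dimension varying fastest (row-major).
template <std::size_t N>
std::array<std::uint32_t, N> merge_to_multi_index(std::uint32_t linear,
                                                  const std::array<std::uint32_t, N>& lengths)
{
    std::array<std::uint32_t, N> idx{};
    for(std::size_t i = N; i-- > 0;) // peel dimensions from fastest to slowest
    {
        idx[i] = linear % lengths[i];
        linear /= lengths[i];
    }
    return idx;
}
```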
Chao Liu
079adb1e7d
GEMM driver and kernel ( #29 )
...
* add gemm driver
* tweak
* add gemm kernel: mk_kn_mn and km_kn_mn
* tweak
* add GEMM km_nk_mn
* fix comment
[ROCm/composable_kernel commit: 19613902b5 ]
2021-09-05 12:41:28 -05:00
ltqin
2f4f6427f5
Backward weight v4r4r2 with xdlops ( #18 )
...
* start
* modify transformation
* modify device convolution
* modify host
* added host conv bwd and wrw
* remove bwd, separate wrw
* clean
* hacall k to zero
* out log
* fixed
* fixed
* change to (out in wei)
* input hack
* hack to out
* format
* fix by comments
* change wei hacks (wei transform has not been merged)
* fix program once issue
* fix review comment
* fix vector load issue
* tweak
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Jing Zhang <jizhan@amd.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 627d8ef35a ]
2021-08-30 22:49:17 -05:00
Chao Liu
a44dd0d851
Misc fixes ( #24 )
...
* use cast_pointer_to_generic_address_space() in v6r1 kernel wrapper; DynamicBuffer and buffer_load take customized invalid-element-value; add buffer_load/store for fp64
* use remove_cvref_t
[ROCm/composable_kernel commit: 10bb811060 ]
2021-08-26 20:05:19 -05:00