composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-11 17:00:18 +00:00

Author	SHA1	Message	Date
Haocong WANG	091570f594	fix 3rd dword of buffer source descriptor (#659)	2023-03-29 19:03:55 -05:00
carlushuang	bb5530af91	simplify karg in device/grid of split-k op (#644) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout	2023-03-29 19:03:07 -05:00
Rostyslav Geyyer	dbd8f94bef	Add a denorm test fix (#603) * Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-29 15:05:32 -05:00
rocking5566	389e84a83b	Conv + quantization + tanh (#645) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-29 14:50:23 -05:00
Haocong WANG	4e097ad283	Add CMake Option "USE_OPT_NAVI3X" (#647) * Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script	2023-03-29 14:07:33 -05:00
Sam Wu	88d474323b	Separate bibtex requirement from rocm-docs-core (#656) * separate bibtex requirement from rocm-docs-core * point requirements to source rocm-docs-core repo	2023-03-27 17:14:36 -06:00
Sam Wu	f80776d937	standardize docs (#655)	2023-03-23 20:58:59 -07:00
Haocong WANG	e5376be4ac	[Navi3x] Fix Gridwise_multiple_d operation (#649) * Add CMake Option "USE_OPT_NAVI3X" * fix bug	2023-03-23 11:22:10 -05:00
Po Yen Chen	fe96e8fbf2	Reduce group & batch of the tested convolutions (#648)	2023-03-22 10:49:11 -07:00
Illia Silin	36750a5763	Get rid of XDL parameters in WMMA kernel string. (#646) * remove XDL parameters from WMMA kernel string * get rid f two more parameters	2023-03-22 08:05:48 -07:00
Dan Yao	8a659a2e4c	rtn in ternary way (#632) * rtn in ternary way * Check both flags to preserve NaN * Format * Rearrange flag1 * Apply suggestions from code review Co-authored-by: Ronan Keryell <ronan@keryell.fr> --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Ronan Keryell <ronan@keryell.fr>	2023-03-20 14:30:24 -05:00
ltqin	6ae12434d2	workaround 637 (#640) * add workaround 637 * format * change id --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-20 11:49:31 -05:00
Rostyslav Geyyer	fa998675fc	Update cmake-ck-dev.sh script (#641) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-03-15 18:38:11 -05:00
rocking5566	16dc18e0f9	gemm/Conv xdlops + dlops quantization (#625) * Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move ot subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separete different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit `820869182f`. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-15 15:29:40 -05:00
Adam Osewski	a2d5ca8e95	Device Op GroupedGemmMultipleD + example fp16 (#633) * Pass shared mem pointer as pointer to void. * Device Op GroupedGEMM Multiple D * Example for grouped gemm multiple d. * Add MI200 to supported archs. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-15 11:22:59 -05:00
Rostyslav Geyyer	c10a6e8293	Add layout check to IsSupportedArgument (#627) * Add layout check to IsSupportedArgument * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-15 11:12:12 -05:00
Illia Silin	14b3504d95	Update GetTypeString function to generate unique kernel IDs. (#638) * make conv_fwd_bias_activation kernel id unique * add more parameters to conv and gemm kernel names * update GetTypeString for conv and gemm kernels * fix two more kernel strings	2023-03-15 10:44:42 -05:00
Haocong WANG	ea028ac65a	Fix arch limitation bug (#639)	2023-03-15 07:44:13 -07:00
Rostyslav Geyyer	5b57ab96a8	Remove debug asserts (#629) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-03-10 17:34:44 -06:00
Haocong WANG	087e310589	[Navi3x] Multiple issue fix (#612) * Change gridwise gemm mD blockwise gemm to naive * RRR Gemm fix * Fix RCR gemm bug * Isolate wmma instructions * Update amd_inline_asm.hpp * Update amd_wmma.hpp * Update amd_wmma.hpp * fix syntax and update Jenkinsfile --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>	2023-03-10 17:04:28 -06:00
carlushuang	76fcdc60e9	fix a bug with non-dword-aligned offset when OOB, in case crash (#616) Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-09 08:07:24 -06:00
Illia Silin	0ccecc7c31	[gfx110x] support Navi3x architectures. (#628) * enable building on Nav31 * fix syntax * replace GPU_TARGETS with offload-arch * add gfx1102 rachitecture * fix typo * update changelog	2023-03-09 07:56:40 -06:00
Adam Osewski	9096b1c7b2	GroupedGEMM + Gelu client example/instances/profiler (#614) * Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-03-07 22:06:56 -06:00
Rostyslav Geyyer	1e59eb3be5	Add descriptions to avoid build issues (#619) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-03-06 13:11:58 -08:00
pmaybank	e4bf6d422e	Generate output using Doxygen / Breathe (#598) * Modify Doxygen config to pick up include directories recursively * Add DeviceMem struct to API Reference guide * Add classes that are used in Flash Attention kernel * Add a reference and config for generating bibliography Co-authored-by: Philip Maybank <Philip.Maybank@amd.com>	2023-03-06 11:39:16 -06:00
Illia Silin	e6cda9f8ff	Change the CI workflow. (#611) * add new parallel stage on navi node * dont run performance tests on navi, get rid of 9110 compiler * only run navi build when not doing QA * fix syntax * use navi21 label * dont stash profiler on navi nodes, scp deb package to ginger * disable tests on navi nodes * test posting a binary to ginger * add sshpass and use it to copy deb package * fix the scp example * fix syntax * debug the scp issues * add jenkins user to docker * dont try whoami * change jenkins uid and add user with uid=1002 * try scp from the last stage on micimaster * rename and stash the package, scp from micimaster	2023-03-02 11:24:31 -06:00
Illia Silin	59cbb20c7c	Suppress reserved-identifier warning and catch all warnings. (#608) * suppress the reserved-identifier warnings * keep BUILD_DEV=On and use -Werror by default	2023-03-01 12:08:13 -06:00
Haocong WANG	68dbf40a79	[Navi3x Bug Fix] fix typo to accept MNKPadding flag correctly. (#597) * fix a bug blocking wmma_gemm_multipleD * Utilize matrix padder in device_wmma_op * cosmetic change for gemmpadding format * clang format * Change gridwise gemm from FIFO to KMN loop fashion	2023-03-01 12:07:42 -06:00
Chao Liu	8f455615a8	Fast GeLU using built-in function (#587) * clean up * fast gelu using builtin function * clean * clean * clean * clean: * clean * fix compilation * clean * clean --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-02-26 23:19:11 -06:00
zjing14	209baee299	disable tensor contraction f64 on MI100 (#602)	2023-02-23 16:59:37 -08:00
Rostyslav Geyyer	246ceee49e	Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505) * Add DeviceOp and examples * Format DeviceOp template arguments * Remove bf16 example * Format * Format * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N * Refactor argument preparation * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl * Rename device op file * Update include directive in the example file * Update descriptor preparation for grouped op * Update the argument * Update batch handling * Add gridwise gemm supporting batched input * Update blockwise indexing, working version * Update copyright year * Update check if argument is supported * Refactor and make consistent with xdl examples * Update check if argument is supported * Add changelog entry * Added comments on Dl op split_k>1 support --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-02-22 11:59:53 -06:00
ltqin	830d37a7d5	Grouped conv1d client example (#589) * add conv1d fwd client example * change 07_grouped_conv2d_fwd to 07_grouped_convnd_fwd * add conv1d bwd weight --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-02-22 11:55:21 -06:00
Illia Silin	bef0cb20db	fix a bug when building for gfx1030 target. (#591) * fix a bug while building for gfx1030 and add gfx1030 to targets * fix syntax	2023-02-16 13:54:08 -06:00
Illia Silin	584d233cfe	Build and archive deb packages. (#590) * build and archive deb packages * fix syntax * run QA to test building packages * apply cron to develop branch again	2023-02-16 13:11:23 -06:00
pmaybank	cb3fac4d2a	Sphinx doc (#581) * New docs directory with minimal config * Based on docs directory of rocBLAS * Config for running Doxygen then Sphinx to generate HTML * Add minimal content - intro to doc * Add some boilerplate sections to doc * content still needs to be done, * e.g., need to generate API documentation using Doxygen * need to write contributor guide * Start Softmax section of Support Primitives doc * Written as a test bed for typesetting math content * Need to decide how much detail to go into * add doc directories to git ignore file. * Minor edits - new line at EOF, change year in copyright notices * Port Markdown files to ReStructuredText * Copy Markdown files from pre-existing doc directory to docs directory * Convert to reStructured Text (rst) - section headings, links, tables have a different syntax in rst * New rst files added to index - can generate HTML with same style as HTML generated from rst files in previous commits * Intention is to make all the content in doc redundant and use rst throughout rather than mix of md and rst * Extend Softmax section of Primitives Guide * rename l to z * add material on applying softmax row-wise to matrix * define macro for diag operator (represents diagonal matrix) --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-02-15 17:17:46 -06:00
Illia Silin	19490ac4f7	Clean up kernel launch output (#569) * clean up output from kernel_launch * set RUN_WARMUP to 0 by default * split the warm-up into a separate issue --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-02-15 12:07:21 -06:00
zjing14	24c9ee1d22	Add contraction_fp64 example (#570) * add contraction_bilinear * add contraction_scale_xdl_fp64 * reduce tile size to avoid register spill --------- Co-authored-by: root <root@ctr-ubbsmc16.amd.com>	2023-02-15 12:00:58 -06:00
rocking5566	6a6163a3d1	Improve normalization (#580) * Sync the order of type string with template parameter * Add more instances * Check the vector size and remove redundant var * Extract var to static, prepare to separate sweep once kernel * Separate sweeponce flow and optimize the flow * 1. Rename AccDatatype in normalization to computeData 2. Rename AccElementwiseOperation to YElementwiseOperation in normalization * Remove useless code * Update naive variance kernel * Refine string * Fix typo * Support naive variance for device_normalization * Check the blocksize * Share the VGPR of x and y * Share the VGPR of gamma and beta * Add more instances * Support fp16 sqrt for experiment * Add CHANGELOG * Fix typo * clang-format	2023-02-15 11:59:35 -06:00
Haocong WANG	0cfda84d05	[Navi3x] Add Device Operations (#567) * wmma_op + unit test * add arch limitation to wmma test * change arch limitation * Refactor + Add all type unit test(int4 compile failed) * Add f32_16x16x16_bf16 unit test * tempsave * tempsave * tempsave * runtime bug, cannot find symbol * workaround for incorrect HIP warpSize return value * debugging * tempsave * Correctness OK, waiting for optimization * Tidy up + format * temp save * temp save, reproduce the v_bfi_b32 issue * add inline asm for wmmaop test * tidy up * clean some debug purpose code * discard some codes * clang format * clang format * compiler issue fixed + increase tile size * navi3x_multipleD+example * temp save * workable * batchedgemm[OK], groupconv[debug] * groupconv: Sanity check[OK], Performance[Bad] * navi3x_groupconv_need_optimization * format * Add arch limitation to all wmma examples * fix bug: example30 input conv args	2023-02-15 11:50:51 -06:00
Adam Osewski	e9fd122889	Conv3D FWD BWD WRW fp16 fp32 client examples (#559) * Conv3d bwd weight client example. * Update year in license * Convolution bwd data 3D fp16/fp32 client example. * Client example for convnd fwd fp16 fp32 * clang-format * Review remarks. * Fix compiler err. * Update data layout to standard one. * Add conv 3d fwd NDHWGC instances * clang-format * Conv3d fwd NDHWGC instances. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-02-15 11:16:47 -06:00
Illia Silin	06f1fc864c	Remove the workaround for bf16 attention tests. (#586) * remove workanround in bf16 attention test * clean up another workaround	2023-02-14 18:06:24 -06:00
Adam Osewski	8f42780fd6	GroupedGEMM more bigger tiles. (#577) * Adding more bigger tiles. * Remove failing instance. * Remove instances which that don't improve perf. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-02-13 10:06:24 -06:00
Illia Silin	0ac0f51ad6	enable batched_gemm_softmax_bf16 tests (#582)	2023-02-10 13:00:37 -06:00
rocking5566	f7d28f3e4b	Gemm+layernorm instance, ckProfiler, client example (#568) * Add gemm + layernorm instance * Add ckProfiler * Add test * Add client example * Detect if user forger to set the workrspace * Use literal in the example * [What] use builtin function for sqrt [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt() * check gemm vaildity in IsSupportedArgument * Add more testcases * Merge duplicated folder in client example * Print more infomation * Use better kernel parameter for MS problem size * clang format * Add constexpr for if condition and remove redundant include * Remove cstdlib and add constexpr	2023-02-09 15:02:55 -06:00
guangzlu	76d144fa7c	Add instance for elementwise normlization (#573) * added instances for large N * add instance for elementwise normlization * added supported restrict in device_elementwise_normalization_impl.hpp	2023-02-09 09:37:29 -08:00
Illia Silin	b63accee2b	adding the first draft of changelog (#571) * adding the first draft of changelog * second draft of changelog	2023-02-08 17:25:53 -06:00
ltqin	332ccc3367	Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576) * add instance for gemm bias softmax gemm * add client example * change CGridDesc_G_M_N to CGridDesc_G_M_O * add gridwise * change c grid name * device add d0s data * fix 08 client_example * add example 47_fused_attention * example output correct * add d0 to example * add d0 element op * rechange instance code * change Acc0ElementwiseOperation to C0DEElementwiseOperation * change example name * update instance for cdeelementwiseop * add bhalf_t ScaleAdd * add test * not surport geem1 bias * remove some ignore * fix test bug	2023-02-08 14:34:45 -06:00
Illia Silin	bb3d9546f1	Fix a couple more CI issues. (#578) * test the QA cron parameter for compiler commit * create separate dockers for latest and fixed amd-stg-open compiler versions * change groovy syntax * apply cron timers back to develop branch	2023-02-08 11:50:09 -06:00
Illia Silin	f73574ffdd	Fix CI issues. (#572) * switch to recent staging compiler as default for CI * fix the baseline query * roll back sqlalchemy to version 1.4.46	2023-02-06 13:15:45 -06:00
Rostyslav Geyyer	afdfef74f7	Add the markdown tutorial hello world (#563) * Add the markdown tutorial * Clean up --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>	2023-02-01 15:56:59 -06:00

856 Commits