composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-18 01:28:27 +00:00

Author	SHA1	Message	Date
Bartlomiej Wroblewski	88bb9d5fac	Redesign the DPP8 GEMM kernel to use warp-wise component (#863 ) * Redesign the DPP8 GEMM kernel to use warp-wise component * Review: Improve error messages * Review: Remove unnecessary empty lines * Review: Fix M, N per thread names * Review: Rename mfma_input_type to dpp_input_type * Review: Fix tensor adaptor; remove unnecessary element * Review: Remove calls to dpp_gemm's MakeCDescriptor * Review: Add blockwise doc, change function names to include dimension names * Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file * Review: Add __restrict__ keywords * Review: Use MatrixPadder for padding A, B, C matrices * Review: Remove hardcoded datatypes * Review: Change names from FloatX to XDataType * Review: Introduce AK0 and BK0 instead of a single K0 * Review: Remove construction of dpp_datatypes object * Review: Rename DppInstrRunner to DppLanegroupGemm [ROCm/composable_kernel commit: `37a8c1f756`]	2023-09-06 11:44:09 -05:00
zjing14	3446ff1e7d	added padding of K into gemm_v2r3 (#887 ) * added kpad support into v2r3 * add generic instances * fixed comments * fixed mnk padding * Update device_batched_gemm_xdl.hpp --------- Co-authored-by: Jing Zhang <jizha@amd.com> [ROCm/composable_kernel commit: `3786bfe1cc`]	2023-09-06 10:15:52 -05:00
zjing14	88af65157c	Fixed fp8 gemm (#882 ) * add generic instances; fixed initi with fp8 * fixed comment --------- Co-authored-by: Jing Zhang <jizha@amd.com> [ROCm/composable_kernel commit: `a61b8b785e`]	2023-09-06 09:59:20 -05:00
Bartłomiej Kocot	2ec7a9084a	Add image to column kernel (#867 ) * Add image to column kernel * Add instances, tests, profiler, example * Add client example * Several fixes of image to column * Fix variable name in device_image_to_column_impl * Several fixes of image to column profiler * Fix num_btype calculation * Make new mesaurements for correct bytes calculation [ROCm/composable_kernel commit: `0077eeb3be`]	2023-09-05 10:11:40 -05:00
Bartłomiej Kocot	f64298eef4	Add nhwgc dl generic instances for grouped conv fwd (#879 ) [ROCm/composable_kernel commit: `0c9a1d25b3`]	2023-09-05 10:07:56 -05:00
zjing14	bc88b1a50b	Grouped Gemm with Fixed K and N with SplitK (#818 ) * move all arguments into device * add b2c_tile_map * add examples * add SetDeviceKernelArgs * dedicated fixed_nk solution * init client api * add grouped_gemm_bias example * add a instance * add instances * formatting * fixed cmake * Update EnableCompilerWarnings.cmake * Update cmake-ck-dev.sh * clean; fixed comments * fixed comment * add instances for fp32 output * add instances for fp32 output * add fp32 out client example * fixed CI * init commit for kbatch * add splitk gridwise * format * fixed * clean deviceop * clean code * finish splitk * fixed instances * change m_loops to tile_loops * add setkbatch * clean code * add splitK+bias * add instances * opt mk_nk instances * clean examples * fixed CI * remove zero * finished non-zero * clean * clean code * optimized global_barrier * fixed ci * fixed CI * removed AddBias * format * fixed CI * fixed CI * move 20_grouped_gemm to 21_grouped_gemm --------- Co-authored-by: Jing Zhang <jizha@amd.com> [ROCm/composable_kernel commit: `f5ec04f091`]	2023-08-31 09:22:12 -05:00
rocking	e422d088a3	MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861 ) * Add maxpool instances * Rename index pool to max pool. * Add maxpool bwd bf16 instances * Add avg pool bwd instances * Rename avgpool and maxpool to avg_pool3d and max_pool * Add bf16 pool fwd instances * Add max pool bwd to ckProfiler * Add avg pool3d bwd to ckProfiler * Add avg pool bwd test * Fix bug of reference pool fwd (dilation) * Fix bug of max pool bwd (dilation and initZero) * Support bf16 compute data type * Force compute type be f32. Because atomicAdd only support f32 * Add max pool bwd test * Rename folder * Rename pool * Add max pool bwd client example * Add avg pool bwd client example * Add missing workspace * clang format * Rename macro * remove useless header * remove useless layout [ROCm/composable_kernel commit: `866377de18`]	2023-08-31 21:01:50 +08:00
zjing14	17d03c86d2	Fp16/fp8 mixed-precision Gemm with multiply+add fusion (#865 ) * add compute_type * add multiply_add ckProfiler * add f8_fp16 support * clean * clean * fixed lds size calc * format --------- Co-authored-by: Jing Zhang <jizha@amd.com> [ROCm/composable_kernel commit: `31ea132aa2`]	2023-08-28 16:27:32 -05:00
Jun Liu	a2392f1887	[HotFix] add config and version files to pass on build info (#856 ) * experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file [ROCm/composable_kernel commit: `c8a8385fdd`]	2023-08-23 11:36:17 -07:00
zjing14	d91fc00427	add generic instances (#858 ) Co-authored-by: Jing Zhang <jizha@amd.com> [ROCm/composable_kernel commit: `8ebea3a56e`]	2023-08-23 09:18:10 -05:00
zjing14	3d88622dda	Ck profiler splitk (#857 ) * updated regular gemm * update ckProfiler * fixed gtests --------- Co-authored-by: Jing Zhang <jizha@amd.com> [ROCm/composable_kernel commit: `ca3115e7e8`]	2023-08-22 16:54:34 -07:00
Rostyslav Geyyer	6f9eeb3190	Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853 ) * Add ComputeType arg to splitk device and gridwise ops * Update for gridwise op compatibility * Update bf16 and int8 splitk gemm examples with ComputeType * Add instances * Update ckProfiler for mixed precision cases * Add a mixed precision splitK gemm client example --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `eac50708d9`]	2023-08-22 09:34:49 -05:00
Bartlomiej Wroblewski	f81eb31934	Implement DPP8 based GEMM for Navi21 (#826 ) [ROCm/composable_kernel commit: `d4c84256f7`]	2023-08-14 15:46:27 -05:00
rocking	ae36ead7f5	Refactor pool fwd (#815 ) * Do not hardcode stride * devicePool2DFwd Inherit devicePool3DFwd * Move instance declaration out of common * Add dilation * use the pool3d rank, because pool2d inherit pooo3d * calculate Do Ho Wo for the dilation * Fix header name * Modify ckProfiler * Remove pool2d instance * Remove pool2d in profiler * Remove pool2d and add dilation * In to client example, this commit revise following: 1. Add dilation. 2. Use pool3d to implement pool2d * Refine naming and IsSupportedArgument() * Add dilation to maxpool bwd example * clang format * 1. Remove useless header 2. Fix copyright 3. Refine naming * Add layout parameter to pool fwd * clang format * Fix merge error * Fix compile error * Remove layout parameter in derived class * Refine changlog * Fix compile error * Fix compiler error * Add layout to external api and profiler [ROCm/composable_kernel commit: `f60f0a5e03`]	2023-08-15 02:25:28 +08:00
rocking	9c24b3c23e	Add Normalization splitk instances (#829 ) * Add normalization splitK to layernorm and groupnorm instances * Fix bug of GetKPerThread() * Refine naming * clang format [ROCm/composable_kernel commit: `03b8119e2e`]	2023-08-12 01:31:31 +08:00
Bartłomiej Kocot	ac574360c7	Enable grouped conv with small K or C (#822 ) * Enable grouped conv with small K or C * Add missing instances * Refactor grouped conv fwd instances * Fix fp16 instances since it supports src_per_vec %2 = 0 * Add generic instances [ROCm/composable_kernel commit: `472fa029ba`]	2023-08-09 10:40:55 -05:00
Illia Silin	4bea06a519	Allow building CK for specific data types and split off last remaining DL instances. (#830 ) * properly split conv_nd_bwd_data instances * split conv2d_fwd instance data types * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm * split the tests by data types where possible * filter examples by DTYPES * split few remaining examples by DTYPES * filter most instances by DTYPES * add new lines at end of headers, fix grouped_gemm profiler * fix syntax * split the ckprofiler instances by DTYPES * split the conv2d and quantization DL and XDL instances * fix the splitting of conv2d DL instances * split softmax and pool_fwd tests for fp16 and fp32 types * fix syntax * fix the dl_int8 quantization instances isolation [ROCm/composable_kernel commit: `08eb176929`]	2023-08-07 14:56:10 -07:00
Po Yen Chen	60371ab663	Update tuning parameter & compilation options of DeviceGemmXdl<> instance (layout=TT) (#819 ) * Enable pipeline v2 opt for layout=TT instance * Use better thread mapping for reading A tile * Conditionally enable pipeline v2 opt * Allow enabling only fp16 gemm instances in profiler * Fix formatting error * Fix compilation error if we enable fp32 in profiler [ROCm/composable_kernel commit: `f7cc8c3b03`]	2023-08-02 10:32:22 -05:00
carlushuang	836a29fcd8	initial stream-k implementation with example (#699 ) * initial stream-k implementation with example * fix unexpected change in err * improve a little bit performance by reorganize pipeline. * improve perf a little bit by swizzle block idx * add profiler * update example * fix spelling * shrink karg for streamk * support dynamic buffer using memory coherence glc_slc bit from template * control memory coherence while construct dynamic buffer * update reduction for streamk(not ready yet) * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting * fix build issue * fix several bug * now result is correct, everything works (but has scratch) * remove scratch by manually reset coordinate * update device code * fix a bug in final reduce * fix something in example * update async memset * fix enum as camel case * modify coherence enum name * clean code and use atomic streamk by default * remove unused var * throw exception if have empty pointer * fix format * fix CI warning * fix type in init * modify CI error * filter out on gfx10+ * restore changed example code --------- Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com> [ROCm/composable_kernel commit: `e7dca79d27`]	2023-07-26 14:18:15 -05:00
Illia Silin	fe77d721fb	Disable DL kernels by default. (#816 ) [ROCm/composable_kernel commit: `9195435c77`]	2023-07-26 11:06:45 -05:00
Po Yen Chen	26cfd12dec	Speed-up global memory reading for GEMM instances (#813 ) * Use better ThreadClusterLengths to speed up * Update B tile reading pattern for layout=NN instance [ROCm/composable_kernel commit: `f4ea560112`]	2023-07-25 18:54:47 -05:00
ltqin	714fc59909	Add bias scalar vectorload = 1 for gemm bias gemm (#791 ) * first change bias load * add bias dim and scalervector parameter * make CDE0BlockTransferSrcVectorDim not work * changse toinstance * add limit for CDE0BlockTransferSrcScalarPerVector [ROCm/composable_kernel commit: `50643dd555`]	2023-07-24 20:08:15 -05:00
Bartłomiej Kocot	9eb711b9d2	Grouped conv bwd wei NDHWGC/NDHWGK (#804 ) [ROCm/composable_kernel commit: `10732847e7`]	2023-07-21 12:00:55 -05:00
Bartłomiej Kocot	686ad6b543	Grouped 3d conv backward data support (#799 ) * Grouped 3d conv backward data support * Fix comments [ROCm/composable_kernel commit: `49180fd60b`]	2023-07-18 11:01:33 -05:00
Illia Silin	67b2baf9c1	Add mechanism to build CK for select data types, add Navi3x CI. (#790 ) * allow building CK for specific data types * add CI build and test stage on Naiv3x without some int8 instances * add missing gemm fp16 instances * add the changes to the missed cmake file * add empty lines at end of source files * Do not build quantization client example on navi3 in CI * disable batched_gemm_multi_d_int8 instances with DTYPES * disable device_conv2d_bwd_data_instance with DTYPES * fix ckprofiler for conv_bwd_data for int8 * properly isolate the conv_bwd_data int8 instances * remove empty line [ROCm/composable_kernel commit: `189ea3b9aa`]	2023-07-17 18:02:42 -07:00
Bartłomiej Kocot	a1a8901df8	Support NHWGC conv2d_bwd_weight (#769 ) * Support NHWGC conv2d_bwd_weight * Fix client example * Fix client example * Fix comments * Redesign grouped_conv_bwd_weight instances * Clang format fix --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `1ee99dcaa6`]	2023-07-12 08:25:02 -05:00
Po Yen Chen	7308ac2436	Split GEMM instance library & enable pipeline v2 optimization (#783 ) * Move source file into sub-directories * Add missing include directive * Split DeviceGemmXdl<> fp16 instances * Fix format * Remove unnecessary CMakeLists.txt * Add macros to toggle new features * Remove debug message * Turn off GEMM v2 pipeline optimization by default * Fix format * Extract duplicated string as list * Enlarge indent in CMakeLists.txt [ROCm/composable_kernel commit: `850144a0d3`]	2023-07-06 10:59:35 -05:00
Adam Osewski	5c2c77a439	Move Device Ops implementations into impl directory. (#777 ) Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `f4dfc060b7`]	2023-07-06 16:15:51 +02:00
Bartlomiej Kocot	d200bb9b35	Fix copyrights for DeviceBatchedGemmMultipleD_Dl [ROCm/composable_kernel commit: `2b0b6d9f46`]	2023-07-06 15:50:27 +02:00
Bartłomiej Kocot	1dde9f03de	Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757 ) * Support bf16/f32/f16 and NHWGC conv2d_bwd_data * Add interface test * clang format * Comment fixes * Add more friendly error message [ROCm/composable_kernel commit: `63388e84ab`]	2023-06-21 08:20:31 -05:00
Qianfeng	eebebd33c6	Padded Generic Kernel Instance (#730 ) * Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting * Move the generic kernel instance to be the first of the instance list for elementwise op of normalization * Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax * Add testing of GetGenericInstance() in client_example of Softmax * Revert "Add testing of GetGenericInstance() in client_example of Softmax" This reverts commit `f629cd9a93`. * Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax" This reverts commit `a9f0d000eb`. * Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm * Move generic kernel instance to separate tuple for elementwise op of normalization * Remove un-used files for softmax instance * Store generic kernel instance to separate tuple for softmax * Add IsSupported checking for generic instance to client example of softmax * Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization * clang-format fix * Remove int8 from softmax instances --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `0d9118226b`]	2023-06-16 23:43:11 -05:00
zjing14	49892843ac	Fixed Weight layout of grouped_conv 3d fwd (#743 ) * Changed wei layout * changed layout for examples * fixed client example --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `309b1c6461`]	2023-06-15 10:19:33 -05:00
Rostyslav Geyyer	37abb8fe42	Add generic kernel instances for ck::tensor_operation::device::DeviceGemmMultipleD (#741 ) * Add generic instance gemm_add_add_fastgelu * Add a client example for generic gemm_add_add_fastgelu * Update CMakeLists * Format * Format * Add generic instance gemm_add_fastgelu * Format * Add a gemm_add_fastgelu client example * Format * Add generic instance gemm_fastgelu * Format * Fix argument order * Add gemm_fastgelu client example * Add exceptions if argument is not supported [ROCm/composable_kernel commit: `54b68eb343`]	2023-06-14 16:06:56 -05:00
Bartłomiej Kocot	a404cc8faf	Add DeviceBatchedGemmMultipleD_Dl (#732 ) * Add DeviceBatchedGemmMultipleD_Dl * Fix batched_gemm tests * Fix comments * test_batched_gemm_multi_d fixes * Fix args for isSupported batchedGemmMultipleDDl * Disable tests for gfx90a [ROCm/composable_kernel commit: `fc9f97568f`]	2023-06-12 08:37:15 -05:00
ltqin	e168773cd1	Fix flash attn mask bug (#733 ) * add check input parameter * add instance for vector load = 1 * move gerneral instance to first pos * fix read bias code * regular code for bias load --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `0ede66de54`]	2023-06-12 08:35:31 -05:00
Illia Silin	b57fbee2f1	update copyright headers (#726 ) [ROCm/composable_kernel commit: `b94fd0b227`]	2023-05-31 18:46:57 -05:00
Adam Osewski	05ef0151aa	Multiple fixes to GroupedGemm+SplitK (#707 ) * Add license header. * Reduce number of logged output. Add constant initialization. * Add functional tests for grouped_gemm with different kbatch value. * Add debug log informations + remove unused code. * Don't pass kbatch to CalculateKPadded. * Turn on logging in grouped gemm and gemm splitk profiler * Debug: limit number of test cases to run; * Log more information and initialize with constant value. * Turn on DEBUG_LOG * Add more debug log informations. * Limit the number of instances to compile. * Use GridwiseGemmPipeline * Use KBatch to calculate K0 * Multiple DebugLog messages. * Unit tests for multiple KBatch values. * Refactoring * Disable logging * extract out of if statement KBatch update. * Uncomment instances. * Disable DebugLog. * Use Kbatch when calculate KPadded. * Fix CGridDesc padding. * Use available helper functions. * Uncomment code commented for debuggin. * Remove unnecessary debug log messages. * Uncomment previously commented code for debug purposes. * Add KBatch info to profiler output summary log. * Add gtests for gemm splitk using ckProfiler API. * Add more test-cases for different data layout. * Add more test cases for gemm splitk * Remove old test. * Unit tests for MKNK ggemm interface. * Fix and add more unit-tests. * Constepxr everything! * Increase error threshold for fp16 and splitk. Since we're using fp16 atomic add for splitk there's a known precision loss. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `70e4eb567f`]	2023-05-30 07:09:06 -05:00
Bartłomiej Kocot	474c107796	Add instances for fp16/int8 Gemm kernels (Navi21) (#717 ) * Add instances for fp16/int8 Gemm kernels (Navi21) * Extend instances with smaller tiles * Fix SrcVectorTensor for km_kn_mn int8 [ROCm/composable_kernel commit: `c2d7a29dec`]	2023-05-30 07:07:17 -05:00
rocking	84cbb3af35	Pool3d fwd (#697 ) * Expand the base class of pool2d, prepare to share base class with pool3d * Add pool3d device op * Add pool3d f16 example * Refactor the base class. implement generic pooling in the future * clang format * get original index in max pooling * Add outputindex to base class * Fix dimension * Add pooling instance * Use indexType instead * Remove useless header * Extract IndexDataType to template * Extract pooling reference code * clang format * clang format * Fix typo * Add tensor stride * Add missing header * Add index stride and output stride * Refine naming * Add type to base class * Rename file * Use proper size * Fix typo * Refine naming * Modify the argument into vector. * Add max pool profiler * Refine naming * Support f32 pool * Fix typo * Add avg pool2d fwd in profiler * clang format * Rename AccDatatype to ComputeDatatype * Fix init * test pool * Extract variable * Add client example * Check the pooling dim * clang format * Connect argv and arg_parser * Add found check * Remove useless header * Refine naming * Adjust the order of device_pool_fwd [ROCm/composable_kernel commit: `76ec0089fb`]	2023-05-24 09:05:04 -05:00
Adam Osewski	d9fe87efbd	Grouped Gemm + SplitK + simplified Kernel Args (#669 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `8bb2bb4a05`]	2023-04-24 15:43:36 -05:00
rocking	cff08cbc72	Revise layout of group convolution (#675 ) * [What] Remove pure conv int8 instance [Why] We will never use pure int8 conv in AI, use int8 quantization instead * Change layout * Share the kernel parameter * Support more type of NHWGC for group conv * Revise client example of conv 2d, use NHWGC layout * Add instance to cmake * Revise layout of group conv quantization instance * Revise layout of external api of group conv quantization * Revise layout of group conv quantization client example * Fix clang format * Add comment to describe meaning of each parameter [ROCm/composable_kernel commit: `3eecbfb6ec`]	2023-04-23 23:40:00 -05:00
Illia Silin	55d16b3400	Put back the split-k gemm code. (#684 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout --------- Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `903cd19ce3`]	2023-04-21 19:37:00 -05:00
rocking5566	ee4b893928	Add (#677 ) [ROCm/composable_kernel commit: `fd11a4a12a`]	2023-04-17 10:12:10 -05:00
rocking5566	356c1cc17b	Groupnorm + swish external api (#668 ) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Ractor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp [ROCm/composable_kernel commit: `ed3a2e5226`]	2023-04-10 08:02:17 -05:00
Jun Liu	89d6f8a65f	Issue #666 : Revert "simplify karg in device/grid of split-k op (#644 )" (#665 ) This reverts commit `1108f64591`. [ROCm/composable_kernel commit: `3248387bbb`]	2023-04-06 17:14:11 -07:00
zjing14	696991c923	add fp64 instances (#658 ) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `fde6d2742b`]	2023-03-30 13:30:43 -05:00
carlushuang	1108f64591	simplify karg in device/grid of split-k op (#644 ) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout [ROCm/composable_kernel commit: `bb5530af91`]	2023-03-29 19:03:07 -05:00
rocking5566	cbce8b77da	Conv + quantization + tanh (#645 ) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `389e84a83b`]	2023-03-29 14:50:23 -05:00
ltqin	fc10856d4b	workaround 637 (#640 ) * add workaround 637 * format * change id --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `6ae12434d2`]	2023-03-20 11:49:31 -05:00
rocking5566	6a1403d82d	gemm/Conv xdlops + dlops quantization (#625 ) * Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move ot subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separete different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit `820869182f`. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `16dc18e0f9`]	2023-03-15 15:29:40 -05:00


159 Commits