composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 10:09:41 +00:00

Author	SHA1	Message	Date
Illia Silin	74c83ffe26	Add mechanism to build CK for select data types, add Navi3x CI. (#790 ) * allow building CK for specific data types * add CI build and test stage on Naiv3x without some int8 instances * add missing gemm fp16 instances * add the changes to the missed cmake file * add empty lines at end of source files * Do not build quantization client example on navi3 in CI * disable batched_gemm_multi_d_int8 instances with DTYPES * disable device_conv2d_bwd_data_instance with DTYPES * fix ckprofiler for conv_bwd_data for int8 * properly isolate the conv_bwd_data int8 instances * remove empty line [ROCm/composable_kernel commit: `189ea3b9aa`]	2023-07-17 18:02:42 -07:00
Illia Silin	2ce9e7a4cf	Add check for compiler GPU target support. (#800 ) * check if gpu_targets are supported by compiler * set default list of targets and filter for them [ROCm/composable_kernel commit: `4867db4290`]	2023-07-17 09:44:40 -07:00
arvindcheru	f0ccdd1036	Disable Werror to ignore xnack+ warnings (#794 ) * Disable Werror to ignore xnack+ warnings [ROCm/composable_kernel commit: `03d3395b3c`]	2023-07-14 20:00:20 -04:00
Bartłomiej Kocot	dd6d0dd3b9	Support NHWGC conv2d_bwd_weight (#769 ) * Support NHWGC conv2d_bwd_weight * Fix client example * Fix client example * Fix comments * Redesign grouped_conv_bwd_weight instances * Clang format fix --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `1ee99dcaa6`]	2023-07-12 08:25:02 -05:00
Illia Silin	5b106bca9b	change the build thread usage in CI (#787 ) [ROCm/composable_kernel commit: `87f2bbcf5c`]	2023-07-06 20:17:25 -05:00
Adam Osewski	f5b4375a3d	Add basic setup for precommit (#749 ) (#764 ) * Add basic setup for precommit * Update README.md with instructions on installing precommit hooks --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> [ROCm/composable_kernel commit: `237f9cd3aa`]	2023-07-06 11:01:06 -05:00
Po Yen Chen	aff6040b5b	Split GEMM instance library & enable pipeline v2 optimization (#783 ) * Move source file into sub-directories * Add missing include directive * Split DeviceGemmXdl<> fp16 instances * Fix format * Remove unnecessary CMakeLists.txt * Add macros to toggle new features * Remove debug message * Turn off GEMM v2 pipeline optimization by default * Fix format * Extract duplicated string as list * Enlarge indent in CMakeLists.txt [ROCm/composable_kernel commit: `850144a0d3`]	2023-07-06 10:59:35 -05:00
Qianfeng	b7192d8e4c	Batchnorm splitk single kernel (#771 ) * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance] * Add CountDataType as template parameter in blockwise_welford * Add utility/get_shift.hpp * Add BatchNorm multiblock single-kernel implementation * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a * Renaming in device_batchnorm_forward_impl.hpp * Tiny fix in the batchnorm_fwd profiler * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a" This reverts commit `d16d00919c`. * Use the old two-kernel batchnorm multiblock method for gfx1030 * Use the old two-kernel batchnorm multiblock method for gfx908 * use the single-kernel batchnorm multiblock method only for gfx90a * Remove get_wave_id() from utility/get_id.hpp since it is not used * Set true for testing running mean/variance and saving mean/invvariance in the examples * Fix to copy-right words * Remove un-needed including in utility/get_id.hpp * Add comments to workgroup_synchronization.hpp * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp * Renaming in the kernels * Remove un-used kernel file [ROCm/composable_kernel commit: `8f5cafaf04`]	2023-07-06 10:58:55 -05:00
Adam Osewski	da8a7b63ec	Move Device Ops implementations into impl directory. (#777 ) Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `f4dfc060b7`]	2023-07-06 16:15:51 +02:00
Bartlomiej Kocot	27c7825316	Fix copyrights for DeviceBatchedGemmMultipleD_Dl [ROCm/composable_kernel commit: `2b0b6d9f46`]	2023-07-06 15:50:27 +02:00
Rostyslav Geyyer	eb30728cd2	Add the missing archs (#785 ) [ROCm/composable_kernel commit: `61dc9aa932`]	2023-07-05 18:29:56 -05:00
Rostyslav Geyyer	3c1b791968	Add fp8 GEMM and an example for it (#767 ) * Add fp8 xdl gemm * Add example * Use int8 intrinsics for buffer load/store * Format * Update cmakelists [ROCm/composable_kernel commit: `1cf5003179`]	2023-07-04 20:38:49 -06:00
Illia Silin	f8f4f31830	Upgrade default docker to ROCM5.6 release. (#778 ) * upgrade default compiler to rocm5.6 release * do daily runs with rocm5.6 instead of 5.5 [ROCm/composable_kernel commit: `7797bd3d2b`]	2023-06-30 08:06:54 -07:00
Illia Silin	8173db2afe	Add rocm5.6 RC4 and rocm5.7 to docker build options. (#770 ) * upgrade to rocm5.6 rc4 * add rocm5.7 docker [ROCm/composable_kernel commit: `d3adc66581`]	2023-06-28 08:58:28 -05:00
Illia Silin	5beafe1971	do not build gfx941/942 targets during CI (#766 ) [ROCm/composable_kernel commit: `3b18f1e38c`]	2023-06-21 10:47:35 -07:00
Bartłomiej Kocot	8e7e512358	Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757 ) * Support bf16/f32/f16 and NHWGC conv2d_bwd_data * Add interface test * clang format * Comment fixes * Add more friendly error message [ROCm/composable_kernel commit: `63388e84ab`]	2023-06-21 08:20:31 -05:00
ltqin	f74b673fde	remove useless comments (#760 ) [ROCm/composable_kernel commit: `32d2f52bf7`]	2023-06-19 19:25:08 -07:00
zjing14	16659a8dfb	changed pipeline v1 (#763 ) [ROCm/composable_kernel commit: `05ea6452b6`]	2023-06-19 19:24:18 -07:00
Illia Silin	541ee9c35d	do not build gemm-gemm and conv-conv examples for gfx94* (#761 ) * do not build gemm-gemm and conv-conv examples for gfx94* * do not build gemm-gemm and conv-conv examples on navi [ROCm/composable_kernel commit: `645eb2f2a0`]	2023-06-19 16:55:03 -07:00
Rostyslav Geyyer	09bc04e7a4	FP8 enablement - add a pseudorandom number generator, add conversion methods (#708 ) * Add basic fp8 definitions and prn-generator * Format * Add fp8<->fp32 type_convert * Format * Split type_convert and cast_to/from_f8 * Format * Minor fix * Minor fix * Move fp8 utils to a separate header * Add elementwise ops * Add fp8_convert_sr * Format * Add element op * Eliminate magic numbers * Split f8_convert_sr in host and device * Format * Add some constexpr * Add a datatype test * Format * Another format * Add fp8<->fp16 tests * Update type_converts * Format * Add fp16 casting functions * Format * Use seed as a runtime arg * Use element location for PRNG * Format * Add fp8<->fp16 to PassThrough element op * Clean up * Merge host and device implementations * Add comments on rounding modes * Remove leftover code * Put type_converts into a separate header * Put random number gen to a separate header * Rearrange f8_utils' namespaces * Refactor type_convert.hpp * Move f8_t definition [ROCm/composable_kernel commit: `f0c620c42e`]	2023-06-19 11:20:35 -05:00
rocking	9c2487d2a0	Maxpool bwd (#750 ) * Add maxpool f32 kernel and example * Revise copyright * Add device pool bwd device op * Support f16 and bf16 * Add compute datatype for reference code. Prevent error in bf16 * Fix type error * Remove layout * Fix bf16 error * Add f16 and bf16 example * Add more operations * Implement IsSupportedArgument * Add changelog * Add comment * Add comment * Remove useless header * Move initialize of workspace to the run * Move set din zero to the device operator * Save din_length_raw * Remove useless header * Calculate gridsize according to the number of CU * Calculate gridSize according to the number of CU. Remove useless header * Add put example * Remove useless header * Fix CI fail [ROCm/composable_kernel commit: `341ad95665`]	2023-06-19 09:44:22 -05:00
Qianfeng	d6f690d361	Padded Generic Kernel Instance (#730 ) * Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting * Move the generic kernel instance to be the first of the instance list for elementwise op of normalization * Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax * Add testing of GetGenericInstance() in client_example of Softmax * Revert "Add testing of GetGenericInstance() in client_example of Softmax" This reverts commit `f629cd9a93`. * Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax" This reverts commit `a9f0d000eb`. * Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm * Move generic kernel instance to separate tuple for elementwise op of normalization * Remove un-used files for softmax instance * Store generic kernel instance to separate tuple for softmax * Add IsSupported checking for generic instance to client example of softmax * Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization * clang-format fix * Remove int8 from softmax instances --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `0d9118226b`]	2023-06-16 23:43:11 -05:00
Illia Silin	65eccfd426	do not build gfx941/942 targets during daily QA runs (#758 ) [ROCm/composable_kernel commit: `d140bdc9fa`]	2023-06-16 12:13:16 -07:00
Illia Silin	48347d8653	Enable gfx941 and gfx942 architectures. (#752 ) * enable gfx941/942 targets * fix clang format * fix the cmake logic for multiple targets * fix cmake syntax for looping over targets * add gfx941/942 support for gemm_xdl instances [ROCm/composable_kernel commit: `027e46ee82`]	2023-06-15 08:20:59 -07:00
zjing14	973fc655fd	Fixed Weight layout of grouped_conv 3d fwd (#743 ) * Changed wei layout * changed layout for examples * fixed client example --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com> [ROCm/composable_kernel commit: `309b1c6461`]	2023-06-15 10:19:33 -05:00
Qianfeng	82b9518dd6	Using number of compute units to set gridSize (#754 ) * Add getAvailableComputeUnitCount() interface * Use available number of compute units to set kernel grid size [ROCm/composable_kernel commit: `c5f6ec842c`]	2023-06-15 10:13:59 -05:00
Illia Silin	ca65844005	Fix the daily CI job with latest staging compiler. (#753 ) * fix CI builds with latest staging compiler * remove mount flags from dockerfile [ROCm/composable_kernel commit: `d1838d328c`]	2023-06-14 16:44:13 -07:00
Rostyslav Geyyer	f0c9daa292	Add generic kernel instances for ck::tensor_operation::device::DeviceGemmMultipleD (#741 ) * Add generic instance gemm_add_add_fastgelu * Add a client example for generic gemm_add_add_fastgelu * Update CMakeLists * Format * Format * Add generic instance gemm_add_fastgelu * Format * Add a gemm_add_fastgelu client example * Format * Add generic instance gemm_fastgelu * Format * Fix argument order * Add gemm_fastgelu client example * Add exceptions if argument is not supported [ROCm/composable_kernel commit: `54b68eb343`]	2023-06-14 16:06:56 -05:00
Rostyslav Geyyer	f913b45773	Fix arg order (#751 ) [ROCm/composable_kernel commit: `a35456a3f4`]	2023-06-12 08:38:46 -05:00
Bartłomiej Kocot	1405a4906b	Add DeviceBatchedGemmMultipleD_Dl (#732 ) * Add DeviceBatchedGemmMultipleD_Dl * Fix batched_gemm tests * Fix comments * test_batched_gemm_multi_d fixes * Fix args for isSupported batchedGemmMultipleDDl * Disable tests for gfx90a [ROCm/composable_kernel commit: `fc9f97568f`]	2023-06-12 08:37:15 -05:00
Po Yen Chen	d6b39871d2	Fix incomplete object size (=4n + 3) support of amd_wave_read_first_lane() (#738 ) * Fix wrong pointer type * Rename type trait get_unsigned_int<> to get_carrier<> * Add 3-bytes carrier type * Add missing __device__ specifier * Rename template non-type parameter * Leave the rest byte uninitialized * Avoid invoking (host) STL algorithms * Remove unnecessary 'inline' specifier * Extract common logic out as helper method * Hide dummy member function * Add missing __device__ specifier [ROCm/composable_kernel commit: `7c24654c24`]	2023-06-12 08:36:40 -05:00
ltqin	8c5f5f1293	Fix flash attn mask bug (#733 ) * add check input parameter * add instance for vector load = 1 * move gerneral instance to first pos * fix read bias code * regular code for bias load --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `0ede66de54`]	2023-06-12 08:35:31 -05:00
carlushuang	9499f4b51b	support dynamic buffer using memory coherence glc_slc bit from template (#725 ) [ROCm/composable_kernel commit: `016ebaa7f3`]	2023-06-08 07:40:29 -05:00
Illia Silin	c1fa58aee7	Update docker (#744 ) * update dockerfile to build rocm5.6 rc3 * fix couple of docker issues [ROCm/composable_kernel commit: `1dd455d633`]	2023-06-07 09:35:14 -07:00
Illia Silin	3e6de9b7df	fix clang format (#740 ) [ROCm/composable_kernel commit: `4036590401`]	2023-06-02 14:10:02 -07:00
who who who	e66c3993ec	replace hipMemcpy with hipMemcpyWithStream (#734 ) [ROCm/composable_kernel commit: `e2ebc8e795`]	2023-06-01 16:23:41 -05:00
Po Yen Chen	16357d06c6	Simplify kernel argument of device operator Device(Batched)GemmXdl<> (#723 ) * Remove M/N/KPad local variables * Use M/N/KPad to name padded lengths * Replace duplicated local variable by parameters * Rename variables M/N/KRaw to M/N/K * Move AK0/BK0 compute logic into GridwiseGemm * Use macro to shorten code * Move CalculateGridSize() logic into GridwiseGemm * Add comment to credit the implementation source * Reuse the existing implementation * Remove no-longer used data members * Remove elementwise-op objects from interfaces * Reserve kernel arg as whole object in interfaces * Remove redundant data member * Make 3rd type parameter optional * Remove unnesscary type parameters * Remove no-longer used descriptor-creation methods * Move kernel arg type definition into GridwiseGemm * Add macro to switch between code sections * Move argument field computing logic into device op side * Make utility method 'static' * Declare special methods * Unify MakeArgument() usage * Adapt the new GridwiseGemm interface * Push-down class 'GridwiseGemm::Argument' fields * Remove no-longer used methods * Add unused parameters * Force copying parameters in 'Embed' ctor * Remove no-longer used descriptors * Fallback change on BaseArgument * Remove macro 'INTEGER_DIVIDE_CEIL' * Make variable naming more consistent * Make sure methods are only invoked on right place * Remove tailing underscore in public attribute name * Remove necessary methods * Hide computing logic of derived attributes * Make new 'Embed' ctor only available for device code * Make sure 'Embed' type args are not references * Move check for karg.K into CheckValidity() * Remove more integer division logic form device code * Undo changes on Embed * Separate 'Problem' concept out from 'Argument' * Add overloaded version of __builtin_amdgcn_readfirstlane() * Remove 'static' specifiers * Remove more 'static' specifier * Replace unsigne char by std::byte * Add 'const' specifier to never changing variable * Add 'inline' specifier to funcion definition * Share same name for kernel interfaces * Fix wrong boundar calculation logic * Leave the third template arg for compatibility * Remove unnecessary parameters * Fix wrong error message (for type name) * Create descriptor on device side * Fix wrong debug message * Remove no-longer used data members * Rename type trait * Remove std:: qualifier from standard types * Replace 'size_t' by 'unsigned' * Use type alias to hint usage * Replace static_for<> by ordinary 'for' loop * Reject unsupported argument * Rename readfirstlane() to amd_wave_read_first_lane() * Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp * Update function calls * Reorder statements * Re-format files --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `9eae73df9b`]	2023-06-01 16:23:02 -05:00
Illia Silin	d40b8d5e2c	update copyright headers (#726 ) [ROCm/composable_kernel commit: `b94fd0b227`]	2023-05-31 18:46:57 -05:00
Po Yen Chen	7819e1b85d	Add class type support for __builtin_amdgcn_readfirstlane() (#711 ) * Add overloaded version of __builtin_amdgcn_readfirstlane() * Remove 'static' specifiers * Remove more 'static' specifier * Replace unsigne char by std::byte * Add 'const' specifier to never changing variable * Add 'inline' specifier to funcion definition * Fix wrong boundar calculation logic * Rename type trait * Remove std:: qualifier from standard types * Replace 'size_t' by 'unsigned' * Use type alias to hint usage * Replace static_for<> by ordinary 'for' loop * Rename readfirstlane() to amd_wave_read_first_lane() * Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp * Reorder statements [ROCm/composable_kernel commit: `582e31e88d`]	2023-05-31 10:25:25 -05:00
Haocong WANG	31fa21b7ee	fix wmma gemm int8; add grouped conv int8 example (#716 ) [ROCm/composable_kernel commit: `6eef0755c9`]	2023-05-30 07:18:53 -05:00
Po Yen Chen	0b5125f617	Simplify kernel argument of device operator DeviceGemm_Xdl_CShuffle<> (#696 ) * Remove M/N/KPad local variables * Use M/N/KPad to name padded lengths * Replace duplicated local variable by parameters * Rename variables M/N/KRaw to M/N/K * Move AK0/BK0 compute logic into GridwiseGemm * Use macro to shorten code * Move CalculateGridSize() logic into GridwiseGemm * Add comment to credit the implementation source * Reuse the existing implementation * Remove no-longer used data members * Remove elementwise-op objects from interfaces * Reserve kernel arg as whole object in interfaces * Remove redundant data member * Make 3rd type parameter optional * Remove unnesscary type parameters * Remove no-longer used descriptor-creation methods * Move kernel arg type definition into GridwiseGemm * Add macro to switch between code sections * Move argument field computing logic into device op side * Make utility method 'static' * Declare special methods * Unify MakeArgument() usage * Adapt the new GridwiseGemm interface * Push-down class 'GridwiseGemm::Argument' fields * Remove no-longer used methods * Add unused parameters * Force copying parameters in 'Embed' ctor * Remove no-longer used descriptors * Fallback change on BaseArgument * Remove macro 'INTEGER_DIVIDE_CEIL' * Make variable naming more consistent * Make sure methods are only invoked on right place * Remove tailing underscore in public attribute name * Remove necessary methods * Hide computing logic of derived attributes * Make new 'Embed' ctor only available for device code * Make sure 'Embed' type args are not references * Move check for karg.K into CheckValidity() * Remove more integer division logic form device code * Undo changes on Embed * Separate 'Problem' concept out from 'Argument' * Share same name for kernel interfaces * Reject unsupported argument --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `1344a0f25b`]	2023-05-30 07:09:55 -05:00
Adam Osewski	b145984ea1	Multiple fixes to GroupedGemm+SplitK (#707 ) * Add license header. * Reduce number of logged output. Add constant initialization. * Add functional tests for grouped_gemm with different kbatch value. * Add debug log informations + remove unused code. * Don't pass kbatch to CalculateKPadded. * Turn on logging in grouped gemm and gemm splitk profiler * Debug: limit number of test cases to run; * Log more information and initialize with constant value. * Turn on DEBUG_LOG * Add more debug log informations. * Limit the number of instances to compile. * Use GridwiseGemmPipeline * Use KBatch to calculate K0 * Multiple DebugLog messages. * Unit tests for multiple KBatch values. * Refactoring * Disable logging * extract out of if statement KBatch update. * Uncomment instances. * Disable DebugLog. * Use Kbatch when calculate KPadded. * Fix CGridDesc padding. * Use available helper functions. * Uncomment code commented for debuggin. * Remove unnecessary debug log messages. * Uncomment previously commented code for debug purposes. * Add KBatch info to profiler output summary log. * Add gtests for gemm splitk using ckProfiler API. * Add more test-cases for different data layout. * Add more test cases for gemm splitk * Remove old test. * Unit tests for MKNK ggemm interface. * Fix and add more unit-tests. * Constepxr everything! * Increase error threshold for fp16 and splitk. Since we're using fp16 atomic add for splitk there's a known precision loss. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `70e4eb567f`]	2023-05-30 07:09:06 -05:00
Bartłomiej Kocot	18002ddb3c	Add instances for fp16/int8 Gemm kernels (Navi21) (#717 ) * Add instances for fp16/int8 Gemm kernels (Navi21) * Extend instances with smaller tiles * Fix SrcVectorTensor for km_kn_mn int8 [ROCm/composable_kernel commit: `c2d7a29dec`]	2023-05-30 07:07:17 -05:00
Illia Silin	6eca93f302	Clean-up the headers (#713 ) * fix headers for gpu instances * remove unused headers --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> [ROCm/composable_kernel commit: `ac9e01e2cc`]	2023-05-24 08:11:25 -07:00
rocking	266e37d8fd	Pool3d fwd (#697 ) * Expand the base class of pool2d, prepare to share base class with pool3d * Add pool3d device op * Add pool3d f16 example * Refactor the base class. implement generic pooling in the future * clang format * get original index in max pooling * Add outputindex to base class * Fix dimension * Add pooling instance * Use indexType instead * Remove useless header * Extract IndexDataType to template * Extract pooling reference code * clang format * clang format * Fix typo * Add tensor stride * Add missing header * Add index stride and output stride * Refine naming * Add type to base class * Rename file * Use proper size * Fix typo * Refine naming * Modify the argument into vector. * Add max pool profiler * Refine naming * Support f32 pool * Fix typo * Add avg pool2d fwd in profiler * clang format * Rename AccDatatype to ComputeDatatype * Fix init * test pool * Extract variable * Add client example * Check the pooling dim * clang format * Connect argv and arg_parser * Add found check * Remove useless header * Refine naming * Adjust the order of device_pool_fwd [ROCm/composable_kernel commit: `76ec0089fb`]	2023-05-24 09:05:04 -05:00
Illia Silin	2359d80980	Enable gemm_dl and other kernels on Navi3x. (#714 ) * enable dl kernels on navi3 * do not build xdl tests and examples on Navi * run tests before building everything on jenkins * disable gemm_bilinear on gfx1030 * add gpu targets to installer on Navi * put tests in the same order as before * reduce the number of navi targets in CI * build CI installed for gfx940 as well * only build for MI300 during QA runs [ROCm/composable_kernel commit: `d821d1e54f`]	2023-05-23 11:23:16 -05:00
Sam Wu	42dc134b52	Documentation Updates (#710 ) * update documentation dependencies add version number to docs rename doc config directories enable more doc formats on rtd add license section in docs [ROCm/composable_kernel commit: `3cff340423`]	2023-05-18 11:08:38 -06:00
Bartłomiej Kocot	b937260174	Add contraction profiler and tests (#701 ) * Add contraction profiler and tests * Build and style fixes * Allow to use any elementwise operator for ref_contraction * Introduce profile_contraction_scale and profile_contraction_bilinear * Make ref_contraction generic and extend interface tests * Stylistic minor fixes * Extend test_contraction_interface [ROCm/composable_kernel commit: `642d5e9155`]	2023-05-15 09:46:52 -05:00
rocking	f9789fcfc2	Normalization/split k (#615 ) [ROCm/composable_kernel commit: `a1e344b1ae`]	2023-05-11 07:15:02 -05:00
Rostyslav Geyyer	a908dffad5	Optimize bf16 conversion (#664 ) * Add TypeConvert class and start refactoring * Refactor TypeConvert as a struct * Get back to template functions type_convert * Add a type_convert_bf16_rtn, set rtz as default * Clean up * Add UnaryConvertPrecision struct for high-precision workloads * Format * Update type_convert to UnaryConvert on threadwise level * Update UnaryConvertPrecision * Format * Fix chmod * Add a flag to pick converion method * Format * Remove the added flag * Merge elementwise op with type conversion * Move type_convert to elemwise op, update the op * Update type_convert_precision -> bf16_convert_rtn * Clean up * Update comments * Update the CK_WORKAROUND_DENORM_FIX flag handling * Update the unneeded op to work but warn user * Remove the message * Use a PassThrough instead of ConvertBF16RTN to calcaulate reference * Format * Add missing include [ROCm/composable_kernel commit: `b076a02ad2`]	2023-05-04 10:25:47 -05:00

922 Commits