Commit Graph

692 Commits

Author SHA1 Message Date
Chao Liu
204ef976ca add more datatype to gemm+gemm and conv+conv example (#397)
* refactor

* refactor

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* clean
2022-09-01 09:31:17 -05:00
Po Yen Chen
46a675aa6f Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380)
* Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle

* Add 'DeviceGroupedConvFwdMultipleDMultipleR' interface

* Add DeviceGroupedConvFwdMultipleDMultipleR_Xdl_CShuffle

* Remove 'GridwiseConvFwdMultipleDMultipleR_xdl_cshuffle'

* Add 'TransformConvFwdToGemm<>' utility class (from Chao)

* Use 'TransformConvFwdToGemm<>' to shorten code

* Fix ill-formed method declaration

* Re-implement MakeRGridDescriptor_M() function

* Change problem description

* Use macro to define layout types

* Define K-reduced output tensor layout types

* Let user decide R output tensor layout

* Rename variables

* Add padding to the reduced output tensor if necessary

* Extract common code as helper method

* Remove debug message

* Add missing include directive

* Add partial fp16 Conv + Reduction example

* Add example verification code for 2D Conv problem

* Use type alias to simplify code

* Share code across different-dimension Conv problems

* Rename file/functions from run_conv_fwd* to run_convnd_fwd*

* Make example code more verbose

* Add code to support 1D & 3D Conv + Reduction on host

* Add more examples for data type: bf16, fp32

* Add example for int8

* Add custom target to group examples

* Use more general custom target name

* Change the description in error message

* Disable testing for examples other than fp32

* Add example for int4 (just copy from int8)

* Fix wrong data type

* Use larger data type for intermediate tensors

* Finish int4 example

* Undefine macro PP_DEFINE_LAYOUT_TYPE() after use

* Use named variables to replace magic numbers

* Remove debug messages

* Use same A/B data type for host Conv in int4 example

* Add check for the 'RLayout' type argument

* Group same-dim-layouts together in 'LayoutSetting<>'

* Add 'final' specifier to utility classes

* Use different initialization method for examples

* Remove macro PP_DEFINE_LAYOUT_TYPE()

* Fix code-comment mismatch

* Use more reasonable initialization value for all data types

* Default use init_method=1 for all examples

* Remove never-used code

* Remove confusing out-of-date comments

* clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-08-31 16:32:17 -05:00
Chao Liu
4df6d93f60 conv+conv (1x1 only) example using gemm+gemm (#393)
* refactor conv

* add conv+conv example, 1x1 only
2022-08-31 11:27:11 -05:00
Adam Osewski
d00e6115b9 Gemm reduce examples int4/int8/fp32/bf16 (#368)
* GEMM + Reduce max fp16+fp32

* GEMM + Max bf16 + int8

* Refactor common definitions.

* Refactor common func of mean meansquare example.

* More examples for mean meansquare.

* Update int8 examples and skip them because of random errors.

* Int4 examples.

* Fix examples for max int4/8

* Tensor conversion for int4 input data for mean meansquare example.

* Remove int4 mean_meansquare example

* Fix int8 mean_meansquare example.

-All ReductionAccData and R<N>DataType have to be F32. The INT32 data
type is giving wrong results.

* Guard int4 with ifdef

* Change int8 example to add_addsquare due to div rounding err.

* Clang format

* Change the return type of common function.

* Get back int8 example with division.

* Remove int8 mean meansquare.

* Use proper cast for BF16 data type.

* Use ck::literals.

* Use proper data type for host tensors & reference.

- Use ReduceAccDataType for reference gemm output data type.
- Cast host reference output tensor to EDataType
- Fix ifdefs for int4.

Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-08-30 11:38:26 -05:00
Shaojie WANG
45adb736e7 Padding for attention: bmm+scale+softmax+bmm kernel (#385)
* add padding algo for bmm+scale+softmax+bmm. Version for verification

* remove verification code

* remove comments

* add padded bmm scale softmax bmm example

* format

* refactor

* add comments for usages of padding bmm+scale+softmax+bmm

Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-08-30 11:01:37 -05:00
Anthony Chang
138faf3961 Try to workaround flaky GemmSoftmaxGemm tests (#386)
* avoid potential hazard; flaky test issue persists

* pin down the random seed to avoid flakiness
2022-08-29 08:40:25 -05:00
Illia Silin
9061d39bd6 Fix the slow cpu reference batched gemm kernels. (#388)
* fix the performance of the batched gemm verification

* fix tabs
2022-08-29 08:39:21 -05:00
Illia Silin
1e5b59df22 Add an option to build CK with clang directly (#387)
* replace hipcc compiler with clang++

* build client app with hipcc

* build client app with clang

* add an option to build with hipcc or clang

* fix the environment for client app

* fix setting up compiler in cmake_build

* change the way the compiler is set
2022-08-26 12:51:39 -05:00
zjing14
9881625b2d Fixed splitk gemm fp32 (#384)
* add scripts

* fixed splitK_gemm_fp32

* clean

* clean
2022-08-26 09:59:50 -05:00
Adam Osewski
57fadf6fb9 More int4 tests. (#374)
* More int4 UT.

* Disable BitwiseRepresentation UT.

* Add UT with static_cast

* Surround cout statements with #if

Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-08-25 17:20:23 -05:00
Adam Osewski
3ab20fd753 GEMM batched/splitK/cgemm/grouped int4 examples (#383)
* Grouped Gemm int4.

* Formatting + fix K dimension for int8.

* Batched Gemm int4 example.

* CGEMM int4 example.

* Include inc files in clang-format.

* SplitK int4 example

* Refactoring of performance measurement.

* Fix #ifdef statements.

Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-08-25 17:19:15 -05:00
Rostyslav Geyyer
b73ae24234 Add int4 example for convnd_fwd_bias_relu_add (#375)
* Add int4 example for convnd_fwd_bias_relu_add

* Fix AddReluAdd for building without int4 support

* Update CMakeLists.txt

* Format

* Convert int4 tensors for int8 kernel

* Fix device memory allocation

* Format

* Format
2022-08-25 17:08:43 -05:00
Qianfeng
d520d0cfc1 Add int4 reduction examples (#372)
* Add int4 reduction examples

* Contain all uses of int4_t inside the preprocessor condition check
2022-08-25 16:58:48 -05:00
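The int4 examples above only compile when Composable Kernel's experimental int4 support is enabled; the log elsewhere names the guard macro CK_EXPERIMENTAL_BIT_INT_EXTENSION_INT4. A minimal sketch of the guard pattern these commits describe (the type aliases are illustrative, not the actual example code):

```cpp
#include <cstdint>

#ifdef CK_EXPERIMENTAL_BIT_INT_EXTENSION_INT4
// ck::int4_t (from CK's data-type headers) is only referenced when the
// experimental extension is enabled, so other builds never see the type.
using KernelDataType = ck::int4_t;
#else
// fallback path for builds without int4 support
using KernelDataType = std::int8_t;
#endif
```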
zjing14
f246fd2c88 add scripts (#382) 2022-08-25 10:33:40 -05:00
rocking5566
e1a3fff675 layernorm external api (#379)
* Add layernorm client example

* [What] Add default make install dir to gitignore
[Why] client example needs make install
2022-08-24 18:43:43 -05:00
Po Yen Chen
88e43744d8 Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle (#378) 2022-08-24 10:12:54 -05:00
Po Yen Chen
fa2d894be1 Add examples of Gemm (data type: int4) (#367)
* Add GEMM examples for int4

Currently the source files are just copied from int8 examples

* Re-use pre-defined alias in int4 examples

* Distinguish user-side type from kernel-side type

* Add int4_t support for check_err()

* Allow conversion between Tensor<> specializations

* Re-format source files

* Use different type for host tensors

* Re-use CopyAsType<>() to implement copy ctor

* Re-use element-wise operation type alias

* Fix typo in alias names

* Complete the int4 examples

* Add constraint to Tensor<> templated methods

* Add type traits 'is_signed_integral<>'

* Add type constraints for integer version check_err<>()

* Allow comparing different-sized integral types in check_err()

* Check converted Tensor<int4_t> with golden Tensor<int8_t>

* Remove constraint of Tensor<>::CopyAsType()

* Avoid compilation error while disabling ck::int4_t support

* Remove debug messages

* Add #error directive to prevent compile sources with wrong setting

* Simplify tensor usages in examples

* Add constraint to check_err() input reference type

* Align design with other PR

* Use ""_uz to simplify example code

* Avoid too much generalizing check_err()

* Re-format GEMM instance template arguments

* Extract int4 example common codes

* Sort include directives

* Move #include directives into new header

* Move common codes together

* Re-format template argument in example code

* Reuse same implementation code for most of GEMM examples

* Re-format common.hpp

* Unify structured comment in examples

* Use reinterpret_cast<>() for cross-type pointer conversion

* Revert "Add type traits 'is_signed_integral<>'"

This reverts commit f2c148efae.

* Allow unsigned integer arguments for check_err()

* Fix compilation error in check_err()

* Remove unnecessary copy ctor for Tensor<>

* Mark Tensor<> special member functions as 'default'

* Use more strict condition to add code in examples

* Fix wrong program return value of GEMM examples

* Handle the case where user specifies all the strides

* Fix never-ran examples

* Exit successfully if GEMM instance does not support given problem

* Add missing 'else' keyword

* Re-format CMakeLists.txt

* Add wrapper function to hide value conversion while copying memory

* Add new DeviceMem API to copy memory

* Use new DeviceMem API to implement examples

* Revert "Add new DeviceMem API to copy memory"

This reverts commit 3f190b0779.

* Add conversion ctor for Tensor<>

* Write Tensor<> conversion logics explicitly in example code

* Convert Tensor<> values after transfer data to host
2022-08-23 18:25:05 -05:00
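Several commits in the entry above teach check_err() about int4 and about comparing differently sized integral types. A simplified, hypothetical sketch of such an integral-only comparison (CK's real check_err() signature and error reporting differ):

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <type_traits>
#include <vector>

template <typename Out, typename Ref,
          typename = std::enable_if_t<std::is_integral_v<Out> && std::is_integral_v<Ref>>>
bool check_err(const std::vector<Out>& out, const std::vector<Ref>& ref, std::int64_t atol = 0)
{
    if(out.size() != ref.size())
        return false;
    for(std::size_t i = 0; i < out.size(); ++i)
    {
        // promote both sides to a common wide signed type so different-sized
        // integral types (e.g. an int4 result vs an int8/int32 reference) compare safely
        const std::int64_t o = static_cast<std::int64_t>(out[i]);
        const std::int64_t r = static_cast<std::int64_t>(ref[i]);
        if(std::abs(o - r) > atol)
        {
            std::cout << "mismatch at " << i << ": " << o << " vs " << r << '\n';
            return false;
        }
    }
    return true;
}
```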
Anthony Chang
e0d8806ca1 Attention with output permutation (#370)
* comment on specialization for TensorSpecialization::Packed

* gemm_softmax_gemm with output permutation

* scaling

* refactor MatrixPadder; rename to GemmPadder

* remove old sanity check

* restore original gemm_softmax_gemm

* revise comment in gemm_softmax_gemm example

* use GetElementSpaceSize()

* remove extra header

* typo

* remove archaic DeviceOpPtr
2022-08-23 14:52:56 -05:00
zjing14
6091458300 Add examples of batched/grouped/SplitK Gemm for int8/bfp16/fp16/fp32 (#361)
* add examples into grouped/batched_gemm

* adding splitK examples

* fixed splitK

* add bfp16 int8 example into splitK

* formatting

* use static_cast

* added common for batched_gemm

* add commons for examples of splitK/batched/grouped_gemm

* return true

* adjust splitK check tol

* update example

Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-08-23 14:41:56 -05:00
Po Yen Chen
2327f1a640 Add example of Gemm + AddAddFastGelu (data type: int4) (#369)
* Add custom target to bundle examples together

* Add int4 example conditionally (just copy from int8 example)

* Extract common code into common.hpp

* Move ref gemm type alias into data-type-specific sources

* Add #error directive to prevent compile with wrong setting

* Let AddAddFastGelu support int4 parameter type

* Let check_err() support int4 parameter type

* Add wrapper function to hide value conversion while copying memory

* Finish int4 example for GEMM + AddAddFastGelu

* Add new DeviceMem API to copy memory

* Use new DeviceMem API to implement examples

* Fix wrong use of macro 'CK_EXPERIMENTAL_BIT_INT_EXTENSION_INT4'

* Revert "Add new DeviceMem API to copy memory"

This reverts commit e26e7af71e.

* Add conversion ctor for Tensor<>

* Add 'const' specifier to Tensor<>::CopyAsType()

* Convert Tensor<> values before/after transfer between host & device
2022-08-23 10:38:41 -05:00
Anthony Chang
f4047c9418 Implement padding and sanity checks for fused GEMM+GEMM (#376)
* GemmPadder and GemmGemmPadder

* proper padding using GemmGemmPadder

* test gemm_gemm padding

* properly check size K in IsSupportedArgument()

* properly check size requirement given SrcScalarPerVector in IsSupportedArgument()

* comment

* format
2022-08-23 10:01:02 -05:00
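GemmPadder/GemmGemmPadder in the entry above pad each GEMM dimension so the tiled kernel's IsSupportedArgument() checks hold. A minimal sketch of that padding arithmetic, with illustrative tile sizes (not the actual CK helper):

```cpp
#include <cstdint>
#include <iostream>

// round a problem dimension up to a multiple of its tile size
constexpr std::int64_t pad_to_multiple(std::int64_t len, std::int64_t tile)
{
    return ((len + tile - 1) / tile) * tile; // e.g. pad_to_multiple(1000, 128) == 1024
}

int main()
{
    constexpr std::int64_t MPerBlock = 128, NPerBlock = 128, KPerBlock = 32;
    const std::int64_t M = 1000, N = 777, K = 100;

    std::cout << "M: " << M << " -> " << pad_to_multiple(M, MPerBlock) << '\n'
              << "N: " << N << " -> " << pad_to_multiple(N, NPerBlock) << '\n'
              << "K: " << K << " -> " << pad_to_multiple(K, KPerBlock) << '\n';
}
```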
rocking5566
c366de553e [What] Fix bug of verification fail on E Matrix (#371)
[Why] We need to sync LDS even in the first loop because Gemm also uses the same LDS.
2022-08-22 07:50:28 -05:00
Illia Silin
9efd033bee restart the stages on MI200 in case of failures (#366)
* restart the stages on MI200

* fix the docker image storage issue
2022-08-18 14:54:47 -05:00
Adam Osewski
e00149ac67 int4 data type (#364)
* Introduce int4 data type.

* Add unit-tests for int4

* Compile int4 UT only when int4 enabled.

* clang-format

Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-08-18 14:53:47 -05:00
Chao Liu
bac7df8faf use scale (#363) 2022-08-17 10:38:00 -05:00
Anthony Chang
c961ce9226 Hotfix LDS data hazard in fused attention (#360)
* avoid LDS data hazard in gemm_softmax_gemm pipeline

* trivial refactors

* comments

* shrink blockwise gemm v2 thread buffer size

* reclaim A block LDS space during 2nd gemm

* amend

* amend
2022-08-15 12:04:20 -05:00
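The hazard described above comes from reusing one LDS (shared memory) buffer for two back-to-back GEMM stages: the second stage must not overwrite the buffer while other threads in the block are still reading the first stage's data. A HIP-style sketch of the extra barrier that removes the hazard (illustrative kernel, not the CK gemm_softmax_gemm pipeline):

```cpp
#include <hip/hip_runtime.h>

__global__ void two_stage_reuse(const float* a, const float* b, float* out)
{
    __shared__ float lds_buf[256]; // one buffer reused by both stages (blockDim.x <= 256)

    // stage 1: stage the A-tile through LDS and consume it
    lds_buf[threadIdx.x] = a[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();                                   // writes visible before reads
    float acc = lds_buf[(threadIdx.x + 1) % blockDim.x];

    // Without this barrier, fast threads could start stage 2's writes while
    // slow threads are still reading stage 1's data from the same buffer.
    __syncthreads();

    // stage 2: reuse the same LDS buffer for the B-tile
    lds_buf[threadIdx.x] = b[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();
    acc += lds_buf[(threadIdx.x + blockDim.x - 1) % blockDim.x];

    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}
```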
Qianfeng
53ea4713af Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)
* Implement multiple-reduction in one kernel (kernels, device ops, examples)

* Add generic elementwise kernel and device interface

* Add generator for normal-distributed data initialization

* Add host refer implementation of batchnorm-forward and batchnorm-infer

* Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels

* Remove un-needed include in batchnorm example

* Renaming generic_elementwise to elementwise in kernel and device classes/functions

* Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise

* Change in example 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise

* Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise

* Add DeviceElementwiseBase and use it in device_normalize_instance.cpp

* Removing and renaming files

* Update to synchronize gemm_layernorm client example to the generic element-wise device op API

* Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming

* Merge two static member functions in device_elementwise.hpp

* Remove unary_elementwise_1d kernel and device
2022-08-15 10:11:02 -05:00
Chao Liu
5ee304595c fix build issue (#357)
* fix build

* exclude example_gemm_max_xdl_fp16 from testing due to random failure on gfx908
2022-08-13 15:58:31 -05:00
cloudhan
fb1cbf025b Change all device operations to use add_instance_library (#338)
* Change all device operations to use add_instance_library to avoid duplicated cmake configuration.

* update DeviceMem

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-08-13 12:17:58 -05:00
rocking5566
0bd6b842b9 Layernorm welford (#346)
* Add threadwise and blockwise welford

* Rename gridwise op, prepare to add welford version

* implement welford and integrate welford into layernorm

* Take care of tail loop

* Fix bug when ThreadSliceK > 1

* Fix bug of merging of two empty set

* Rename clip to clamp

* 1. Fix type of count
2. Remove useless static_assert

* Do not inherit Reduction::Argument

* [What] replace __syncthreads() with block_sync_lds()
[Why] __syncthreads might wait on both lgkmcnt(0) and vmcnt(0)

* Add y stride

* Rename.
DeviceLayernorm -> DeviceLayernormImpl
DeviceNormalization2 -> DeviceLayernorm

* Move literal ""_uz & ""_zu into namespace 'literals'

* Move namespace 'literals' as 'ck::literals'

Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-08-13 09:43:18 -05:00
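The Welford commits above compute layernorm's mean and variance in one pass, with a merge step for combining partial results across threads and a guard for empty partials. A host-side sketch of the standard formulation (assumed textbook algorithm; the CK threadwise/blockwise classes are not shown):

```cpp
#include <cstdio>

struct Welford
{
    int   count = 0;
    float mean  = 0.f;
    float m2    = 0.f; // running sum of squared deviations from the mean

    void update(float x)
    {
        ++count;
        const float delta = x - mean;
        mean += delta / static_cast<float>(count);
        m2 += delta * (x - mean);
    }

    // merge two partial results (what a blockwise reduction does across threads);
    // the count == 0 guards relate to "Fix bug of merging of two empty set"
    static Welford merge(const Welford& a, const Welford& b)
    {
        if(a.count == 0) return b;
        if(b.count == 0) return a;
        Welford r;
        r.count = a.count + b.count;
        const float delta = b.mean - a.mean;
        r.mean = a.mean + delta * b.count / static_cast<float>(r.count);
        r.m2   = a.m2 + b.m2 + delta * delta * a.count * b.count / static_cast<float>(r.count);
        return r;
    }

    float variance() const { return count > 0 ? m2 / static_cast<float>(count) : 0.f; }
};

int main()
{
    Welford lo, hi;
    for(float x : {1.f, 2.f, 3.f, 4.f}) lo.update(x);
    for(float x : {5.f, 6.f, 7.f, 8.f}) hi.update(x);
    const Welford all = Welford::merge(lo, hi);
    std::printf("mean=%f var=%f\n", all.mean, all.variance()); // mean=4.5, var=5.25
}
```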
Anthony Chang
c20a75b07d Fused GEMM+GEMM (#351)
* initial stub for gemm_gemm_xdl_cshuffle

* set up example code

* compiles

* prevent integer overflow

* harmonize interface between ref_gemm and ref_batched_gemm

* batched_gemm_gemm

* fix example

* host tensor gen: diagonal pattern in lowest two-dimensions only

* make c descriptors contain only integral constants

* clean up

* add BlockwiseGemmXdlops_v2 while exploring a unified approach

* implement proper interface

* tidy up example

* fix compilation warnings

* coarsely controlled 2nd gemm padding

* remove rocm-cmake's hard requirement for certain revision

* clang-format

* resolve merge conflict

* fix compilation error on gfx10

* adds acc0 elementwise op to interface

* add gemm_gemm instances and tests

* avoid LDS data hazard

* fix build

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-08-13 09:18:58 -05:00
ltqin
10b3278b05 Skip lds of b matrix (#326)
* start

* read for gridwise gemm

* add MakeBGridDescriptor_K0_N0_N1_N2_N3_K1

* add thread copy desc and register buffer

* add K0PerBlock dim

* add read global data

* finish gridwise gemm

* finish blockwise gemm

* add print data

* add smallest config

* add compare code for gridwise gemm

* fix NXdlPerWave

* fix k0perthread and gridwise gemm main loop

* remove b matrix lds alloc

* fix name

* add test code

* create b_grid_desc_k0_k1_k2_n0_n1_n2_n3_k3 from parameter

* add double register

* modify b_thread_desc_

* add float

* fp16 tag

* add tail for pipeline

* finish main loop

* optimize main loop

* start clear gridwise gemm

* clear code

* clear redundant code

* change file name

* change file name

* fix bug after merging develop

* fix input parameters

* using MultiK0 to control b load data loop

* fix some config

* 4 buffer

* fix bug

* one can use

* change read order

* change buffer array to tuple

* change to 8 buffer

* interleave buffer load

* change to 16

* read 8 buffer

* add data buffer to template

* fix after merging develop (header file)

* format

* change to 4 buffer

* remove unnecessary lambda fun
2022-08-13 01:35:49 -05:00
Qianfeng
14932e8de3 Add examples for reduction fp16/fp32/bp16/int8/fp64 for 3d/4d/5d (#342)
* Update the reduce_blockwise example to support user specified data type and input+reducing dimensions

* Add examples for using reduce_multiblock_atomic_add

* Add more running examples to the default command-line

* Remove unnecessary header includes

* Update to the example README.md
2022-08-13 01:10:01 -05:00
rocking5566
6c3c06bf1f Gemm multiple d multiple r (#335)
* Imitate XXX_gemm_multiple_d, add XXX_gemm_multiple_d_multiple_r for gemm + reduction

* Implement run of kernel

* Add example

* Fix typo in parameter

* Rewrite the reduceMax example

* Rewrite the reduceMean + reduceMeanSquare example

* Refine naming

* Refine folder name

* refine naming

* Rewrite the gemm + bias + relu + add + layernorm example

* Rewrite the gemm + layernorm example

* clang-format

* Fix bug of LDS sync

* Fix compile error
2022-08-13 01:07:12 -05:00
Anthony Chang
cac014f173 Fused attention (#345)
* initial stub for gemm_gemm_xdl_cshuffle

* set up example code

* compiles

* prevent integer overflow

* harmonize interface between ref_gemm and ref_batched_gemm

* batched_gemm_gemm

* fix example

* host tensor gen: diagonal pattern in lowest two-dimensions only

* make c descriptors contain only integral constants

* clean up

* add BlockwiseGemmXdlops_v2 while exploring a unified approach

* implement proper interface

* tidy up example

* fix compilation warnings

* coarsely controlled 2nd gemm padding

* remove rocm-cmake's hard requirement for certain revision

* clang-format

* resolve merge conflict

* fix compilation error on gfx10

* adds acc0 elementwise op to interface

* attention host validation

* add blockwise softmax v1

* iteratively update softmax+gemm

* transpose both gemm0 and gemm1 xdl output so as to avoid broadcasting softmax max/sum

* add init method for easier debugging

* do away with manual thread cluster calculation

* generalize blockwise softmax interface

* row-wise softmax sum & max

* format

* rename to DeviceBatchedGemmSoftmaxGemm

* add gemm_softmax_gemm instances and tests

* comment

Co-authored-by: ltqin <letao.qin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-08-13 00:16:14 -05:00
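The blockwise softmax added above keeps a per-row max and sum so the attention kernel can avoid broadcasting them across the whole matrix. A host-side sketch of the row-wise stable softmax being computed (standard formulation; the tiled on-GPU version is not shown):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<float> softmax_row(const std::vector<float>& row)
{
    // subtract the row max before exponentiating to avoid overflow
    const float row_max = *std::max_element(row.begin(), row.end());

    std::vector<float> out(row.size());
    float sum = 0.f;
    for(std::size_t i = 0; i < row.size(); ++i)
    {
        out[i] = std::exp(row[i] - row_max);
        sum += out[i];
    }
    for(float& v : out)
        v /= sum; // normalize by the row sum
    return out;
}
```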
Po Yen Chen
a670a5a092 Move literal ""_uz & ""_zu into namespace 'ck::literals' (#354)
* Move literal ""_uz & ""_zu into namespace 'literals'

* Move namespace 'literals' as 'ck::literals'
2022-08-12 17:48:35 -05:00
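A plausible definition of the ""_uz / ""_zu literals referenced above, assuming they simply wrap the conversion to std::size_t (CK's actual implementation may differ):

```cpp
#include <cstddef>

namespace ck {
namespace literals {

// std::size_t literal suffixes, so example code can write 16_uz instead of
// static_cast<std::size_t>(16)
constexpr std::size_t operator""_uz(unsigned long long x) { return static_cast<std::size_t>(x); }
constexpr std::size_t operator""_zu(unsigned long long x) { return static_cast<std::size_t>(x); }

} // namespace literals
} // namespace ck

// usage (e.g. when building tensor lengths/strides):
// using namespace ck::literals;
// auto lengths = std::array{16_uz, 64_uz, 32_uz};
```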
Rostyslav Geyyer
0c6ef7c14e Add example of conv_fwd_bias_relu_add for int4, int8, bfp16, fp16, and fp32 (#343)
* [LWPCK-359] Initial commit

* Working version for fp16, add results to readme

* Update according to PR #341

* Update results in readme

* Add fp32 example

* Add bf16 example

* Update fp16 and fp32 examples

* Add int8 example

* Add separate lengths and strides tensors for D tensors

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
2022-08-12 15:30:27 -05:00
zjing14
35e49f2de6 add g; fixed strides (#355) 2022-08-12 15:22:39 -05:00
Illia Silin
de60d290b6 Build docker only once in CI, fix conv_bwd logfile names. (#353)
* build docker in separate stage

* build docker with only one prefix

* add parallel statement

* add docker repo url

* fix the name of perf_conv_bwd_data log file
2022-08-12 12:30:37 -05:00
Po Yen Chen
68b61504a3 Add examples for GEMM + AddAddFastGelu (data type: int8, bf16, fp32) (#340)
* Add always_false<> util to delay symbol resolution

* Use always_false<> to prevent trying instantiate unwanted method

* Add new specializations of AddAddFastGelu::operator() method

* Add GEMM + AddAddFastGelu examples for data types: int8, bf16, fp32

* Use floating point literal to simplify code

* Remove unnecessary capture in lambda expressions

* Extract fast GeLU calculation as standalone method

* Mark methods as 'constexpr'

* Add constraint for HostTensorDescriptor templated ctors

* Simplify HostTensorDescriptor ctor calls

* Add C++23 std::size_t literal suffix

* Use _uz suffix to shorten example code

* Remove unnecessary conversion to std::array<>

* Re-order include directives

* Remove C-style casting by literal suffix

* Remove unnecessary statements in main()

* Remove unused type parameter of always_false<>

* Remove unused include directive

* Exit main() by returning meaningful value

* Use 'if constexpr' to switch example flow

* Use std::is_same_v<> to shorten example code

* Add 'inline' specifier to literal functions

* Unify output methods in example

* Move common codes into .inc file

* Add type check in type_convert<>()

* Add type_convert<float>() before computation

* Merge AddAddFastGelu method specializations

* Remove always_false<>

* Add constraint to AddAddFastGelu::operator() parameter types
2022-08-11 17:31:28 -05:00
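The epilogue these examples exercise computes e = FastGelu(c + d0 + d1). A sketch of that functor using the common tanh approximation of GELU; whether CK's AddAddFastGelu uses this exact approximation (or this exact operator() shape) is an assumption, and the code below is illustrative only:

```cpp
#include <cmath>

struct AddAddFastGelu
{
    // 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    static float fast_gelu(float x)
    {
        const float u = 0.7978845608f * (x + 0.044715f * x * x * x);
        return 0.5f * x * (1.f + std::tanh(u));
    }

    // fuse the two D-tensor adds with the activation in one element-wise op
    void operator()(float& e, const float& c, const float& d0, const float& d1) const
    {
        e = fast_gelu(c + d0 + d1);
    }
};
```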
rocking5566
fdfd7eb597 ckProfiler for layernorm (#330)
* Refine parameter

* Add base class for layernorm

* Add layernorm instance

* Add layernorm to ckProfiler

* Remove redundant

* Add verification

* Fix compile error due to merge
2022-08-11 17:03:54 -05:00
zjing14
e08d68d25d Add batched/grouped_gemm contraction deviceOps (#349)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating reference conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* update contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test

* change gemm_c_permute with contraction

* add grouped_contraction

* add contraction in group_gemm

* add example of grouped_gemm with contraction

* add example of grouped_contraction_bias_e_permute

* clean

* fixed ds

* add m3n2 m2n3 examples into gemm_bias_e_permute

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-08-10 12:20:29 -05:00
Illia Silin
aba7fefce7 Fix QA, allow switching compiler versions, fix google test compilation error. (#348)
* allow selecting compiler version

* fix typo

* add Wno-deprecated flag for google tests

* change git repo, fix qa log files names

* change the git clone syntax

* use Omkar's git credentials

* try to use jenkins as git user

* try using illsilin username for gerrit repo with ssh key

* try new gerrit authorization

* change ssh key syntax

* try another way of passing ssh key to docker

* add mount ssh in dockerfile

* create .ssh folder

* move ssh-keyscan to later

* get rid of npm call

* build first docker image on master

* check the contents of the .ssh folder

* try replacing omkars creds with gerrit creds

* use open repo, clean up changes

* get rid of ssh default argument
2022-08-08 13:49:14 -05:00
Chao Liu
146972f447 fix bug in gemm profiler (#344) 2022-08-07 12:23:32 -05:00
Chao Liu
75ab874e02 Update Group convolution (#341)
* add conv oddC

* update example

* update example

* fix bug in example

* fix bug in group conv example
2022-08-03 12:28:33 -05:00
Adam Osewski
fb0dc35861 CGEMM examples bf16, fp32, int8 (#332)
* Add int8 specialization for elementwise Add and Subtract.

* CGEMM examples bf16, fp32, int8

* Add convert reference output to CDataType.

* Skip BF16 data type during testing.

* Lower K value to get rid of accumulation error.

* Fix merge artifact.

* Fix changed function name: GetElementSpaceSize()

* Fix merge artifact.

Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-08-02 14:52:27 -05:00
Illia Silin
984b3722bf Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339)
* turn on full qa only on gfx90a, use int initialization

* change script syntax

* update script parsing clinfo, throw exception if 0 devices

* fix syntax

* try using toBoolean for the QA conditions

* run regular CI on MI100 only, use MI200 only for daily QA

* evaluate when conditions before agent

* launch QA on develop branch and update profile_reduce script

* update test script

* update script

* remove false dependency from dockerfile

* try removing rbuild completely

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-08-02 09:17:11 -05:00
Chao Liu
500fa99512 Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating reference conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* update contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test
2022-07-29 18:19:25 -05:00
Illia Silin
85978e0201 comment out cron trigger (#334) 2022-07-22 13:52:10 -05:00
zjing14
d7d7829096 Batched Gemm with multiD (#329)
* add batched_gemm_multiD

* add ds

* rename file

* add batched_gemm_bias example

* add batch_strides into bmm_c_permute

* clean

* rename example_28 to example_29

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-07-22 09:33:50 -05:00