* A few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
* Delete the obsolete two-stage kernel, as it is now covered by grouped GEMM.
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix a few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment a few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Address review comments.
* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean up commented-out instances
* Calculate generic relative threshold for pool3dfwd
* Calculate absolute error threshold for pool3d fwd
* Generic threshold calculation: take max input for relative error in pool3dfwd
* Remove max possible value for error calculation at runtime
* Remove debug print in pool3dfwd
* Pool3d fwd: adjust types in generic threshold calculation
* Generic threshold calculation: take into account the number of accumulations and AccDataType
* Generic threshold calculation: fix final error formula
* Generic threshold calculation: fix number of accumulations
* Generic threshold calculation: adjust absolute error
* Generic threshold calculation: use OutDataType in absolute error
* Add non_native_vector_type
* Add a test
* Add non-native vector type
* Fix CTOR
* Fix non-native vector type of 1
* Fix CTORs
* Use vector_type to cover non-native implementation as well
* Update the test
* Format
* Format
* Fix copyright years
* Remove BoolVecT for now
* Add AsType test cases
* Update assert error message
* Remove redundant type
* Update naming
* Add complex half type with tests
* Add tests for vector reshaping
* Add missing alignas
* Update test/data_type/test_custom_type.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Compare custom types to built-in types
* Add default constructor test
* Add an alignment test
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Without this change, the following diagnostic is generated:
a template argument list is expected after a name prefixed by the template
keyword [-Wmissing-template-arg-list-after-template-kw].
See the C++17 spec, [temp.names] p5.
* Legacy support: customized filesystem
* Update cmakefile for python alternative path
* fix build issues
* CK has no boost dependency
* More fixes to issues found on legacy systems
* fix clang format issue
* Check if blob is correctly generated in cmake
* fix the python issues
* add a compiler flag for codegen when using alternative python
* use target_link_options instead of target_compile_options
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
* revert ckprofiler change
* temp save
* Add test; test passes
* Tests pass
* Fix bug inside rotating buffer when tensor is not packed
* bug fix
* clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
* Enable CMakePresets build
* Verify Convolution, Scaling and ReLU algorithms.
* Add tensor element-wise scale and type cast operation.
* Reduction implemented but does not work.
* Exploration of Reduction functionality.
* Completed example for Convolution scaled with ReLU activation and AMAX reduction.
* WIP: Add required instances for convolution.
* WIP: Create client example. Implement convolution stage.
* Add elementwise instances.
* Add elementwise scale + convert example.
* Add reduction instances.
* WIP: Client example for AMAX reduction.
* WIP: Add instances for multistage reduction.
* WIP: Implementation of multistage reduction.
* Refactoring.
* Clean up.
* Add CMakePresets.json
* Guard FP8 instances when the data type is not available.
* Add example for Scaled FP8 Convolution with AMAX reduction.
* Refactor CombConvScaleRelu instances.
* Add CombConvScale instances.
* Add client example for Scaled FP8 Convolution with AMAX reduction.
* Cleanup.
* Set RNE fp8 conversion as a default
* Update f8 tests
* Disable failing test on gfx11
* Update bf8 tests
* Add a flag
* Fix the flag
* Raise flag for gfx10 as well
* Temp commit for tolerance testing
* Update tolerances
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances for large tensors
* Fixes for the conv-to-GEMM transform
* Fixes
* fixes
* Remove unneeded instances
* Fix examples
* Remove unneeded Ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes