aska-0096
26d5174e15
update instance and lds layout strategy
2024-11-26 07:29:38 +00:00
aska-0096
ea90b01fc9
fix bug in enable f8 gemm inside ckProfiler
2024-11-20 09:33:39 +00:00
aska-0096
c99e3d595e
Merge branch 'mem_gemm_opt' of https://github.com/ROCm/composable_kernel into update_cka8w8
2024-11-20 05:41:33 +00:00
aska-0096
ec6b000c77
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8
2024-11-19 08:50:37 +00:00
Bartłomiej Kocot
754adc70e3
Batched GEMM Multiple D based on Universal GEMM (#1655)
...
* Batched GEMM Multiple D based on Universal GEMM
Co-authored-by: Jing Zhang <jizhan@fb.com>
* CI fixes
Co-authored-by: Jing Zhang <jizhan@fb.com>
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
2024-11-18 14:03:45 +01:00
aska-0096
f3bbfe3efe
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8
2024-11-18 07:32:39 +00:00
aska-0096
2b840f5a85
reduce prefetch stage in blockwisepipev4
2024-11-18 07:32:30 +00:00
Illia Silin
efd9261545
fix clang format (#1662)
2024-11-13 09:20:18 -08:00
Taylor Ding
73f02a1083
Move checks for compatibility from Argument() to IsSupportedArgument() (#1653)
2024-11-13 11:20:38 -05:00
darren-amd
d0e3a70a2e
Statically Cast Pointer Offset (#1631)
...
* explicit cast ptr offset
* formatting change
2024-11-05 09:59:08 -08:00
aska-0096
f20e48f1f4
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8
2024-11-05 07:03:42 +00:00
Bartłomiej Kocot
9a8a52130d
Remove virtual destructors from unary ops (#1610)
...
* Remove virtual destructors from unary ops
* Fixes
* Fixes
* clang format fixes
2024-10-30 17:42:50 +01:00
aska-0096
b3e5048f12
tempsave
2024-10-30 07:38:59 +00:00
Illia Silin
922e42a039
fix compilation errors for gfx12 with clang20 (#1606)
2024-10-28 19:02:48 -07:00
Bartłomiej Kocot
31bf253aeb
Add dynamic elementwise op (#1426)
...
* Add dynamic elementwise op
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
* CI issues fix
* Custom parameter value for dynamic functions - Comments addressed
---------
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
2024-10-26 15:22:37 +02:00
valarLip
37f7afed1e
add int8 gemm multiply multiply a8w8 (#1591)
...
* add int8 gemm multiply multiply a8w8
* uncomment
* clang-format-12
* Add example_gemm_multiply_multiply_xdl_int8
* Remove shell scripts
* update preprocess number for mi308; bring back printout in ckprofiler
* format
---------
Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-10-26 16:39:34 +08:00
chenjun
1670bba95f
clang-format-12
2024-10-21 23:16:04 +08:00
chenjun
7fb0b3223c
add int8 gemm multiply multiply a8w8
2024-10-21 21:57:41 +08:00
Bartłomiej Kocot
f21cda2536
Add transpose scale amax example (#1547)
...
* Add transpose scale amax example
* fixes
* Tune reduce instance
2024-10-14 17:39:38 +02:00
Adam Osewski
29d384d0b2
Implement GetWorkSpaceSize from BaseOperator. (#1564)
2024-10-12 14:05:11 +08:00
Bartłomiej Kocot
6b54d2faf8
Fix grouped gemm check to avoid overflow (#1545)
2024-10-04 17:32:43 +02:00
macurtis-amd
aeb7c91f48
Fix compilation errors generated by forthcoming Clang changes (#1544)
...
Without this change, the following diagnostic is generated:
a template argument list is expected after a name prefixed by the template
keyword [-Wmissing-template-arg-list-after-template-kw]
See C++17 spec [temp.names] p5.
2024-10-02 13:56:22 -07:00
Illia Silin
42e6dceacc
Fix compilation errors with Clang20.0. (#1533)
...
* fix clang20 compilation errors for gfx90a
* fix clang20 compilation errors for gfx11 targets
2024-09-25 13:45:38 -07:00
Bartłomiej Kocot
4ba52b35dc
Add support for NGCHW in grouped conv fwd (#1499)
...
* Support NGCHW in grouped conv fwd
* Remove not needed variable
* Fixes
2024-09-20 10:45:46 +02:00
Mateusz Ozga
448c0f56d8
Pool2d max/avg kernel in the BWD version (#1494)
...
* Add pool2d instance BWD AVG
* Add pool2d instance BWD MAX
* Fix: avg review
* Fix review: part2
* Fix - enable test when type is compiled
* Fix review part3
2024-09-12 11:47:52 +02:00
jakpiase
e8d2887cb2
Rewrite pool2d fwd (#1462)
...
* added pool2d fwd
* add tests
* add reviewers changes
* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"
This reverts commit 6b2ba7ff89, reversing
changes made to 22c82bea0c.
* Revert "add reviewers changes"
This reverts commit 22c82bea0c.
* added reviewers comments
* revert some old files
* add reviewers requests
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-11 15:21:00 +02:00
jakpiase
2a261afcdf
Added structural sparsity blockwise gemm (#1435)
...
* Implemented smfmac xdlops
* Added smfmac blockwise xdlops
* fixes
* add reviewers suggestions
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-11 15:19:42 +02:00
Haocong WANG
0b3a409d4f
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mem_gemm_opt
2024-09-06 03:22:06 +00:00
M.Emin Ozturk
8378855361
Modification to fix this issue "threadwise_tensor_slice_transfer_v5r1 issue #1279" (#1492)
...
* issue fix, one line changed for tmp
* clang
---------
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>
2024-09-04 21:52:55 -07:00
Haocong WANG
5b10dae6a4
Add gemm universal bf16 instances (#1484)
...
* revert ckprofiler change
* temp save
* Add test and test pass
* test pass
* Fix bug inside rotating buffer when tensor is not packed
* bug fix
* clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-09-04 20:58:54 -07:00
aska-0096
dbfcb380cd
temp save
2024-09-05 03:04:31 +00:00
aska-0096
cc404d1190
Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mem_gemm_opt
2024-09-04 15:18:52 +00:00
aska-0096
41fcfbc64e
clang format
2024-09-04 15:11:51 +00:00
aska-0096
6df91708a6
temp save
2024-09-04 14:32:02 +00:00
Bartłomiej Kocot
73b67f290f
Add support for NGCHW in grouped conv bwd wei (#1491)
...
* Add support for NGCHW in grouped conv bwd wei
* Comments fixes
* navi fixes
* Update function names
2024-09-03 10:52:03 +02:00
aska-0096
4885c38aa4
Merge branch 'transpose_opt' of https://github.com/ROCm/composable_kernel into rowwise_opt
2024-09-03 08:37:45 +00:00
aska-0096
7c8e92face
tempsave
2024-09-03 07:53:04 +00:00
aska-0096
5d9c964ece
temp save
2024-09-02 09:55:07 +00:00
Bartłomiej Kocot
a9b170b541
Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)" (#1490)
...
This reverts commit 5ff8eeebf9.
2024-09-02 10:39:49 +02:00
aska-0096
cbf14ee192
tempsave, epilogue optimization for universal gemm done. TODO: multipleD epilogue optimization
2024-09-01 08:10:43 +00:00
aska-0096
1ca98e75ce
tempsave
2024-08-26 09:48:07 +00:00
aska-0096
9a99c8416b
temp save
2024-08-26 02:10:18 +00:00
aska-0096
4f65f7b387
tempsave
2024-08-22 07:30:46 +00:00
Andriy Roshchenko
c3515f277c
Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)
...
* Enable CMakePresets build
* Verify Convolution, Scaling and ReLU algorithms.
* Add tensor element-wise scale and type cast operation.
* Reduction implemented but does not work.
* Exploration of Reduction functionality.
* Completed example for Convolution scaled with ReLU activation and AMAX reduction.
* WIP: Add required instances for convolution.
* WIP: Create client example. Implement convolution stage.
* Add elementwise instances.
* Add elementwise scale + convert example.
* Add reduction instances.
* WIP: Client example for AMAX reduction.
* WIP: Add instances for multistage reduction.
* WIP: Implementation of multistage reduction.
* Refactoring.
* Clean up.
* Add CMakePresets.json
* Guard off FP8 instances when the data type is not available.
* Add example for Scaled FP8 Convolution with AMAX reduction.
* Refactor CombConvScaleRelu instances.
* Add CombConvScale instances.
* Add client example for Scaled FP8 Convolution with AMAX reduction.
* Cleanup.
2024-08-21 15:22:41 -07:00
Haocong WANG
3049b5467c
[GEMM] gemm_universal related optimization (#1453)
...
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
Mateusz Ozga
0606e5498e
Support large: 12d tensor size for reduction kernel (#1465)
2024-08-13 16:15:47 +02:00
Bartłomiej Kocot
4a870942e6
Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457)
...
* Fix typo in TransformConvFwdToGemm
* Fix bug in n offset calculation
2024-08-10 13:12:05 +02:00
Jun Liu
5ff8eeebf9
Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
...
This reverts commit 33b399cc15.
2024-08-08 19:09:33 -07:00
jakpiase
b74d4d4d54
Fix for beta!=0 in reduce (#1440)
...
* fix for beta!=0 in reduce
* add reviewers suggestions
2024-08-06 09:10:39 -07:00
Bartłomiej Kocot
4ec5c52a0c
Add Grouped Conv Fwd Large Tensor kernel (#1432)
...
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove not needed instances
* examples fixes
* Remove not needed ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
2024-08-06 10:06:10 +02:00