Commit Graph

353 Commits

Author SHA1 Message Date
aledudek
f504e98b5d Extend pool3d fwd avg, max operations by f8_t, int8_t types (#1483)
* Extend pool3d fwd avg, max operations by f8_t, int8_t types

* Pack MaxPool3dFwd params together

* Fix MaxPool3dFwd AVG instances

* Decrease verification precision for bf16

* Adjust tests + review changes

* Adjust threshold for F8

* Adjusted compute types for MAX op instances

* Fix ComputeDataType mismatch in tests and profiler for AVG

* Fix naming from max_pool3d_fwd to pool3d_fwd

* Adjust CMakeLists

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: a793afc961]
2024-09-17 15:57:10 +02:00
Mateusz Ozga
d7326fb525 This commit contains implementation of max pool2d for f8 type (#1506)
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 6834e5ee74]
2024-09-16 10:15:06 +02:00
bibek
50ec07c3e3 Fix duplicate CMake tidy-target issue (#1513)
[ROCm/composable_kernel commit: 49e012dee1]
2024-09-13 21:15:04 -07:00
jakpiase
4940f07a4b Add pool2d int8 and fp8 instances (#1508)
* add pool2d fp8 and int8

* minor fixes

* add formatting

* add reviewer suggestions

* add reviewer suggestions

[ROCm/composable_kernel commit: 8f8a2ce396]
2024-09-13 10:18:21 -07:00
Jun Liu
3739cf9f74 Customize filesystem in CK for legacy systems (#1509)
* Legacy support: customized filesystem

* Update cmakefile for python alternative path

* fix build issues

* CK has no boost dependency

* More fixes to issues found on legay systems

* fix clang format issue

* Check if blob is correctly generated in cmake

* fix the python issues

* add a compiler flag for codegen when using alternative python

* use target_link_options instead of target_compile_options

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 81bc1496b2]
2024-09-13 07:51:07 -07:00
Mateusz Ozga
92d1b386b2 Pool2d max/avg kernel in the BWD version (#1494)
* Add pool2d instance BWD AVG

* Add pool2d instance BWD MAX

* Fix: avg review

* Fix review: part2

* Fix - enable test when type is compiled

* Fix review part3

[ROCm/composable_kernel commit: 448c0f56d8]
2024-09-12 11:47:52 +02:00
jakpiase
cb4975cf70 Rewrite pool2d fwd (#1462)
* added pool2d fwd

* add tests

* add reviewers changes

* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"

This reverts commit 6b2ba7ff89, reversing
changes made to 22c82bea0c.

* Revert "add reviewers changes"

This reverts commit 22c82bea0c.

* added reviewers comments

* revert some old files

* add reviewers requests

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: e8d2887cb2]
2024-09-11 15:21:00 +02:00
Haocong WANG
4e4514caa8 Add gemm universal bf16 instances (#1484)
* revert ckprofiler change

* temp save

* Add test and test pass

* test pass

* Fix bug inside rotating buffer when tensor is not packed

* bug fix

* clang format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 5b10dae6a4]
2024-09-04 20:58:54 -07:00
Illia Silin
132c89b29d copy all fmha headers when building library (#1497)
* copy all fmha headers when building library

* fix the rocm_install call for mha headers

[ROCm/composable_kernel commit: 8b95d9ad52]
2024-09-04 07:36:41 -07:00
Illia Silin
234bc58d2d Add an option to select an alternative python version during build. (#1496)
* locate a newwer version of python when -DRHEL=ON flag is set

* allow setting python version on cmake command line

[ROCm/composable_kernel commit: 841009c5ee]
2024-09-04 07:36:27 -07:00
Bartłomiej Kocot
950165c6fb Add support for NGCHW in grouped conv bwd wei (#1491)
* Add support for NGCHW in grouped conv bwd wei

* Comments fixes

* navi fixes

* Update function names

[ROCm/composable_kernel commit: 73b67f290f]
2024-09-03 10:52:03 +02:00
Bartłomiej Kocot
9974926658 Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)" (#1490)
This reverts commit 725dd433cdc6435d481e806b5442a07b0097c94a.

[ROCm/composable_kernel commit: a9b170b541]
2024-09-02 10:39:49 +02:00
Andriy Roshchenko
f6c6819b47 Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)
* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLu activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Add CMakePresets.json

* Guard off FP8 instances when the data type is not available.

* Add example for Scaled FP8 Convolution with AMAX reduction.

* Refactor CombConvScaleRelu instances.

* Add CombConvScale instances.

* Add client example for Scaled FP8 Convolution with AMAX reduction.

* Cleanup.

[ROCm/composable_kernel commit: c3515f277c]
2024-08-21 15:22:41 -07:00
Rostyslav Geyyer
0ab95a332e Set RNE fp8 conversion as a default (#1458)
* Set RNE fp8 conversion as a default

* Update f8 tests

* Disable failing test on gfx11

* Update bf8 tests

* Add a flag

* Fix the flag

* Raise flag for gfx10 as well

* Temp commit for tolerance testing

* Update tolerances

[ROCm/composable_kernel commit: e20f20efbf]
2024-08-21 09:09:48 -07:00
Andriy Roshchenko
10edb0c70e Adding Instances and Examples for FP8-based Scaled Convolution with ReLU Activation and AMAX Reduction. (#1469)
* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLu activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Guard off FP8 instances when the data type is not available.

* Improve output readability.

* Addressing reviewer's comments.

[ROCm/composable_kernel commit: a94113a941]
2024-08-20 10:30:56 -05:00
Illia Silin
ad65d8d5b0 Re-enable fp8 types for all architectures. (#1470)
* re-enable fp8 and bf8 for all targets

* restore the fp8 gemm instances

* re-enable conv_3d fp8 on all architectures

* diasble several fp8 gemm instances on all architectures except gfx94

* clang format fix

[ROCm/composable_kernel commit: c8b6b64240]
2024-08-16 16:07:52 -06:00
Haocong WANG
65d6442b4c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 3049b5467c]
2024-08-14 10:42:30 +08:00
AngryLoki
6a4b36d948 Fix compilation errors with libc++ (#1461)
This fixes 2 issues when compiled with libc++.

First issue is attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min().
_Float16 is extension of libstdc++, it does not exist in C++ standard[2].
Luckily, there is NumericLimits class in composable_kernel, which does everything needed.

Second issue with call to 'check_err' is ambiguous: there are 2 candidates.
It happens because composable_kernel relies on idea that f8_t (defined as _BitInt(8)) does not pass is_integral trait.
However, libc++ treats _BitInt(N) as integral (per standard "any implementation-defined extended integer types" can be integral).

Closes: #1460

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>

[ROCm/composable_kernel commit: 50c423481b]
2024-08-13 14:31:15 -05:00
Illia Silin
92df7893df Disable inapplicable xdl and mha instances for gfx12 (#1464)
[ROCm/composable_kernel commit: cbb6f2ab8c]
2024-08-12 15:11:58 -07:00
Jun Liu
254a7dadb6 Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
This reverts commit 0c367d5912486f4fcbae1dbb861a1fb8176ca308.

[ROCm/composable_kernel commit: 5ff8eeebf9]
2024-08-08 19:09:33 -07:00
bibek
c8c3293b0b adding mha as static lib (#1366)
* adding mha as static lib

* add fmha fwd compile options

* typo

* fix python version

* python version to 3

* increase path length

* add max path flag in mha cmake

* fix long path issue

* mha currently only runs in gfx94x

* only buld mha in mi300

* populate gpu_list

* add mha compile flags

* avoid building mha in gpu other then gfx94x

* some comments and  include ck_tile in rocm

* use rocm_install

* place ck_tile in include

* correct ck_tile path

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 840c5397bb]
2024-08-06 11:17:10 -05:00
Bartłomiej Kocot
69a6b563f9 Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove not need ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes

[ROCm/composable_kernel commit: 4ec5c52a0c]
2024-08-06 10:06:10 +02:00
Illia Silin
8f71de4707 add --offload-compress compiler flag (#1433)
* add --offload-compress compiler flag

* only apply the --offload-compress flag to the ckProfiler

* move the --offload-compress flag back to main cmake file

* add offload-compress to target compile option of ckProfiler

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: 7f57b2e02c]
2024-08-05 23:26:01 +08:00
Bartłomiej Kocot
1567614d80 Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)
[ROCm/composable_kernel commit: 33b399cc15]
2024-07-30 18:36:04 +02:00
Andriy Roshchenko
e3b469a493 Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
* Add CMakePresets configurations.

* Add binary elementwise ConvScaleAdd and an example.

* Numerical verification of results.

Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
```

* Implemented ConvScaleAdd + Example.

* Add ConvScale+Bias Instances

* Add Client Example for ConvScale+Bias

* Fix number of bytes in an example..

* Cleanup.

[ROCm/composable_kernel commit: 4a8a1befd5]
2024-07-24 15:49:55 -05:00
Haocong WANG
c69df380b9 disable bad instance (#1410)
[ROCm/composable_kernel commit: d22713a719]
2024-07-23 09:05:03 -07:00
Bartłomiej Kocot
b23a3fcf77 Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406)
[ROCm/composable_kernel commit: 5d8c3d8190]
2024-07-22 14:21:24 +02:00
Haocong WANG
a0e0f3cdcc [GEMM] F8 GEMM, performance optimized. (#1384)
* add ab_scale init support

* enabled interwave

* add scale type; update isSupport

* adjust example

* clean

* enable f8 pure gemm rcr ckprofiler

* Add gemm_multiply_multiply instances

* clang format

* Optimize for ScaleBlockMNK=128

* enable abscale f8 gemm ck profiler

* Add pure f8 gemm test suite

* Reverting to the state of project at f60fd77

* update copyright

* clang format

* update copyright

---------

Co-authored-by: root <jizhan@amd.com>

[ROCm/composable_kernel commit: 8c90f25be3]
2024-07-19 22:06:52 +08:00
ltqin
50c6703b31 Universal gemm splitk using reduce (with multi-d) (#1341)
* init for reduce_threadwise multi_d

* add reduce_threadwise_multi_d

* add reduce_multi_d

* clean

* start add an other splitk device op

* add reduce template parameter to SplitKBatchOffset

* add reduce c matrix

* clean up code

* change example data type to bf16

* add bf16Ai8B example

* remove reduce template parameter

* add splitk atomic status to v4

* example add multi d parameters

* device op add multi-d parameters

* add multi-d to reduce

* fix kbach=1 bug

* change B layout to col in  bf16Ai8B example

* remove float adding struct

* change  multi-d interface

* change file and class name

* remove multi-d of bf16Ai8B example

* change IsReduce function to IsReduceAdd

* change example layout to RRR from RCR

* according layout to set ds stride

* reset parameter layout

* add gemm universal reduce instance

* add reduce factory

* add profile_gemm_universal_reduce

* add reduce to profiler

* fix reduce instance

* fix profiler reduce compiling bug

* format

* format library instance code

* add mem instance for reduce library

* fix call instance names

* add workspace for reduce in ckProfiler

* format

* add mnpading to reduce library instance

* add fp16 instance to reduce of profiler

* change copyright time

* restore profiler cmake file

* add reduce text to instances

* add DsLayout and DsDataType to instances template parameter

* fixed gemm_reduce_multi_d

* add an example without multi_d

* Update common.hpp

* Update gtest.cmake

* Update gemm_xdl_splitk_reduce_bf16.cpp

* clean

* Update gtest.cmake

* format

* fixe api

* format

* default parameter change to RRR

* add vector_len for multi_d

* format

* Update gtest.cmake

* fix bf16A iBB elementwiseop

* add ReduceDataType

* move ReduceDataType to end position

* format

* remove googletest git method  address

* fix copyright time

* update init data

---------

Co-authored-by: root <jizhan@amd.com>
Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: c544eb4da0]
2024-07-19 22:01:22 +08:00
Andriy Roshchenko
a765481437 Adding more instances of grouped convolution 3d forward for FP8 with ConvScale element-wise operation and ReLU activation. (#1386)
* Add CMakePresets configurations.

* Add ConvScale+ReLU Functor and an Example

* Account for ReLU FLOPs.

* Add instances of 3D convolutions with ConvscaleRelu operation.

* Implement Client Example

* Cleanup

[ROCm/composable_kernel commit: 802a8a1df1]
2024-07-16 08:51:49 -07:00
Haocong WANG
d25cc9596c Disbale failed instance in rocm6.2 rel (#1388)
[ROCm/composable_kernel commit: 1ff4f25138]
2024-07-16 08:46:48 -07:00
Bartłomiej Kocot
07ca6dacf1 Support access per groups and filter3x3 in grouped conv fwd (#1382)
* Support access per groups and filter3x3 in grouped conv fwd

* Fixes for large cases

* Fixes for large tensors

[ROCm/composable_kernel commit: 82e8a78a3f]
2024-07-12 11:08:42 -07:00
Rostyslav Geyyer
2cd91e7d12 Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 (#1369)
* Add an example

* Add instances

* Add a client example

[ROCm/composable_kernel commit: 7a46a91c84]
2024-07-11 13:31:39 -07:00
Illia Silin
3e1d0b3d5d Fix the cmake logic when building with INSTANCES_ONLY=ON. (#1376)
* fix the cmake logic when building for various targets

* another minor fix

[ROCm/composable_kernel commit: a328df25a1]
2024-07-08 21:21:16 -07:00
Andriy Roshchenko
0675d258a6 Add ckProfiler support for forward 3D convolutions with OUT element-wise operations. (#1354)
[ROCm/composable_kernel commit: eb44e0472a]
2024-07-08 10:55:54 -07:00
Harisankar Sadasivan
c5f81450e1 Universal streamk with atomics (#1360)
* universal streamk with atomics with ckprofiler support. grid_size and streamk strategy are tunable. grid_size of -1 leads to #WGs = maximum occupancy X num_CUs. implementation supports many different streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. streamk strategy of -1 leads to default streamk policy (4-tile). 

* Update README.md

* fixing clang-format issues

* removed conflicts in struct members between streamk and universal streamk

* corrected arg parsing for streamk and universal streamk

* added stream-k policies for 3 tile and 4 tile

* fixed argument type issue with parsing cmd args

* changes suggested in PR review are made- removing comments and correcting copyright

* file permissions updated

* added default value support for grid_size and streamk-policy selection set to -1

* print messages for arguments

* print messages for arguments

* print messages for arguments1

[ROCm/composable_kernel commit: 75e622f02f]
2024-07-05 21:40:30 -07:00
jakpiase
3a04bdded7 Add structural sparsity gemm instruction tests (#1309)
* first version of smfmac test

* add reviewer comments

* add reviewer suggestions

[ROCm/composable_kernel commit: ed21948bcd]
2024-06-27 11:30:32 +02:00
Illia Silin
cd1e33cce4 Merging the gfx12 code into public repo. (#1362)
[ROCm/composable_kernel commit: 941d1f7ce0]
2024-06-27 00:33:34 -07:00
Andriy Roshchenko
e05bfee7e5 Add instances of grouped convolution 3d forward with a ConvScale element-wise op for bf8@bf8->fp8 (#1326)
We are adding more instances of grouped convolution 3d forward with a ConvScale element-wise operation.
This commit handles bf8@bf8->fp8 data types combination.

* Included an example.
* Added instances.
* Added a client example.

---------

Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 05b10e0e5a]
2024-06-21 19:02:57 -06:00
jakpiase
92853de60e Switch to universal gemm in grouped gemm tile loop (#1335)
* switch to universal gemm in grouped gemm tile loop

* minor fixes

* add reviewers comments

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: e2d139201b]
2024-06-18 09:01:49 -05:00
Rostyslav Geyyer
2832eb1444 Add instances for grouped conv fwd 3d with ConvScale for fp8@bf8->fp8 (#1325)
* Add fp8 bf8 conv example

* Add instances

* Add client example

* Add random scale values

* Format

[ROCm/composable_kernel commit: acda4c5a3c]
2024-06-12 14:41:56 -05:00
Bartłomiej Kocot
f1600a0db5 Fix nhwgc f16 wmma instances (#1328)
[ROCm/composable_kernel commit: 5fc1bee4c5]
2024-06-11 09:52:38 +02:00
Rostyslav Geyyer
9416b16080 Add a convinvscale op, related instances and examples (#1307)
* Update the element op

* Add an example

* Add instances

* Add a client example

* make sure new instances only build on gfx9

* Update element op and its handling

* Format

* Update instances to take element op as an argument

* Update examples to use random scale values

* Format

* Update client example with random scales

* Format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: ce66277a76]
2024-06-10 14:48:49 -05:00
Bartłomiej Kocot
4716f8f70b Integrate universal gemm with conv forward (#1320)
* Integrate universal gemm with conv fwd

* Fix conv fwd wmma test

* Fix instances

* Remove direct load check

[ROCm/composable_kernel commit: ac58cc5d1d]
2024-06-05 13:01:29 -05:00
Rostyslav Geyyer
692ae331ca Add a scale op, related instances and examples (#1242)
* Add a scale op

* Update the element op

* Add instances

* Add an example

* Add a client example

* Add a flag check

* Revert flag check addition

* Fix flag check

* Update d strides in example

* Update d strides in client example

* Apply suggestions from code review

Update copyright header

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Move the example

* Move the client example

* Update element op

* Update example with the new element op

* Add scalar layout

* Update example

* Update kernel for scalar Ds

* Revert kernel changes

* Update element op

* Update example to use scales' pointers

* Format

* Update instances

* Update client example

* Move element op to unary elements

* Update element op to work with values instead of pointers

* Update instances to take element op as an argument

* Update examples to use random scale values

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: cb0645bedc]
2024-06-04 19:28:15 -05:00
Illia Silin
c6b1a8b2e9 Split the gemm_multi_abd instances. (#1306)
* split the gemm_multi_abd instances

* update the dates

[ROCm/composable_kernel commit: ec2bae27ff]
2024-05-23 09:17:02 -07:00
Bartłomiej Kocot
c6431f6c07 Optimize grouped conv bwd weight for small M and N (#1303)
* Optimize grouped conv bwd weight for small M and N

* Fixes

[ROCm/composable_kernel commit: fd72380aeb]
2024-05-22 21:01:01 +02:00
Illia Silin
6cf9f7f72c Select appropriate GPU targets for instances, tests, and examples. (#1304)
* set individual gpu targets for instances, examples, tests

* fix path to hip compiler

* fix path to hip compiler once more

* aggregate device macros in ck_tile config header

* fix the cmake logic for instances

* fix clang format

* add gfx900 and gfx906 to default set of targets

[ROCm/composable_kernel commit: 7b027d5643]
2024-05-22 11:45:27 -07:00
Bartłomiej Kocot
b7ee312021 Change output gemm type to AccDataType in two stage conv bwd wei (#1283)
[ROCm/composable_kernel commit: 8346af9c68]
2024-05-10 10:57:42 +02:00
Bartłomiej Kocot
68b2757f11 Add two stage grouped conv bwd weight kernel (#1280)
[ROCm/composable_kernel commit: 0b6b5d1785]
2024-05-08 09:53:24 +02:00