1383 Commits

Author SHA1 Message Date
danyao12
3d5b0755ef non-iglp pipeline for headdim padding cases 2024-08-02 10:59:52 +00:00
danyao12
f8b146186d fix hd64 scratches and boost performance 2024-07-30 14:52:29 +00:00
danyao12
5d2a5a1131 more strides for fa integration 2024-07-30 10:57:22 +00:00
danyao12
fd28454d3a receipt 3 for simplified smoke test 2024-07-29 15:34:45 +00:00
danyao12
76e95a5e00 fix hd128 scratches and boost performance 2024-07-29 14:36:30 +08:00
danyao12
ad3e94bbaa fwd dropout revert 2024-07-28 17:51:48 +08:00
danyao12
a0c92495ea codegen update 2024-07-28 15:28:35 +08:00
danyao12
7e9d2390cc dq_acc stride stuff 2024-07-27 16:23:16 +08:00
danyao12
224a7b0244 dq_acc stride 2024-07-27 16:12:11 +08:00
danyao12
99ed2c1ae3 code sync up 2024-07-26 18:41:18 +08:00
danyao12
3552041a70 Merge branch 'develop' into ck_tile/fa_bwd_opt 2024-07-26 18:05:24 +08:00
Dan Yao
e892711040 Merge pull request #1419 from ROCm/ck_tile/fa_bwd_opt_clean
Remove duplicated codes for creating WarpGemm
2024-07-26 12:20:48 +08:00
danyao12
ed8ef7e58f dropout patch for mrepeat 16*16 2024-07-26 12:10:43 +08:00
trixirt
733f33af78 Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404)
A standard option in Fedora packaging that is used to check
the correctness of c++ use of the standard c++ library.

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-07-25 19:28:17 -07:00
Qianfeng Zhang
5a561b5e1d Remove duplicated WarpGemm definitions in the policy file 2024-07-25 16:36:37 +00:00
zjing14
105bd708c7 Add rotating buff for gemm_multi_d (#1411)
* add rotating_buff for gemm_multi_d

* format

* Update flush_cache.hpp

* Update gtest.cmake

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
2024-07-25 23:21:21 +08:00
danyao12
94c957b3db revert 2024-07-25 20:51:02 +08:00
danyao12
ca2a0ebd93 Merge branch 'ck_tile/fa_bwd_opt' of https://github.com/ROCm/composable_kernel into ck_tile/fa_bwd_opt 2024-07-25 16:16:58 +08:00
danyao12
dcc3593fe4 fix hd32 error and boost performance 2024-07-25 16:16:30 +08:00
dependabot[bot]
1208082e53 Bump rocm-docs-core from 1.5.1 to 1.6.0 in /docs/sphinx (#1416)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.1 to 1.6.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.1...v1.6.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 22:56:29 -07:00
Andriy Roshchenko
4a8a1befd5 Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
* Add CMakePresets configurations.

* Add binary elementwise ConvScaleAdd and an example.

* Numerical verification of results.

Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
```

* Implemented ConvScaleAdd + Example.

* Add ConvScale+Bias Instances

* Add Client Example for ConvScale+Bias

* Fix number of bytes in an example.

* Cleanup.
2024-07-24 15:49:55 -05:00
Bartłomiej Kocot
ffabd70a15 Add support for half_t and bfloat to reduction operations (#1395)
* Add support for half_t and bfloat to reduction operations

* Fix bhalf convert

* Next fix bf16
2024-07-24 12:12:37 -05:00
dependabot[bot]
33b2a2bdf5 Bump rocm-docs-core from 1.5.0 to 1.5.1 in /docs/sphinx (#1414)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.0 to 1.5.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.0...v1.5.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-24 07:10:50 -07:00
rocking
85ec3c7584 Do not store storerandval in bwd for flash attention integration 2024-07-24 12:38:31 +00:00
danyao12
b2510c0541 fix dq alignment 2024-07-24 12:28:41 +08:00
Haocong WANG
d22713a719 disable bad instance (#1410) 2024-07-23 09:05:03 -07:00
danyao12
da2dce1866 group convert_dq opt 2024-07-23 14:16:21 +08:00
danyao12
b69499b933 fix fwd dropout 2024-07-23 10:20:12 +08:00
Bartłomiej Kocot
5d8c3d8190 Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) 2024-07-22 14:21:24 +02:00
danyao12
0d93f4a068 fix epilogue problem 2024-07-22 18:05:45 +08:00
danyao12
edb77c8db8 comments 2024-07-22 13:26:05 +08:00
danyao12
260ace4b78 code cleanup 2024-07-22 11:35:34 +08:00
danyao12
9b4b4622c1 code cleanup 2024-07-22 11:06:04 +08:00
danyao12
06f575a3b7 refactor dropout 2024-07-21 17:34:40 +08:00
danyao12
99436cd4c6 save clear_tile 2024-07-20 17:21:12 +08:00
danyao12
b3100b6f43 remove FmhaBwdTilePartitioner 2024-07-20 16:09:14 +08:00
danyao12
9d78a6c506 comments 2024-07-20 15:00:02 +08:00
danyao12
42a7240a19 do{}while() -> while(){} 2024-07-20 14:52:35 +08:00
Haocong WANG
8c90f25be3 [GEMM] F8 GEMM, performance optimized. (#1384)
* add ab_scale init support

* enabled interwave

* add scale type; update isSupport

* adjust example

* clean

* enable f8 pure gemm rcr ckprofiler

* Add gemm_multiply_multiply instances

* clang format

* Optimize for ScaleBlockMNK=128

* enable abscale f8 gemm ck profiler

* Add pure f8 gemm test suite

* Reverting to the state of project at f60fd77

* update copyright

* clang format

* update copyright

---------

Co-authored-by: root <jizhan@amd.com>
2024-07-19 22:06:52 +08:00
ltqin
c544eb4da0 Universal gemm splitk using reduce (with multi-d) (#1341)
* init for reduce_threadwise multi_d

* add reduce_threadwise_multi_d

* add reduce_multi_d

* clean

* start adding another splitk device op

* add reduce template parameter to SplitKBatchOffset

* add reduce c matrix

* clean up code

* change example data type to bf16

* add bf16Ai8B example

* remove reduce template parameter

* add splitk atomic status to v4

* example add multi d parameters

* device op add multi-d parameters

* add multi-d to reduce

* fix kbatch=1 bug

* change B layout to col in bf16Ai8B example

* remove float adding struct

* change multi-d interface

* change file and class name

* remove multi-d of bf16Ai8B example

* change IsReduce function to IsReduceAdd

* change example layout to RRR from RCR

* according layout to set ds stride

* reset parameter layout

* add gemm universal reduce instance

* add reduce factory

* add profile_gemm_universal_reduce

* add reduce to profiler

* fix reduce instance

* fix profiler reduce compiling bug

* format

* format library instance code

* add mem instance for reduce library

* fix call instance names

* add workspace for reduce in ckProfiler

* format

* add mnpading to reduce library instance

* add fp16 instance to reduce of profiler

* change copyright time

* restore profiler cmake file

* add reduce text to instances

* add DsLayout and DsDataType to instances template parameter

* fixed gemm_reduce_multi_d

* add an example without multi_d

* Update common.hpp

* Update gtest.cmake

* Update gemm_xdl_splitk_reduce_bf16.cpp

* clean

* Update gtest.cmake

* format

* fix api

* format

* default parameter change to RRR

* add vector_len for multi_d

* format

* Update gtest.cmake

* fix bf16A iBB elementwiseop

* add ReduceDataType

* move ReduceDataType to end position

* format

* remove googletest git method address

* fix copyright time

* update init data

---------

Co-authored-by: root <jizhan@amd.com>
Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2024-07-19 22:01:22 +08:00
Bartłomiej Kocot
70a814f163 Refactor transform conv to gemm fwd (#1391)
* Refactor transform conv to gemm fwd

* fixes codegen

* wmma fixes

* fix wmma

* Fix copyright
2024-07-19 09:29:25 +02:00
danyao12
1d7099b6c4 fix hd256 dropout scratch 2024-07-19 15:08:45 +08:00
Illia Silin
ab250afda0 add docker for rocm6.2_rc3 (#1401) 2024-07-18 09:41:33 -07:00
danyao12
a67bdd6349 simplify convert dq 2024-07-18 09:55:52 +08:00
Qianfeng
ee768148f0 Replace the use of __expf with __ocml_exp_f32 to work around the test_softmax_rank4 failure (#1394) 2024-07-17 09:15:05 -07:00
Mateusz Ozga
9cac282793 An option whether to colorize output during build (#1390) 2024-07-16 09:52:44 -07:00
Illia Silin
4c3107fdcb [ASAN builds] Modify the list of default targets for ASAN builds. (#1389)
* add a build parameter to build only XNACK targets

* use ENABLE_ASAN_PACKAGING flag to set targets for ASAN builds

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-07-16 09:19:23 -07:00
Andriy Roshchenko
802a8a1df1 Adding more instances of grouped convolution 3d forward for FP8 with ConvScale element-wise operation and ReLU activation. (#1386)
* Add CMakePresets configurations.

* Add ConvScale+ReLU Functor and an Example

* Account for ReLU FLOPs.

* Add instances of 3D convolutions with ConvscaleRelu operation.

* Implement Client Example

* Cleanup
2024-07-16 08:51:49 -07:00
Haocong WANG
1ff4f25138 Disable failed instance in rocm6.2 rel (#1388) 2024-07-16 08:46:48 -07:00
Illia Silin
eca39050c6 add Rosty and Bartek to code owners (#1392) 2024-07-16 23:44:46 +08:00