Commit Graph

980 Commits

Author SHA1 Message Date
Bartłomiej Kocot
0c9a1d25b3 Add nhwgc dl generic instances for grouped conv fwd (#879) 2023-09-05 10:07:56 -05:00
Bartłomiej Kocot
c981f6d033 Fix K padding calculation for grouped conv data (#876)
* Fix K padding calculation for grouped conv data

* Restore previous padd for 1x1 specialization
2023-09-05 10:07:41 -05:00
Lauren Wrubleski
bd8024b84a Fix config header installation (#880) 2023-09-04 09:49:40 -07:00
zjing14
f5ec04f091 Grouped Gemm with Fixed K and N with SplitK (#818)
* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add a instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-31 09:22:12 -05:00
rocking
866377de18 MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861)
* Add maxpool instances

* Rename index pool to max pool.

* Add maxpool bwd bf16 instances

* Add avg pool bwd instances

* Rename avgpool and maxpool to avg_pool3d and max_pool

* Add bf16 pool fwd instances

* Add max pool bwd to ckProfiler

* Add avg pool3d bwd to ckProfiler

* Add avg pool bwd test

* Fix bug of reference pool fwd (dilation)

* Fix bug of max pool bwd  (dilation and initZero)

* Support bf16 compute data type

* Force compute type be f32. Because atomicAdd only support f32

* Add max pool bwd test

* Rename folder

* Rename pool

* Add max pool bwd client example

* Add avg pool bwd client example

* Add missing workspace

* clang format

* Rename macro

* remove useless header

* remove useless layout
2023-08-31 21:01:50 +08:00
Illia Silin
bf1912ed3d fix gemm_streamk example on mi300 (#875) 2023-08-30 20:18:38 -07:00
Bartłomiej Kocot
9e86ebd62d Add number of error when fail (#868) 2023-08-30 10:33:11 -05:00
zjing14
38ada109ea add an example of customized type convert - bfp16_rtn (#869)
* add an example of customized bfp16_rtn

* fixed threadwise_copy

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-29 12:31:24 -05:00
zjing14
31ea132aa2 Fp16/fp8 mixed-precision Gemm with multiply+add fusion (#865)
* add compute_type

* add multiply_add ckProfiler

* add f8_fp16 support

* clean

* clean

* fixed lds size calc

* format

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-28 16:27:32 -05:00
Jun Liu
c8a8385fdd [HotFix] add config and version files to pass on build info (#856)
* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file
2023-08-23 11:36:17 -07:00
Qianfeng
350d64f351 Add workspace setting up for batchnorm bwd/fwd client examples (#860) 2023-08-24 01:13:07 +08:00
Illia Silin
7c71dc7e70 use correct data types in cmake conditions for splitk gemm example (#862) 2023-08-23 09:52:11 -07:00
zjing14
8ebea3a56e add generic instances (#858)
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-23 09:18:10 -05:00
zjing14
ca3115e7e8 Ck profiler splitk (#857)
* updated regular gemm

* update ckProfiler

* fixed gtests

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-22 16:54:34 -07:00
Bartłomiej Kocot
595d23be14 Fix transform and instances for grouped conv bwd data (#848)
* Fix transform and instances for grouped conv bwd data

* Add instances for small K and small C

* Remove workaround after fix

* Fix interface tests
2023-08-22 11:25:41 -05:00
Rostyslav Geyyer
eac50708d9 Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853)
* Add ComputeType arg to splitk device and gridwise ops

* Update for gridwise op compatibility

* Update bf16 and int8 splitk gemm examples with ComputeType

* Add instances

* Update ckProfiler for mixed precision cases

* Add a mixed precision splitK gemm client example

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-08-22 09:34:49 -05:00
cloudhan
d52ec01652 Use asynchronous version of hipMemset (#850) 2023-08-18 11:14:59 +08:00
Bartlomiej Wroblewski
32fe996da0 Fix datatype in inner_product when V_DOT2 is disabled (#849) 2023-08-17 10:54:11 -05:00
Bartlomiej Wroblewski
d4c84256f7 Implement DPP8 based GEMM for Navi21 (#826) 2023-08-14 15:46:27 -05:00
rocking
f60f0a5e03 Refactor pool fwd (#815)
* Do not hardcode stride

* devicePool2DFwd Inherit devicePool3DFwd

* Move instance declaration out of common

* Add dilation

* use the pool3d rank, because pool2d inherit pooo3d

* calculate Do Ho Wo for the dilation

* Fix header name

* Modify ckProfiler

* Remove pool2d instance

* Remove pool2d in profiler

* Remove pool2d and add dilation

* In to client example, this commit revise following:
1. Add dilation.
2. Use pool3d to implement pool2d

* Refine naming and IsSupportedArgument()

* Add dilation to maxpool bwd example

* clang format

* 1. Remove useless header
2. Fix copyright
3. Refine naming

* Add layout parameter to pool fwd

* clang format

* Fix merge error

* Fix compile error

* Remove layout parameter in derived class

* Refine changlog

* Fix compile error

* Fix compiler error

* Add layout to external api and profiler
2023-08-15 02:25:28 +08:00
rocking
03b8119e2e Add Normalization splitk instances (#829)
* Add normalization splitK to layernorm and groupnorm instances

* Fix bug of GetKPerThread()

* Refine naming

* clang format
2023-08-12 01:31:31 +08:00
dependabot[bot]
a5343db00d Bump rocm-docs-core from 0.10.3 to 0.20.0 in /docs/sphinx (#844)
* Bump rocm-docs-core from 0.10.3 to 0.20.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.10.3 to 0.20.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.10.3...v0.20.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* set min version of rocm-docs-core

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-11 11:27:56 -06:00
Illia Silin
6237bd1247 Add the rocm5.7 RC1 compiler and use it for QA builds. (#842)
* add docker for rocm5.7 RC1

* fix rocm5.7 rc1 build

* build QA with rocm5.7 rc1 compiler
2023-08-10 09:25:21 -07:00
rocking
578142db3a Average pool backward deviceOP and example (#797)
* Add avgpool bwd reference code

* Refine naming

* Fix invalid in_element op in ref_conv

* Add example (only reference now)

* Add the full example of avgpool bwd

* Fix copyright

* Imitate MakeDescriptor from  transform_conv_bwd_data_to_gemm_v1.hpp

* rename channel to c from k

* Arrange the code

* Imitate the argument from conv bwd

* Implement invoker

* Fix order of parameter in example

* Refactor reference code for different dimension

* Support different stride

* Check if argument is valid

* Fix kernel parameter for NDHWC, fastest dimension C is not reduced

* Add more data type in example

* Fix bug in example

* calculate Do Ho Wo according to the dilation

* Remove useless header

* Add comment in reference code

* Add layout parameter

* Remove layout in derived class

* Refine reference comment
2023-08-10 12:04:35 +08:00
Illia Silin
cbbd172fd6 Update the rocm version threshold to apply the -fno-offload-uniform-block flag. (#839)
* add fno-offload-uniform-block flag for rocm5.7 and up

* add a comment and compiler ticket number

* update the threshold rocm version
2023-08-09 13:50:04 -07:00
Illia Silin
1b7da171c9 Update the list of contributors. (#836)
* add linting and update contributors list

* skip the linting and doc changes

* add Astha

* add YanXing
2023-08-09 13:44:13 -07:00
Illia Silin
9af519ee86 add gfx941 to the ckProfiler package (#840) 2023-08-09 10:30:40 -07:00
Bartłomiej Kocot
472fa029ba Enable grouped conv with small K or C (#822)
* Enable grouped conv with small K or C

* Add missing instances

* Refactor grouped conv fwd instances

* Fix fp16 instances since it supports src_per_vec %2 = 0

* Add generic instances
2023-08-09 10:40:55 -05:00
Rostyslav Geyyer
9c54eaab04 Enable f16/f8 mixed precision mode (#820)
* Enable f16/f8 mixed precision

* Add an argument to enable mixed precision

* Update for compatibility

* Add mixed precision example

* Introduce ComputeType argument
2023-08-09 08:44:23 -05:00
Illia Silin
6802611334 add no-offload-uniform-block flag for rocm5.7 and up (#838)
* add -fno-offload-uniform-block flag for rocm5.7 and up

* add a comment and compiler ticket number
2023-08-08 17:58:31 -07:00
Illia Silin
08eb176929 Allow building CK for specific data types and split off last remaining DL instances. (#830)
* properly split conv_nd_bwd_data instances

* split conv2d_fwd instance data types

* split the gemm, conv2d_fwd and batched_gemm_softamx_gemm

* split the tests by data types where possible

* filter examples by DTYPES

* split few remaining examples by DTYPES

* filter most instances by DTYPES

* add new lines at end of headers, fix grouped_gemm profiler

* fix syntax

* split the ckprofiler instances by DTYPES

* split the conv2d and quantization DL and XDL instances

* fix the splitting of conv2d DL instances

* split softmax and pool_fwd tests for fp16 and fp32 types

* fix syntax

* fix the dl_int8 quantization instances isolation
2023-08-07 14:56:10 -07:00
Bartłomiej Kocot
22443f7aae Add wei_strides to grouped conv3d wei to keep consistency (#817)
* Add wei_strides to grouped conv3d wei to keep consistency

* Fix strides in client examples

* Unify backward weight api with forward

* Fix for example

* Fixes for examples

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-08-07 10:23:45 -05:00
Illia Silin
2474dddbee add an option to build ckProfiler package for specific architectures (#828) 2023-08-03 10:10:27 -07:00
Bartlomiej Kocot
aac65a031e Change to github_issue prefix 2023-08-03 16:38:28 +02:00
Bartlomiej Kocot
e6a826d35a Rename the workaround to a proper issue name 2023-08-03 16:38:28 +02:00
Bartlomiej Wroblewski
8c13df07bf Improve formatting of docs; Add a note about the DL_KERNELS flag (#825)
* Improve formatting of docs; Add a note about the DL_KERNELS flag

* Change the recommended version of ROCm to 5.6
2023-08-03 15:50:38 +02:00
Po Yen Chen
f7cc8c3b03 Update tuning parameter & compilation options of DeviceGemmXdl<> instance (layout=TT) (#819)
* Enable pipeline v2 opt for layout=TT instance

* Use better thread mapping for reading A tile

* Conditionally enable pipeline v2 opt

* Allow enabling only fp16 gemm instances in profiler

* Fix formatting error

* Fix compilation error if we enable fp32 in profiler
2023-08-02 10:32:22 -05:00
Bartłomiej Kocot
7761e5232c Add s_nops after v_dot to avoid hazard (#808)
* Add s_nops after v_dot to avoid hazard

* Fix builtin for inner_produxt fp16

* Skip inline version to builtin

* Add comments regarding isa

* Fix comment regarding s_nop
2023-07-27 13:29:44 -05:00
carlushuang
e7dca79d27 initial stream-k implementation with example (#699)
* initial stream-k implementation with example

* fix unexpected change in err

* improve a little bit performance by reorganize pipeline.

* improve perf a little bit by swizzle block idx

* add profiler

* update example

* fix spelling

* shrink karg for streamk

* support dynamic buffer using memory coherence glc_slc bit from template

* control memory coherence while construct dynamic buffer

* update reduction for streamk(not ready yet)

* Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting

* fix build issue

* fix several bug

* now result is correct, everything works (but has scratch)

* remove scratch by manually reset coordinate

* update device code

* fix a bug in final reduce

* fix something in example

* update async memset

* fix enum as camel case

* modify coherence enum name

* clean code and use atomic streamk by default

* remove unused var

* throw exception if have empty pointer

* fix format

* fix CI warning

* fix type in init

* modify CI error

* filter out on gfx10+

* restore changed example code

---------

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2023-07-26 14:18:15 -05:00
Illia Silin
9195435c77 Disable DL kernels by default. (#816) 2023-07-26 11:06:45 -05:00
Bartłomiej Kocot
ac6d68b353 Disable XDL kernels on unsupported HW Add ck::is_xdl_supported (#768)
* Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765)

* Do not throw an error when GEMM problem is not supported.

---------

Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2023-07-26 07:19:55 -07:00
rocking
016bd428df Refine the dimension of host tesnor. This example only require 1D (#812) 2023-07-25 23:18:56 -05:00
Po Yen Chen
f4ea560112 Speed-up global memory reading for GEMM instances (#813)
* Use better ThreadClusterLengths to speed up

* Update B tile reading pattern for layout=NN instance
2023-07-25 18:54:47 -05:00
ltqin
50643dd555 Add bias scalar vectorload = 1 for gemm bias gemm (#791)
* first change bias load

* add bias dim and scalervector parameter

* make CDE0BlockTransferSrcVectorDim not work

* changse toinstance

* add limit for CDE0BlockTransferSrcScalarPerVector
2023-07-24 20:08:15 -05:00
Illia Silin
844b215d92 add ninja profiling tools to the base docker (#805) 2023-07-21 15:33:17 -07:00
Illia Silin
7a29f711d4 add INSTANCES_ONLY cmake macro to build only instances (#807) 2023-07-21 15:31:19 -07:00
Bartłomiej Kocot
10732847e7 Grouped conv bwd wei NDHWGC/NDHWGK (#804) 2023-07-21 12:00:55 -05:00
Bartłomiej Kocot
49180fd60b Grouped 3d conv backward data support (#799)
* Grouped 3d conv backward data support

* Fix comments
2023-07-18 11:01:33 -05:00
Rostyslav Geyyer
f82bd59389 Remove type_convert bf16 to int32 and back (#802) 2023-07-18 09:44:51 -05:00
Illia Silin
189ea3b9aa Add mechanism to build CK for select data types, add Navi3x CI. (#790)
* allow building CK for specific data types

* add CI build and test stage on Naiv3x without some int8 instances

* add missing gemm fp16 instances

* add the changes to the missed cmake file

* add empty lines at end of source files

* Do not build quantization client example on navi3 in CI

* disable batched_gemm_multi_d_int8 instances with DTYPES

* disable device_conv2d_bwd_data_instance with DTYPES

* fix ckprofiler for conv_bwd_data for int8

* properly isolate the conv_bwd_data int8 instances

* remove empty line
2023-07-17 18:02:42 -07:00