Commit Graph

304 Commits

Author SHA1 Message Date
Rostyslav Geyyer
ecc69920c3 Add native conversions fp8<->fp32 (#908)
* Add native conversions

* Add bf8 conversions

[ROCm/composable_kernel commit: f17af2e9ed]
2023-09-17 20:56:27 -05:00
Bartlomiej Kocot
775ae67af3 Stylistic improvements for grouped convolution code
Remove unnecessary ignoring

Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp


[ROCm/composable_kernel commit: bc2d0583d3]
2023-09-15 20:03:47 +02:00
zjing14
a49f4ff995 Add fp16/fp8 support into Grouped gemm FixedNK (#874)
* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add a instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* instance and client

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

* clean

* formatting

* clean

* clean

* fixed computeType

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: f9d0eddb90]
2023-09-14 21:04:10 -05:00
Bartłomiej Kocot
a13a1cb6d3 Add grouped conv bwd weight dl instances and new layout (#897)
* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes

[ROCm/composable_kernel commit: 475188ca2e]
2023-09-13 10:14:31 -05:00
zjing14
8af8296c29 fixed fp8 issues (#894)
* fixed fp8 init; and reference gemm

* Update host_tensor_generator.hpp

* fixed convert

* fixed reference gemm

* fixed comments

* fixed comments

* fixed ci

* fixed computeType

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: a66d14edf2]
2023-09-12 22:17:56 -05:00
Rostyslav Geyyer
2e227b8581 Refactor f8_t, add bf8_t (#792)
* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments

[ROCm/composable_kernel commit: 62d4af7449]
2023-09-12 17:04:27 -05:00
Bartlomiej Wroblewski
df01b7c45a Add new instances and support for small cases in DPP8 GEMM (#896)
[ROCm/composable_kernel commit: 547dbcfbc2]
2023-09-12 10:05:23 -05:00
Bartlomiej Wroblewski
20d295b14b Enable DPP8 GEMM on Navi3 (#892)
[ROCm/composable_kernel commit: 8f84a01237]
2023-09-08 11:14:57 -05:00
Haocong WANG
68d7430ec5 [Navi3x] Add fp16/int8 wmma conv forward instances (#746)
* fix wmma gemm int8; add grouped conv int8 example

* Add int8 gemm-bilinear instances

* compile sanity check unknown

* Sanity pass + clang-format

* add int8 conv profiler instances

* solve merge conflict

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: 562b4cec48]
2023-09-07 21:59:26 -05:00
Bartlomiej Wroblewski
88bb9d5fac Redesign the DPP8 GEMM kernel to use warp-wise component (#863)
* Redesign the DPP8 GEMM kernel to use warp-wise component

* Review: Improve error messages

* Review: Remove unnecessary empty lines

* Review: Fix M, N per thread names

* Review: Rename mfma_input_type to dpp_input_type

* Review: Fix tensor adaptor; remove unnecessary element

* Review: Remove calls to dpp_gemm's MakeCDescriptor

* Review: Add blockwise doc, change function names to include dimension names

* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file

* Review: Add __restrict__ keywords

* Review: Use MatrixPadder for padding A, B, C matrices

* Review: Remove hardcoded datatypes

* Review: Change names from FloatX to XDataType

* Review: Introduce AK0 and BK0 instead of a single K0

* Review: Remove construction of dpp_datatypes object

* Review: Rename DppInstrRunner to DppLanegroupGemm

[ROCm/composable_kernel commit: 37a8c1f756]
2023-09-06 11:44:09 -05:00
zjing14
3446ff1e7d added padding of K into gemm_v2r3 (#887)
* added kpad support into v2r3

* add generic instances

* fixed comments

* fixed mnk padding

* Update device_batched_gemm_xdl.hpp

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: 3786bfe1cc]
2023-09-06 10:15:52 -05:00
Illia Silin
3b3f0f225e fix syntax (#890)
[ROCm/composable_kernel commit: 7dcb14d9d4]
2023-09-05 11:29:44 -07:00
Bartłomiej Kocot
2ec7a9084a Add image to column kernel (#867)
* Add image to column kernel

* Add instances, tests, profiler, example

* Add client example

* Several fixes of image to column

* Fix variable name in device_image_to_column_impl

* Several fixes of image to column profiler

* Fix num_btype calculation

* Make new mesaurements for correct bytes calculation

[ROCm/composable_kernel commit: 0077eeb3be]
2023-09-05 10:11:40 -05:00
Bartłomiej Kocot
fcafba0fd4 Fix K padding calculation for grouped conv data (#876)
* Fix K padding calculation for grouped conv data

* Restore previous padd for 1x1 specialization

[ROCm/composable_kernel commit: c981f6d033]
2023-09-05 10:07:41 -05:00
zjing14
bc88b1a50b Grouped Gemm with Fixed K and N with SplitK (#818)
* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add a instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: f5ec04f091]
2023-08-31 09:22:12 -05:00
rocking
e422d088a3 MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861)
* Add maxpool instances

* Rename index pool to max pool.

* Add maxpool bwd bf16 instances

* Add avg pool bwd instances

* Rename avgpool and maxpool to avg_pool3d and max_pool

* Add bf16 pool fwd instances

* Add max pool bwd to ckProfiler

* Add avg pool3d bwd to ckProfiler

* Add avg pool bwd test

* Fix bug of reference pool fwd (dilation)

* Fix bug of max pool bwd  (dilation and initZero)

* Support bf16 compute data type

* Force compute type be f32. Because atomicAdd only support f32

* Add max pool bwd test

* Rename folder

* Rename pool

* Add max pool bwd client example

* Add avg pool bwd client example

* Add missing workspace

* clang format

* Rename macro

* remove useless header

* remove useless layout

[ROCm/composable_kernel commit: 866377de18]
2023-08-31 21:01:50 +08:00
Illia Silin
620a9319de fix gemm_streamk example on mi300 (#875)
[ROCm/composable_kernel commit: bf1912ed3d]
2023-08-30 20:18:38 -07:00
zjing14
d6005c1e0d add an example of customized type convert - bfp16_rtn (#869)
* add an example of customized bfp16_rtn

* fixed threadwise_copy

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: 38ada109ea]
2023-08-29 12:31:24 -05:00
zjing14
17d03c86d2 Fp16/fp8 mixed-precision Gemm with multiply+add fusion (#865)
* add compute_type

* add multiply_add ckProfiler

* add f8_fp16 support

* clean

* clean

* fixed lds size calc

* format

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: 31ea132aa2]
2023-08-28 16:27:32 -05:00
Jun Liu
a2392f1887 [HotFix] add config and version files to pass on build info (#856)
* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file

[ROCm/composable_kernel commit: c8a8385fdd]
2023-08-23 11:36:17 -07:00
zjing14
3d88622dda Ck profiler splitk (#857)
* updated regular gemm

* update ckProfiler

* fixed gtests

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: ca3115e7e8]
2023-08-22 16:54:34 -07:00
Bartłomiej Kocot
80659b5bc1 Fix transform and instances for grouped conv bwd data (#848)
* Fix transform and instances for grouped conv bwd data

* Add instances for small K and small C

* Remove workaround after fix

* Fix interface tests

[ROCm/composable_kernel commit: 595d23be14]
2023-08-22 11:25:41 -05:00
Rostyslav Geyyer
6f9eeb3190 Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853)
* Add ComputeType arg to splitk device and gridwise ops

* Update for gridwise op compatibility

* Update bf16 and int8 splitk gemm examples with ComputeType

* Add instances

* Update ckProfiler for mixed precision cases

* Add a mixed precision splitK gemm client example

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: eac50708d9]
2023-08-22 09:34:49 -05:00
cloudhan
9bdc6209c4 Use asynchronous version of hipMemset (#850)
[ROCm/composable_kernel commit: d52ec01652]
2023-08-18 11:14:59 +08:00
Bartlomiej Wroblewski
460756b64d Fix datatype in inner_product when V_DOT2 is disabled (#849)
[ROCm/composable_kernel commit: 32fe996da0]
2023-08-17 10:54:11 -05:00
Bartlomiej Wroblewski
f81eb31934 Implement DPP8 based GEMM for Navi21 (#826)
[ROCm/composable_kernel commit: d4c84256f7]
2023-08-14 15:46:27 -05:00
rocking
ae36ead7f5 Refactor pool fwd (#815)
* Do not hardcode stride

* devicePool2DFwd Inherit devicePool3DFwd

* Move instance declaration out of common

* Add dilation

* use the pool3d rank, because pool2d inherit pooo3d

* calculate Do Ho Wo for the dilation

* Fix header name

* Modify ckProfiler

* Remove pool2d instance

* Remove pool2d in profiler

* Remove pool2d and add dilation

* In to client example, this commit revise following:
1. Add dilation.
2. Use pool3d to implement pool2d

* Refine naming and IsSupportedArgument()

* Add dilation to maxpool bwd example

* clang format

* 1. Remove useless header
2. Fix copyright
3. Refine naming

* Add layout parameter to pool fwd

* clang format

* Fix merge error

* Fix compile error

* Remove layout parameter in derived class

* Refine changlog

* Fix compile error

* Fix compiler error

* Add layout to external api and profiler

[ROCm/composable_kernel commit: f60f0a5e03]
2023-08-15 02:25:28 +08:00
rocking
9c24b3c23e Add Normalization splitk instances (#829)
* Add normalization splitK to layernorm and groupnorm instances

* Fix bug of GetKPerThread()

* Refine naming

* clang format

[ROCm/composable_kernel commit: 03b8119e2e]
2023-08-12 01:31:31 +08:00
rocking
b42f79dffb Average pool backward deviceOP and example (#797)
* Add avgpool bwd reference code

* Refine naming

* Fix invalid in_element op in ref_conv

* Add example (only reference now)

* Add the full example of avgpool bwd

* Fix copyright

* Imitate MakeDescriptor from  transform_conv_bwd_data_to_gemm_v1.hpp

* rename channel to c from k

* Arrange the code

* Imitate the argument from conv bwd

* Implement invoker

* Fix order of parameter in example

* Refactor reference code for different dimension

* Support different stride

* Check if argument is valid

* Fix kernel parameter for NDHWC, fastest dimension C is not reduced

* Add more data type in example

* Fix bug in example

* calculate Do Ho Wo according to the dilation

* Remove useless header

* Add comment in reference code

* Add layout parameter

* Remove layout in derived class

* Refine reference comment

[ROCm/composable_kernel commit: 578142db3a]
2023-08-10 12:04:35 +08:00
Bartłomiej Kocot
ac574360c7 Enable grouped conv with small K or C (#822)
* Enable grouped conv with small K or C

* Add missing instances

* Refactor grouped conv fwd instances

* Fix fp16 instances since it supports src_per_vec %2 = 0

* Add generic instances

[ROCm/composable_kernel commit: 472fa029ba]
2023-08-09 10:40:55 -05:00
Rostyslav Geyyer
5aa5116e9b Enable f16/f8 mixed precision mode (#820)
* Enable f16/f8 mixed precision

* Add an argument to enable mixed precision

* Update for compatibility

* Add mixed precision example

* Introduce ComputeType argument

[ROCm/composable_kernel commit: 9c54eaab04]
2023-08-09 08:44:23 -05:00
Bartłomiej Kocot
50dbce5272 Add wei_strides to grouped conv3d wei to keep consistency (#817)
* Add wei_strides to grouped conv3d wei to keep consistency

* Fix strides in client examples

* Unify backward weight api with forward

* Fix for example

* Fixes for examples

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 22443f7aae]
2023-08-07 10:23:45 -05:00
Bartlomiej Kocot
3b27ef7a8c Change to github_issue prefix
[ROCm/composable_kernel commit: aac65a031e]
2023-08-03 16:38:28 +02:00
Bartlomiej Kocot
d91003ad62 Rename the workaround to a proper issue name
[ROCm/composable_kernel commit: e6a826d35a]
2023-08-03 16:38:28 +02:00
Bartłomiej Kocot
9996b8c375 Add s_nops after v_dot to avoid hazard (#808)
* Add s_nops after v_dot to avoid hazard

* Fix builtin for inner_produxt fp16

* Skip inline version to builtin

* Add comments regarding isa

* Fix comment regarding s_nop

[ROCm/composable_kernel commit: 7761e5232c]
2023-07-27 13:29:44 -05:00
carlushuang
836a29fcd8 initial stream-k implementation with example (#699)
* initial stream-k implementation with example

* fix unexpected change in err

* improve a little bit performance by reorganize pipeline.

* improve perf a little bit by swizzle block idx

* add profiler

* update example

* fix spelling

* shrink karg for streamk

* support dynamic buffer using memory coherence glc_slc bit from template

* control memory coherence while construct dynamic buffer

* update reduction for streamk(not ready yet)

* Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting

* fix build issue

* fix several bug

* now result is correct, everything works (but has scratch)

* remove scratch by manually reset coordinate

* update device code

* fix a bug in final reduce

* fix something in example

* update async memset

* fix enum as camel case

* modify coherence enum name

* clean code and use atomic streamk by default

* remove unused var

* throw exception if have empty pointer

* fix format

* fix CI warning

* fix type in init

* modify CI error

* filter out on gfx10+

* restore changed example code

---------

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: e7dca79d27]
2023-07-26 14:18:15 -05:00
Bartłomiej Kocot
ffc13df816 Disable XDL kernels on unsupported HW Add ck::is_xdl_supported (#768)
* Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765)

* Do not throw an error when GEMM problem is not supported.

---------

Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: ac6d68b353]
2023-07-26 07:19:55 -07:00
ltqin
714fc59909 Add bias scalar vectorload = 1 for gemm bias gemm (#791)
* first change bias load

* add bias dim and scalervector parameter

* make CDE0BlockTransferSrcVectorDim not work

* changse toinstance

* add limit for CDE0BlockTransferSrcScalarPerVector

[ROCm/composable_kernel commit: 50643dd555]
2023-07-24 20:08:15 -05:00
Bartłomiej Kocot
9eb711b9d2 Grouped conv bwd wei NDHWGC/NDHWGK (#804)
[ROCm/composable_kernel commit: 10732847e7]
2023-07-21 12:00:55 -05:00
Bartłomiej Kocot
686ad6b543 Grouped 3d conv backward data support (#799)
* Grouped 3d conv backward data support

* Fix comments

[ROCm/composable_kernel commit: 49180fd60b]
2023-07-18 11:01:33 -05:00
Rostyslav Geyyer
78ebad1588 Remove type_convert bf16 to int32 and back (#802)
[ROCm/composable_kernel commit: f82bd59389]
2023-07-18 09:44:51 -05:00
Bartłomiej Kocot
a1a8901df8 Support NHWGC conv2d_bwd_weight (#769)
* Support NHWGC conv2d_bwd_weight

* Fix client example

* Fix client example

* Fix comments

* Redesign grouped_conv_bwd_weight instances

* Clang format fix

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 1ee99dcaa6]
2023-07-12 08:25:02 -05:00
Po Yen Chen
7308ac2436 Split GEMM instance library & enable pipeline v2 optimization (#783)
* Move source file into sub-directories

* Add missing include directive

* Split DeviceGemmXdl<> fp16 instances

* Fix format

* Remove unnecessary CMakeLists.txt

* Add macros to toggle new features

* Remove debug message

* Turn off GEMM v2 pipeline optimization by default

* Fix format

* Extract duplicated string as list

* Enlarge indent in CMakeLists.txt

[ROCm/composable_kernel commit: 850144a0d3]
2023-07-06 10:59:35 -05:00
Qianfeng
ce225372d7 Batchnorm splitk single kernel (#771)
* Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]

* Add CountDataType as template parameter in blockwise_welford

* Add utility/get_shift.hpp

* Add BatchNorm multiblock single-kernel implementation

* Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a

* Renaming in device_batchnorm_forward_impl.hpp

* Tiny fix in the batchnorm_fwd profiler

* Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"

This reverts commit d16d00919c.

* Use the old two-kernel batchnorm multiblock method for gfx1030

* Use the old two-kernel batchnorm multiblock method for gfx908

* use the single-kernel batchnorm multiblock method only for gfx90a

* Remove get_wave_id() from utility/get_id.hpp since it is not used

* Set true for testing running mean/variance and saving mean/invvariance in the examples

* Fix to copy-right words

* Remove un-needed including in utility/get_id.hpp

* Add comments to workgroup_synchronization.hpp

* Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp

* Renaming in the kernels

* Remove un-used kernel file

[ROCm/composable_kernel commit: 8f5cafaf04]
2023-07-06 10:58:55 -05:00
Adam Osewski
5c2c77a439 Move Device Ops implementations into impl directory. (#777)
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: f4dfc060b7]
2023-07-06 16:15:51 +02:00
Rostyslav Geyyer
e53f1dbb46 Add the missing archs (#785)
[ROCm/composable_kernel commit: 61dc9aa932]
2023-07-05 18:29:56 -05:00
Rostyslav Geyyer
76be9bc130 Add fp8 GEMM and an example for it (#767)
* Add fp8 xdl gemm

* Add example

* Use int8 intrinsics for buffer load/store

* Format

* Update cmakelists

[ROCm/composable_kernel commit: 1cf5003179]
2023-07-04 20:38:49 -06:00
Bartłomiej Kocot
1dde9f03de Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757)
* Support bf16/f32/f16 and NHWGC conv2d_bwd_data

* Add interface test

* clang format

* Comment fixes

* Add more friendly error message

[ROCm/composable_kernel commit: 63388e84ab]
2023-06-21 08:20:31 -05:00
ltqin
d6b76736e7 remove useless comments (#760)
[ROCm/composable_kernel commit: 32d2f52bf7]
2023-06-19 19:25:08 -07:00
zjing14
3a6b484ed6 changed pipeline v1 (#763)
[ROCm/composable_kernel commit: 05ea6452b6]
2023-06-19 19:24:18 -07:00