Commit Graph

12 Commits

Author SHA1 Message Date
Bartłomiej Kocot
836b7e557d Introduce wrapper library (#1071)
* Introduce wrapper library

* Update cmake files

* Revert "Update cmake files"

This reverts commit c27f88b565.

* Fix comments
2023-12-06 11:58:59 +01:00
Bartlomiej Wroblewski
0abc0f87db Change 1d,2d,... to 1D,2D,... (#997) 2023-10-19 16:53:18 +02:00
Bartłomiej Kocot
16d7c4d2f7 Add grouped conv bwd weight wmma (#985)
* Add grouped conv bwd weight wmma

* Update README, changelog, profiler

* Minor fixes

* Fix grouped conv bwd wei dl kernel

* Minor fixes

* Minor stylistic fixes
2023-10-17 10:32:26 +02:00
Bartłomiej Kocot
271ef645ac Add grouped convolution changes to changelog (#952)
* Add grouped convolution changes to changelog

* Fix 0.2.0 ck release rocm version

* Suggested CHANGELOG.md edits

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Lisa <lisajdelaney@gmail.com>
2023-09-28 18:18:32 +02:00
Bartłomiej Kocot
e2243a4d1e Add column to image kernel (#930)
* Add column to image kernel

* Minor fixes for dtypes and client examples

* Disable tests for disabled dtypes

* Disable add instances functions for disabled data types

* Minor stylistic fixes

* Revert "Disable add instances functions for disabled data types"

This reverts commit 728b869563.

* Instances reduction

* Add comments in device_column_to_image_impl

* Update changelog and Copyrights

* Improve changelog
2023-09-27 17:19:06 +02:00
rocking
f60f0a5e03 Refactor pool fwd (#815)
* Do not hardcode stride

* devicePool2DFwd Inherit devicePool3DFwd

* Move instance declaration out of common

* Add dilation

* use the pool3d rank, because pool2d inherit pooo3d

* calculate Do Ho Wo for the dilation

* Fix header name

* Modify ckProfiler

* Remove pool2d instance

* Remove pool2d in profiler

* Remove pool2d and add dilation

* In to client example, this commit revise following:
1. Add dilation.
2. Use pool3d to implement pool2d

* Refine naming and IsSupportedArgument()

* Add dilation to maxpool bwd example

* clang format

* 1. Remove useless header
2. Fix copyright
3. Refine naming

* Add layout parameter to pool fwd

* clang format

* Fix merge error

* Fix compile error

* Remove layout parameter in derived class

* Refine changlog

* Fix compile error

* Fix compiler error

* Add layout to external api and profiler
2023-08-15 02:25:28 +08:00
Illia Silin
9195435c77 Disable DL kernels by default. (#816) 2023-07-26 11:06:45 -05:00
rocking
341ad95665 Maxpool bwd (#750)
* Add maxpool f32 kernel and example

* Revise copyright

* Add device pool bwd device op

* Support f16 and bf16

* Add compute datatype for reference code.
Prevent error in bf16

* Fix type error

* Remove layout

* Fix bf16 error

* Add f16 and bf16 example

* Add more operations

* Implement IsSupportedArgument

* Add changelog

* Add comment

* Add comment

* Remove useless header

* Move initialize of workspace to the run

* Move set din zero to the device operator

* Save din_length_raw

* Remove useless header

* Calculate gridsize according to the number of CU

* Calculate gridSize according to the number of CU.
Remove useless header

* Add put example

* Remove useless header

* Fix CI fail
2023-06-19 09:44:22 -05:00
Illia Silin
0ccecc7c31 [gfx110x] support Navi3x architectures. (#628)
* enable building on Nav31

* fix syntax

* replace GPU_TARGETS with offload-arch

* add gfx1102 rachitecture

* fix typo

* update changelog
2023-03-09 07:56:40 -06:00
Rostyslav Geyyer
246ceee49e Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505)
* Add DeviceOp and examples

* Format DeviceOp template arguments

* Remove bf16 example

* Format

* Format

* Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N

* Refactor argument preparation

* Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl

* Rename device op file

* Update include directive in the example file

* Update descriptor preparation for grouped op

* Update the argument

* Update batch handling

* Add gridwise gemm supporting batched input

* Update blockwise indexing, working version

* Update copyright year

* Update check if argument is supported

* Refactor and make consistent with xdl examples

* Update check if argument is supported

* Add changelog entry

* Added comments on Dl op split_k>1 support

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-02-22 11:59:53 -06:00
rocking5566
6a6163a3d1 Improve normalization (#580)
* Sync the order of type string with template parameter

* Add more instances

* Check the vector size and remove redundant var

* Extract var to static, prepare to separate sweep once kernel

* Separate sweeponce flow and optimize the flow

* 1. Rename AccDatatype in normalization to computeData
2. Rename AccElementwiseOperation to YElementwiseOperation in normalization

* Remove useless code

* Update naive variance kernel

* Refine string

* Fix typo

* Support naive variance for device_normalization

* Check the blocksize

* Share the VGPR of x and y

* Share the VGPR of gamma and beta

* Add more instances

* Support fp16 sqrt for experiment

* Add CHANGELOG

* Fix typo

* clang-format
2023-02-15 11:59:35 -06:00
Illia Silin
b63accee2b adding the first draft of changelog (#571)
* adding the first draft of changelog

* second draft of changelog
2023-02-08 17:25:53 -06:00