Commit Graph

28 Commits

Author SHA1 Message Date
Aviral Goel
004784ef98 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
yinglu
2a73eb3bc0 Simulate TF32 with BF16x3 (#3142)
* tf32:bf16x3:use bf16x3 emulate tf32 gemm

* change blockwiseGemm to demo bf16x3

* temp push

* self review

* self review

* fix multi-device compile error

* bug fix

* code refactor

* limit to gfx950

* enhance gemm gfx942 threshold

* lower change from blockwise to warpwise

* refact codes

* refact codes

* error fix

* change threshold

* bug fix

* fix threshold error

* change host reference implement to same as device

* bug fix

* bug fix

* code refact

* fix clang-format fail

* code refine
2025-11-13 16:21:09 -08:00
yinglu
dd7af118d7 TF32 POC in Conv3d on MI30x platform #2763 (second attempt) (#2852)
* Revert "Revert "feature:tf32:add initial conv3d fwd kernel support (#2763)" (#2848)"

This reverts commit 03b59f8c76.

* fix compile error on gf12x

* only run tf32 example on gfx942

* only build tf32 instance on gfx942

* ckProfiler:only support tf32 in gfx942

* delete unuseful messages
2025-09-17 14:50:15 -07:00
Illia Silin
03b59f8c76 Revert "feature:tf32:add initial conv3d fwd kernel support (#2763)" (#2848)
This reverts commit c51102144f.
2025-09-15 08:27:04 -07:00
lym
c51102144f feature:tf32:add initial conv3d fwd kernel support (#2763) 2025-09-15 21:03:00 +08:00
Illia Silin
1342ecf7fb Add a daily CI build on gfx908. (#1987)
* add one daily ci build on gfx908

* add redis invocation tag for gfx908

* make ci build for gfx908 conditional

* fix groovy logic

* add option to run perf tests for gfx908

* disable a few tests on mi100
2025-03-17 18:08:53 -07:00
Illia Silin
7843a8a7fb re-enable convnd_fwd_xdl_fp64 testing (#1289) 2024-05-10 22:48:28 -07:00
Bartłomiej Kocot
ad1597c499 Refactor elementwise kernels (#1222)
* Refactor elementwise kernels

* Instances fixes

* Fix cmake

* Fix max pool bwd test

* Update two stage gemm split k

* Restore elementwise scale for hiptensor backward compatiblity

* Fix Acc data type check in conv fwd multiple abd

* Disable conv fp64 fwd example

* Update grouped conv weight multi d
2024-04-19 13:31:17 +02:00
Rostyslav Geyyer
bbefc12a26 Add instances for conv_scale with bf8@fp8->fp8 (#1231)
* Add instances

* Add example

* Add profiler mode

* Add client example
2024-04-11 10:35:00 -05:00
Rostyslav Geyyer
50cc0a13a6 Add an example (#1225) 2024-04-09 13:56:54 -05:00
Rostyslav Geyyer
a61e73bc56 Add instances for conv_scale with fp8@bf8->fp8 (#1220)
* Update device op api to support BComputeType

* Add example

* Add instances

* Add profiler mode

* Add client example

* Update copyright year

* Add BComputeType check

* Fix compute types
2024-04-03 09:08:08 -05:00
Illia Silin
ae57e5938e Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only linnk test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* sinmplify the main cmake file

* add conv_fwd_bf8 instance declaration
2024-04-02 09:42:17 -07:00
Rostyslav Geyyer
fd0d093e78 Add instances for conv_scale with bf8 in / fp8 out (#1200)
* Add bf8 conv fwd instances

* Add example

* Add profiler mode

* Add client example

* Fix copyright headers

* Format
2024-03-21 13:57:34 -05:00
Rostyslav Geyyer
e626d5202a Add instances for conv_scale with fp8 in/out (#1193)
* Add fp8 conv instances and client example

* Format

* Add example

* Update cmakelists

* Add profiler mode

* Format

* Fix copyright headers
2024-03-15 09:50:03 -07:00
Illia Silin
bba085d2b5 Refactoring cmake files to build data types separately. (#932)
* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances
2023-09-20 22:15:56 -07:00
Illia Silin
08eb176929 Allow building CK for specific data types and split off last remaining DL instances. (#830)
* properly split conv_nd_bwd_data instances

* split conv2d_fwd instance data types

* split the gemm, conv2d_fwd and batched_gemm_softamx_gemm

* split the tests by data types where possible

* filter examples by DTYPES

* split few remaining examples by DTYPES

* filter most instances by DTYPES

* add new lines at end of headers, fix grouped_gemm profiler

* fix syntax

* split the ckprofiler instances by DTYPES

* split the conv2d and quantization DL and XDL instances

* fix the splitting of conv2d DL instances

* split softmax and pool_fwd tests for fp16 and fp32 types

* fix syntax

* fix the dl_int8 quantization instances isolation
2023-08-07 14:56:10 -07:00
Illia Silin
027e46ee82 Enable gfx941 and gfx942 architectures. (#752)
* enable gfx941/942 targets

* fix clang format

* fix the cmake logic for multiple targets

* fix cmake syntax for looping over targets

* add gfx941/942 support for gemm_xdl instances
2023-06-15 08:20:59 -07:00
Illia Silin
d821d1e54f Enable gemm_dl and other kernels on Navi3x. (#714)
* enable dl kernels on navi3

* do not build xdl tests and examples on Navi

* run tests before building everything on jenkins

* disable gemm_bilinear on gfx1030

* add gpu targets to installer on Navi

* put tests in the same order as before

* reduce the number of navi targets in CI

* build CI installed for gfx940 as well

* only build for MI300 during QA runs
2023-05-23 11:23:16 -05:00
ltqin
23ecf0fa9e Add multiple d gridwise gemm on Navi21 for ResNet50 (#517)
* start add example

* add multiple d fp16 example

* device transfer elementwiseop to gridwise

* gridwise add multiple d

* change example for multiple d

* fix spill registers

* fix for passthrough element op

* fix int8 overflow

* change example file name

* add instance for dl multiple d

* example add DsDataType

* remove grouped_convolution_forward_dl.hpp

* add head file(was deleted before)

* fix not support device issue

* format

* remove passthrough check

Co-authored-by: letaoqin <letaoqin@amd.com>
2022-12-02 11:42:31 -06:00
ltqin
8ee36118be Add Conv Forward on Navi21 for ResNet50 (#490)
* add device of dl

* fix k1 of GridwiseGemmDl_km_kn_mn_v1r3

* init version for dl conv

* add example(init)

* result right

* disable elementwise operation

* check parameters

* add fp32,int8 example and change check code

* change deive file and class name

* add check vector access of C

* add instance

* add to ckProfiler

* add Filter1x1Pad0 instances

* fix ignore error

* fix for CI

Co-authored-by: letaoqin <letaoqin@amd.com>
2022-10-31 10:24:25 -06:00
Chao Liu
500fa99512 Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating refernce conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshhold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* upate contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test
2022-07-29 18:19:25 -05:00
Chao Liu
85fc91c321 Minor fix for recent PR (#260)
* fix example

* update IsSupportedArgument

* fix

* disable fp64 conv example as test
2022-05-30 19:57:49 -05:00
ltqin
3e6c2610ae Add FP64 XDL GEMM built-in function (#199)
* add intrin_mfma_f64_16x16x4f64

* add example

* gemm reference add double data type

* chang init data

* fix M N PerXdlops

* fix ifdef

* add comparsion config

* add conv fwd example

* format log out

* change rc matrix egister layout

* reorganize example

* reorganize example 2

* format,because merge develop

* fix call impl adding acc data type

* lost ;

* add compiler warning

* change example tunning parameters

* add test for fp64

* add instance

* add test/gemm/gemm_fp64.cpp

* fix get name issue

* remove some tunning parameter

* fix conflict

* format

* use integer value for GEMM test

* add acc data type

* remove typeid because fp16

* fix streamconfig etc bug from merging develop

* format

* remove test_gemm_xdl_fp64

* add AccDataType

* AccDataType problem

Co-authored-by: qinletao <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-26 14:48:57 -05:00
Anthony Chang
9f71ff48e2 Validate examples in CI (#233)
* validate examples in ctest runs

* format

* fix usage of check_err

* amend

* add example codes to custom target 'check'

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-13 16:54:44 -05:00
Adam Osewski
712e464c4e Post PR183 review fixes. (#224)
* Suppress additional warnings for googltest.

* Rename file conv_fwd_util to conv_util.

* Update includes and ConvParams member access.

* Formatting.

* Change conv_fwd_util target to conv_util

* Fix compiler errors.

* Fix leftovers.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-10 15:41:29 -05:00
Adam Osewski
1a0cd5d160 Convolution FWD profiler refactor. (#183)
* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensio.

Needed to support various conv dimensions

* Rename conv nd example director to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than  2GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initalization and relax rtol for fp16

* Move test_util.hpp to check_err.hpp

* Refine weights initalization and relax rtol for fp16

* Refactor common part of test conv utils.

* Move utility function to single common place.

* Add additional common functions to utility.

* Refactor convnd_fwd_xdl examples.

* Remove redundant files.
* Unify structure.

* Add constructor to ConvParams.

* And add input parameters validation.

* Modify conv examples to use single utility file.

* Remove check_error from host_tensor.hpp

* Get rid of check_indices function.

* Remove bf16_to_f32 function overload for scalars.

* Fix namespace.

* Add half_float::half for check_err.

* Fix conv params size in UT.

* Fix weights initialization for int8.

* Fix weights initialization for int8.

* Add type_convert when store output in ref conv 1D.

* Get back old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err

* Formatting.

* Fix merge artifacts.

* Remove deleted header.

* Fix some includes and use ck::utils::check_err.

* Remove unused check_indices restored by previous merge.

* Fix namespaces after merge.

* Fix compilation error.

* Small fixes.

* Use common functions.
* Fix filename
* Fix namespaces.

* Fix merge artifact - retrieve removed by accident fun.

* Fix ConvForwardSpecialization.

* Working example of OpInstanceRunEngine for conv2dfwd UT.

* Adhere to coding style rules.

* Formatting and adhere to coding style rules.

* Fix merge artifacts.

* Utility for collecting conv fwd instances.

+ Plus commmon part for parsing cmdline params.

* Refactor FillUniform because of segfault for int8_t.

* Naming convention.

* Elegant version of device mem allocation.

* Use OpInstanceRunEngine in conv fwd nd tests.

* Multiple refinements.

* conditional init
* don't run reference op if not provided.

* Use OpInstanceRunEngine for ckProfiler conv_fwd

* Refactor common tensor fill function to separate file.

* Clean up unused functions.

* Support different init methods.

* Create CMake target for conv_fwd_util.

* Add header for profile_convnd_fwd.cpp

* Fix CMakefiles to link with conv_fwd_util where needed.

* Fix some clutter.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-04-21 17:39:39 -05:00
Adam Osewski
abf4bdb9a9 Common forward convolution utility refactor. (#141)
* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensio.

Needed to support various conv dimensions

* Rename conv nd example director to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than  2GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initalization and relax rtol for fp16

* Move test_util.hpp to check_err.hpp

* Refine weights initalization and relax rtol for fp16

* Refactor common part of test conv utils.

* Move utility function to single common place.

* Add additional common functions to utility.

* Refactor convnd_fwd_xdl examples.

* Remove redundant files.
* Unify structure.

* Add constructor to ConvParams.

* And add input parameters validation.

* Modify conv examples to use single utility file.

* Remove check_error from host_tensor.hpp

* Get rid of check_indices function.

* Remove bf16_to_f32 function overload for scalars.

* Fix namespace.

* Add half_float::half for check_err.

* Fix conv params size in UT.

* Fix weights initialization for int8.

* Fix weights initialization for int8.

* Add type_convert when store output in ref conv 1D.

* Get back old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err

* Formatting.

* Fix merge artifacts.

* Remove deleted header.

* Fix some includes and use ck::utils::check_err.

* Remove unused check_indices restored by previous merge.

* Fix namespaces after merge.

* Fix compilation error.

* Small fixes.

* Use common functions.
* Fix filename
* Fix namespaces.

* Fix merge artifact - retrieve removed by accident fun.

* Fix ConvForwardSpecialization.

* Adhere to coding style rules.

* Fix merge artifacts.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-04-05 15:16:59 -05:00
Chao Liu
5d37d7bff4 Reorganize files, Part 1 (#119)
* delete obselete files

* move files

* build

* update cmake

* update cmake

* fix build

* reorg examples

* update cmake for example and test
2022-03-08 21:46:36 -06:00