Commit Graph

1022 Commits

Author SHA1 Message Date
zjing14
9d58c42103 Contraction multi abd (#957)
* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

* init commit for contraction_multi_ABD

* add examples

* add examples of multiA and broadcast

* update example

* fixed comments

* Update cmake-ck-dev.sh

* Update cmake-ck-dev.sh

* Add comments into the example

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-02 09:18:36 -05:00
Illia Silin
6b5f647371 add gfx942 target to the daily ckprofiler package (#955) 2023-09-29 08:55:25 -07:00
Bartlomiej Wroblewski
f07485060e Add support for mixed precision in contraction scale and bilinear (#936)
* Extract common functionality to separate files

* Reference contraction: Remove incorrect consts from type_converts

* Reference contraction: Add missing type_convert for dst value

* Reference contraction: Fix incorrect order of B matrix dimensions

* Add support for mixed precision in contraction scale and bilinear

* Move using statements from instances to a common file

* Move using statements from examples to a common file

* Fix the order of B matrix dimensions across examples and profiler

* Fix the computation of error threshold

* Make ComputeDataType an optional argument

* Include possible DataType -> ComputeDataType casting error in the threshold

* Remove commented code
2023-09-29 10:54:31 -05:00
Bartłomiej Kocot
cb53874002 Add grouped conv bwd data wmma (#950)
* Add grouped conv bwd data wmma

* Fix copyrights

* Add instances with smaller NPerBlock

* Update interface test

* Minor stylistic fixes

* Minor stylistic fixes
2023-09-28 23:10:18 +02:00
Bartłomiej Kocot
271ef645ac Add grouped convolution changes to changelog (#952)
* Add grouped convolution changes to changelog

* Fix 0.2.0 ck release rocm version

* Suggested CHANGELOG.md edits

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Lisa <lisajdelaney@gmail.com>
2023-09-28 18:18:32 +02:00
Illia Silin
bc1108bb3e Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. (#951)
* Added error check after kernel launch (#919)

Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>

* remove M=0 test cases for test_gemm_splitk

---------

Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
2023-09-27 15:19:33 -07:00
Bartlomiej Wroblewski
f4af5aed8b Handle type conversions to a const datatype (#944)
* Handle type conversions to a const datatype

* Review: Handle X being const data type as well

* Review: Remove typo
2023-09-27 15:02:42 -05:00
Bartłomiej Kocot
e2243a4d1e Add column to image kernel (#930)
* Add column to image kernel

* Minor fixes for dtypes and client examples

* Disable tests for disabled dtypes

* Disable add instances functions for disabled data types

* Minor stylistic fixes

* Revert "Disable add instances functions for disabled data types"

This reverts commit 728b869563.

* Instances reduction

* Add comments in device_column_to_image_impl

* Update changelog and Copyrights

* Improve changelog
2023-09-27 17:19:06 +02:00
zjing14
11676c7e49 Add multiple A/B support (#906)
* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-26 21:16:23 -05:00
Illia Silin
420b5a0382 Use lower case for ckprofiler package. (#948)
* split ckProfiler gfx9 package into gfx90 and gfx94

* use lower case for package names
2023-09-26 17:43:09 -07:00
zjing14
48ba6e8a69 Fixed Gemmv2r3 kpad (#938)
* added kpad support into v2r3

* add generic instances

* fixed comments

* fixed mnk padding

* Update device_batched_gemm_xdl.hpp

* fixed kpad

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-26 18:40:00 -05:00
Rostyslav Geyyer
94bfa50256 Add fp8 gemm instances (#920)
* Add fp8 gemm instances

* Update instance naming
2023-09-26 14:59:33 -05:00
Illia Silin
0b296a2722 split ckProfiler gfx9 package into gfx90 and gfx94 (#946) 2023-09-26 11:22:31 -07:00
Illia Silin
2ea75bd6d7 Resolve some data type issues and cmake policy. (#940)
* split the types in gemm_bilinear instances, add condition to cmake policy

* fix syntax

* split the data types in batchnorm examples

* fix the batchnorm_bwd test

* fix types in the batchnorm_bwd test
2023-09-26 08:39:11 -07:00
Bartłomiej Kocot
c95538325b Add 3d grouped conv fwd wmma instances (#935)
* Add 3d grouped conv fwd wmma instances

* Refactor fwd conv tests

* Split wmma instances for each specialization

* Minor stylistic fixes
2023-09-23 18:56:31 +02:00
Rostyslav Geyyer
ede64ae9db Update naming (#937) 2023-09-22 10:08:45 -05:00
Illia Silin
bba085d2b5 Refactoring cmake files to build data types separately. (#932)
* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances
2023-09-20 22:15:56 -07:00
Illia Silin
58817bf967 fix the building of the amd-stg-open compiler (#927) 2023-09-19 18:50:58 -07:00
Illia Silin
718065ebd2 update to rocm5.7 by default (#925)
* update to rocm5.7 by default

* fix jenkinsfile syntax
2023-09-19 09:35:45 -07:00
Illia Silin
5a4416c8a7 fix the ckprofiler package build in a loop (#926) 2023-09-19 09:17:39 -07:00
Bartlomiej Wroblewski
63cd459248 Fix DL GEMM instances with too large vector size (#901)
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
2023-09-18 14:08:23 +02:00
Rostyslav Geyyer
f17af2e9ed Add native conversions fp8<->fp32 (#908)
* Add native conversions

* Add bf8 conversions
2023-09-17 20:56:27 -05:00
Bartlomiej Kocot
bc2d0583d3 Stylistic improvements for grouped convolution code
Remove unnecessary ignoring

Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
2023-09-15 20:03:47 +02:00
zjing14
f9d0eddb90 Add fp16/fp8 support into Grouped gemm FixedNK (#874)
* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* instance and client

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

* clean

* formatting

* clean

* clean

* fixed computeType

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-14 21:04:10 -05:00
Illia Silin
0d8efaa13d change the cmake update method (#918) 2023-09-14 09:36:26 -07:00
Jun Liu
5fe687fa27 [Cmake] Set cmake default build type Release and path to /opt/rocm (#914) 2023-09-13 14:38:12 -07:00
Bartłomiej Kocot
475188ca2e Add grouped conv bwd weight dl instances and new layout (#897)
* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes
2023-09-13 10:14:31 -05:00
zjing14
a66d14edf2 fixed fp8 issues (#894)
* fixed fp8 init; and reference gemm

* Update host_tensor_generator.hpp

* fixed convert

* fixed reference gemm

* fixed comments

* fixed comments

* fixed ci

* fixed computeType

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-12 22:17:56 -05:00
Illia Silin
74d32f0719 Add a switch to build DL kernels and build them with staging compiler. (#907)
* enable building DL kernels with the daily staging compiler

* move the DL_KERNELS flag to another function
2023-09-12 20:14:33 -05:00
Rostyslav Geyyer
62d4af7449 Refactor f8_t, add bf8_t (#792)
* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments
2023-09-12 17:04:27 -05:00
Illia Silin
56c0279bbd clean up the workspace after every stage (#909) 2023-09-12 08:57:12 -07:00
Bartlomiej Wroblewski
547dbcfbc2 Add new instances and support for small cases in DPP8 GEMM (#896) 2023-09-12 10:05:23 -05:00
Sam Wu
85e2e1e2e2 Add codeowners for documentation (#902)
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
2023-09-11 11:01:36 -06:00
Bartlomiej Wroblewski
8f84a01237 Enable DPP8 GEMM on Navi3 (#892) 2023-09-08 11:14:57 -05:00
Haocong WANG
562b4cec48 [Navi3x] Add fp16/int8 wmma conv forward instances (#746)
* fix wmma gemm int8; add grouped conv int8 example

* Add int8 gemm-bilinear instances

* compile sanity check unknown

* Sanity pass + clang-format

* add int8 conv profiler instances

* solve merge conflict

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2023-09-07 21:59:26 -05:00
Bartlomiej Wroblewski
37a8c1f756 Redesign the DPP8 GEMM kernel to use warp-wise component (#863)
* Redesign the DPP8 GEMM kernel to use warp-wise component

* Review: Improve error messages

* Review: Remove unnecessary empty lines

* Review: Fix M, N per thread names

* Review: Rename mfma_input_type to dpp_input_type

* Review: Fix tensor adaptor; remove unnecessary element

* Review: Remove calls to dpp_gemm's MakeCDescriptor

* Review: Add blockwise doc, change function names to include dimension names

* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file

* Review: Add __restrict__ keywords

* Review: Use MatrixPadder for padding A, B, C matrices

* Review: Remove hardcoded datatypes

* Review: Change names from FloatX to XDataType

* Review: Introduce AK0 and BK0 instead of a single K0

* Review: Remove construction of dpp_datatypes object

* Review: Rename DppInstrRunner to DppLanegroupGemm
2023-09-06 11:44:09 -05:00
zjing14
3786bfe1cc added padding of K into gemm_v2r3 (#887)
* added kpad support into v2r3

* add generic instances

* fixed comments

* fixed mnk padding

* Update device_batched_gemm_xdl.hpp

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-06 10:15:52 -05:00
zjing14
a61b8b785e Fixed fp8 gemm (#882)
* add generic instances; fixed init with fp8

* fixed comment

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-06 09:59:20 -05:00
Illia Silin
aae4df5596 set warnings as errors in doxygen (#864) 2023-09-05 14:29:37 -07:00
Bartlomiej Wroblewski
1e1f82d9b0 Add contribution guidelines to the documentation (#843)
Add contribution guidelines to the documentation
2023-09-05 21:25:28 +02:00
Illia Silin
7dcb14d9d4 fix syntax (#890) 2023-09-05 11:29:44 -07:00
Bartłomiej Kocot
0077eeb3be Add image to column kernel (#867)
* Add image to column kernel

* Add instances, tests, profiler, example

* Add client example

* Several fixes of image to column

* Fix variable name in device_image_to_column_impl

* Several fixes of image to column profiler

* Fix num_btype calculation

* Make new measurements for correct bytes calculation
2023-09-05 10:11:40 -05:00
Bartłomiej Kocot
0c9a1d25b3 Add nhwgc dl generic instances for grouped conv fwd (#879) 2023-09-05 10:07:56 -05:00
Bartłomiej Kocot
c981f6d033 Fix K padding calculation for grouped conv data (#876)
* Fix K padding calculation for grouped conv data

* Restore previous padding for 1x1 specialization
2023-09-05 10:07:41 -05:00
Lauren Wrubleski
bd8024b84a Fix config header installation (#880) 2023-09-04 09:49:40 -07:00
zjing14
f5ec04f091 Grouped Gemm with Fixed K and N with SplitK (#818)
* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-31 09:22:12 -05:00
rocking
866377de18 MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861)
* Add maxpool instances

* Rename index pool to max pool.

* Add maxpool bwd bf16 instances

* Add avg pool bwd instances

* Rename avgpool and maxpool to avg_pool3d and max_pool

* Add bf16 pool fwd instances

* Add max pool bwd to ckProfiler

* Add avg pool3d bwd to ckProfiler

* Add avg pool bwd test

* Fix bug of reference pool fwd (dilation)

* Fix bug of max pool bwd (dilation and initZero)

* Support bf16 compute data type

* Force compute type to be f32, because atomicAdd only supports f32

* Add max pool bwd test

* Rename folder

* Rename pool

* Add max pool bwd client example

* Add avg pool bwd client example

* Add missing workspace

* clang format

* Rename macro

* remove useless header

* remove useless layout
2023-08-31 21:01:50 +08:00
Illia Silin
bf1912ed3d fix gemm_streamk example on mi300 (#875) 2023-08-30 20:18:38 -07:00
Bartłomiej Kocot
9e86ebd62d Add number of error when fail (#868) 2023-08-30 10:33:11 -05:00
zjing14
38ada109ea add an example of customized type convert - bfp16_rtn (#869)
* add an example of customized bfp16_rtn

* fixed threadwise_copy

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-29 12:31:24 -05:00