Commit Graph

12 Commits

Author SHA1 Message Date
Illia Silin
3609ff10f7 Refactoring cmake files to build data types separately. (#932)
* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances

[ROCm/composable_kernel commit: bba085d2b5]
2023-09-20 22:15:56 -07:00
Illia Silin
25efecc13e Allow building CK for specific data types and split off last remaining DL instances. (#830)
* properly split conv_nd_bwd_data instances

* split conv2d_fwd instance data types

* split the gemm, conv2d_fwd and batched_gemm_softamx_gemm

* split the tests by data types where possible

* filter examples by DTYPES

* split few remaining examples by DTYPES

* filter most instances by DTYPES

* add new lines at end of headers, fix grouped_gemm profiler

* fix syntax

* split the ckprofiler instances by DTYPES

* split the conv2d and quantization DL and XDL instances

* fix the splitting of conv2d DL instances

* split softmax and pool_fwd tests for fp16 and fp32 types

* fix syntax

* fix the dl_int8 quantization instances isolation

[ROCm/composable_kernel commit: 08eb176929]
2023-08-07 14:56:10 -07:00
Illia Silin
541ee9c35d do not build gemm-gemm and conv-conv examples for gfx94* (#761)
* do not build gemm-gemm and conv-conv examples for gfx94*

* do not build gemm-gemm and conv-conv examples on navi

[ROCm/composable_kernel commit: 645eb2f2a0]
2023-06-19 16:55:03 -07:00
Illia Silin
48347d8653 Enable gfx941 and gfx942 architectures. (#752)
* enable gfx941/942 targets

* fix clang format

* fix the cmake logic for multiple targets

* fix cmake syntax for looping over targets

* add gfx941/942 support for gemm_xdl instances

[ROCm/composable_kernel commit: 027e46ee82]
2023-06-15 08:20:59 -07:00
Illia Silin
d40b8d5e2c update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
Illia Silin
2359d80980 Enable gemm_dl and other kernels on Navi3x. (#714)
* enable dl kernels on navi3

* do not build xdl tests and examples on Navi

* run tests before building everything on jenkins

* disable gemm_bilinear on gfx1030

* add gpu targets to installer on Navi

* put tests in the same order as before

* reduce the number of navi targets in CI

* build CI installed for gfx940 as well

* only build for MI300 during QA runs

[ROCm/composable_kernel commit: d821d1e54f]
2023-05-23 11:23:16 -05:00
Illia Silin
dda83a196e Syncing up from internal repo to enable MI300. (#690)
* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 4feebedd41]
2023-04-28 18:22:59 -05:00
Po Yen Chen
ff9f244625 Introduce ck::accumulate_n() (#439)
We can use this template to eliminate duplicated iterator computing
logics. By providing return type to ck::accumulate_n(), we can avoid
type conversion operations.

[ROCm/composable_kernel commit: 730204eed0]
2022-11-14 19:53:39 -06:00
Po Yen Chen
f2dd2e5b09 Rangify constructor of HostTensorDescriptor & Tensor<> (#445)
* Rangify STL algorithms

This commit adapts rangified std::copy(), std::fill() & std::transform()

* Rangify check_err()

By rangifying check_err(), we can not only compare values between
std::vector<>s, but also compare any ranges which have same value
type.

* Allow constructing Tensor<> like a HostTensorDescriptor

* Simplify Tensor<> object construction logics

* Remove more unnecessary 'HostTensorDescriptor' objects

* Re-format example code

* Re-write more HostTensorDescriptor ctor call

[ROCm/composable_kernel commit: 4a2a56c22f]
2022-11-11 11:36:01 -06:00
Adam Osewski
8a8f8521f9 Refactor device op implementations into impl subdirectory. (#420)
* Move kernel implementation files under impl directory.

* Update examples paths.

* Update device kernel impl include paths.

* Update tensor operation instances include paths.

* Update profiler and tests include paths.

* Clang-format

* Update include paths for batched gemm reduce

* Refactor UnitTest ConvNDBwdWeight.

* Refactor fwd and bwd data convND UT.

* Fix used test macro.

* Fix include path.

* Fix include paths.

* Fix include paths in profiler and tests.

* Fix include paths.

Co-authored-by: Adam Osewski <aosewski@amd.com>

[ROCm/composable_kernel commit: 3048028897]
2022-10-13 09:05:08 -05:00
Chao Liu
3ef500d96b add more datatype to gemm+gemm and conv+conv example (#397)
* refactor

* refactor

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* clean

[ROCm/composable_kernel commit: 204ef976ca]
2022-09-01 09:31:17 -05:00
Chao Liu
feca6e57f9 conv+conv (1x1 only) example using gemm+gemm (#393)
* refactor conv

* add conv+conv example, 1x1 only

[ROCm/composable_kernel commit: 4df6d93f60]
2022-08-31 11:27:11 -05:00