Commit Graph

419 Commits

Author SHA1 Message Date
carlushuang
855a264b72 remove ck_tile example from default cmake target like all/install/check 2024-03-30 23:58:48 +00:00
carlushuang
13311f2e5a fix clang-format 2024-03-26 18:53:10 +00:00
carlushuang
04ee01191a fix merge from upstream 2024-03-26 14:09:54 +00:00
carlushuang
c94b545747 update some readme 2024-03-26 13:35:53 +00:00
carlushuang
200d2b22d4 fix scratch in fp8 kernel 2024-03-25 19:45:38 +00:00
Po-Yen, Chen
1cacb713c5 Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode 2024-03-23 22:51:18 -04:00
carlushuang
bb1f6e48eb fix fp8 duplicated move/shift/and/or problem 2024-03-19 23:29:57 +00:00
carlushuang
886d040a81 fix compile error, fp8 not ready now 2024-03-18 07:58:00 +00:00
carlushuang
f55c7629bc not using custom data type by default, now we can have ISA-level same code as opt_padding 2024-03-17 23:23:32 +00:00
carlushuang
ee397d0ab2 temp fix buffer_store spill 2024-03-15 22:56:41 +00:00
carlushuang
04762d212b make sure thread_buffer can be tuple/array 2024-03-13 22:03:42 +00:00
carlushuang
616932068d let more integral_constant->constant, and formatting 2024-03-13 18:33:10 +00:00
Po-Yen, Chen
b1dbf64c91 Some minor changes 2024-03-13 03:55:07 -04:00
Po-Yen, Chen
8d1631adc9 Re-use function 2024-03-13 03:38:12 -04:00
Po-Yen, Chen
60221b89f8 Add constraint to array<> ctor 2024-03-13 03:32:05 -04:00
Po-Yen, Chen
5c433432fd Fix format 2024-03-13 03:21:30 -04:00
Po-Yen, Chen
958218e9d0 Rename enum
Rename 'cood_transform_enum' to 'coord_transform_enum'
2024-03-13 03:15:04 -04:00
carlushuang
d962a0044b fix compile issue in transpose 2024-03-13 15:02:45 +00:00
carlushuang
a59e655eb2 remove wrong code in store_raw() 2024-03-13 14:30:55 +00:00
Po-Yen, Chen
8103048b99 Merge branch 'ck_tile/refactor' of github.com:ROCm/composable_kernel-internal into ck_tile/refactor 2024-03-13 01:53:43 -04:00
Po-Yen, Chen
2b4e54305b Merge function templates 2024-03-13 01:52:49 -04:00
carlushuang
9f34bcb431 re-structure tuple/array to avoid spill 2024-03-11 15:32:21 +00:00
carlushuang
26a25eb4cd unify as tuple_array 2024-03-06 18:36:45 +00:00
carlushuang
7df3947819 fix macro for exp2; fix warpgemm a/b in transposedC 2024-03-06 15:59:21 +00:00
carlushuang
0e7df1999f wip fix 2024-03-06 14:31:36 +00:00
carlushuang
f549bb5d39 minor fix 2024-03-04 21:11:53 +00:00
carlushuang
a67473fff8 now can build 2024-03-04 20:45:51 +00:00
carlushuang
112d521b09 fix xx 2024-03-03 23:48:31 +00:00
carlushuang
fbd25cea35 fix build wip 2024-02-29 22:27:31 +00:00
carlushuang
f69356b1d7 add code 2024-02-28 22:57:19 +00:00
illsilin
e60bf36c9e fix clang format 2024-02-14 16:16:38 -08:00
illsilin
d66da6bee9 initial enablement of gfx950 2024-02-14 15:33:50 -08:00
Lakhinder Walia
1f306024d0 fast_gelu: minor code reorg to enhance ref & gpu performance (#1162) 2024-02-07 19:24:51 -08:00
jakpiase
ba86eadce5 Add support for mixed-precision f16bf16_int8 gemm (#1127) 2024-02-07 15:54:13 +01:00
Bartlomiej Wroblewski
6951858221 Implement direct loads split-K GEMM kernel (#1137)
* WIP: Implement direct loads split-K GEMM kernel

* Clean the review

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-02-07 01:08:34 +01:00
Illia Silin
180f16f9ac Add support for more Navi2x and Navi3x models. (#1152)
* add support for navi2x and navi3x models

* fix syntax

* use common macro for different mi300 architectures
2024-02-02 11:35:26 -08:00
Bartłomiej Kocot
171ca260b5 Extend gemm traits number for ck wrapper (#1153) 2024-02-02 11:25:54 -08:00
Bartłomiej Kocot
f3b6c23ac5 Add blockwise gemm to ck wrapper (#1139)
* Add blockwise gemm to ck wrapper

* Add blockwise gemm traits

* Disable test_gemm for non xdl devices

* Fixes

* Add c layout descriptions
2024-01-31 21:24:40 +01:00
Illia Silin
180e572076 Fixing most of the cppcheck errors. (#1142)
* fix cppcheck errors, first pass

* fix format

* fix returned value in examples

* add macro definitions for cppcheck

* fix the profile_gemm logic

* update the gemm profiler logic

* add more definitions to cppcheck, fix couple more errors

* replace runtime error with message in device function

* fix a couple of int4 issues

* no return for fill function

* fix errors in data_types.hpp

* fix format

* fix few remaining errors

* fix errors in data_types.hpp

* fix last couple of errors in data_types.hpp
2024-01-24 13:47:48 -08:00
Haocong WANG
bb63b9732c [GEMM] Optimization for MI200/300. (#1135)
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk instances

* clang format + typo fix

* Fix a bug
2024-01-19 07:02:22 -06:00
Bartłomiej Kocot
7e4eb4b800 Add optimized copy to ck wrapper (#1126)
* Add optimized copy to ck wrapper

* Example optimizations

* Fixes

* Move img2col test to client example

* Refactor example

* Fix docs

* Fixes

* Fix

* Fixes

* Fixes

* Fixes

* Fixes

* Fixes

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2024-01-19 11:29:00 +01:00
Illia Silin
e6d099c830 Add cppcheck to CK CI. (#1125)
* add cppcheck to the CK CI

* fix the path to CK source for cppcheck

* fix the path to CK source for cppcheck one more time

* fix the path to CK source for cppcheck third time

* change the path to ck_cppcheck.log

* install latest cppcheck from source

* fix bug in ck.hpp and use 20 threads for cppcheck

* create a switch to turn cppcheck on and off in CI
2024-01-15 09:11:45 -08:00
Illia Silin
886d9eeb99 Add an option to change the number of warm-up cycles and iterations. (#1124)
* allow setting the number of warmup cycles and iterations for profiler

* fix the gemm_splitk and grouped_gemm examples
2024-01-09 09:43:08 -08:00
raramakr
e699dbd8a3 SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints. (#1123)
* SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints.

The Hiptensor library uses the header files from CK. Because a header file used the macro __FILE__, a hard-coded ROCm path was embedded into the Hiptensor library. Replace the macro with the filename.

* fix syntax

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-01-09 08:21:47 -08:00
Bartłomiej Kocot
4234b3a691 Add tensor partition and generic copy for ck wrapper (#1108)
* Add tensor partition and generic copy for ck wrapper

* Update changelog

* Stylistic fixes

* Change shape/strides logic to descriptor transforms

* Fixes

* Fix client example

* Fix comments
2024-01-03 01:10:57 +01:00
Artur Wojcik
fb5bd51b42 enable compilation of INSTANCES_ONLY for Windows (#1082)
* enable compilation of INSTANCES_ONLY for Windows

* suppress ROCMChecks warnings on GoogleTests

* suppress -Wfloat-equal warning on GoogleTests

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2023-12-20 14:34:53 -08:00
rocking
a69aa2a11a layernorm and groupnorm backward data (#1083)
* rename folder

* Add type string

* Remove typo

* Add deviceOp to backward x

* Add comment to describe the behavior of backward normalization

* Add kernel function, prepare to implement

* implement generic kernel

* Check vector size

* Add sweep once pipeline for small reduce size

* Fix bug of KRaw_ error

* Fix bug of dx stride

* sanity check for mean and rstd

* backward x for groupnorm

* Add bwd x instance

* add layernorm 2d bwd gamma beta instances

* Change save mean var type from f32 to f16 in f16 mode

* Change the example to f16

* Add groupnorm bwd gamma beta instance

* Add groupnorm bwd x instance

* Fix naming

* Add layernorm bwd x ckprofiler

* Add groupnorm bwd x profiler

* clang format

* Rename bwd x to bwd data

* Fix bug of verification in profiler

* Add test of layernorm and groupnorm bwd data

* Add missing cmake

* Add layernorm2d bwd data

* rename fwd example

* Add groupnorm client example

* Fix typo. replace Invarient with Invariant

* Add checking before running the best instance
2023-12-19 04:23:11 +08:00
Bartłomiej Kocot
07092d68f0 Add tensor structure to wrapper (#1098)
* Add tensor structure to wrapper

* update changelog

* Fix names

* Comment fixes
2023-12-15 12:45:08 +01:00
Jun Liu
3a3b98ef79 [Doc][Werror] Fix security alerts and sync with MIOpen (#1085)
* fix Werror unused-parameter

* sync doc requirements

* fix blank space format

* fix dependency issue
2023-12-13 12:50:15 -08:00
Rostyslav Geyyer
6891e4d109 Fix the bugs (#1099) 2023-12-13 12:27:31 -08:00