rocking
cf57626c07
Merge branch 'ck_tile/refactor' into ck_tile/elementwise
2024-04-01 16:07:27 +08:00
carlushuang
42866940dc
remove mistake
2024-03-31 00:01:30 +00:00
carlushuang
855a264b72
remove ck_tile example from default cmake target like all/install/check
2024-03-30 23:58:48 +00:00
rocking
286c74468d
Add element function to fmha api
2024-03-29 18:05:36 -04:00
rocking
50c36f352a
Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline
2024-03-29 07:09:06 -04:00
carlushuang
13311f2e5a
fix clang-format
2024-03-26 18:53:10 +00:00
carlushuang
04ee01191a
fix merge from upstream
2024-03-26 14:09:54 +00:00
carlushuang
c94b545747
update some readme
2024-03-26 13:35:53 +00:00
carlushuang
200d2b22d4
fix scratch in fp8 kernel
2024-03-25 19:45:38 +00:00
Po-Yen, Chen
1cacb713c5
Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode
2024-03-23 22:51:18 -04:00
carlushuang
bb1f6e48eb
fix fp8 duplicated move/shift/and/or problem
2024-03-19 23:29:57 +00:00
carlushuang
886d040a81
fix compile error, fp8 not ready now
2024-03-18 07:58:00 +00:00
carlushuang
f55c7629bc
not using custom data type by default, now we can have ISA-level same code as opt_padding
2024-03-17 23:23:32 +00:00
carlushuang
ee397d0ab2
temp fix buffer_store spill
2024-03-15 22:56:41 +00:00
carlushuang
04762d212b
make sure thread_buffer can be tuple/array
2024-03-13 22:03:42 +00:00
carlushuang
616932068d
let more integral_constant->constant, and formatting
2024-03-13 18:33:10 +00:00
Po-Yen, Chen
b1dbf64c91
Some minor changes
2024-03-13 03:55:07 -04:00
Po-Yen, Chen
8d1631adc9
Re-use function
2024-03-13 03:38:12 -04:00
Po-Yen, Chen
60221b89f8
Add constraint to array<> ctor
2024-03-13 03:32:05 -04:00
Po-Yen, Chen
5c433432fd
Fix format
2024-03-13 03:21:30 -04:00
Po-Yen, Chen
958218e9d0
Rename enum
Rename 'cood_transform_enum' to 'coord_transform_enum'
2024-03-13 03:15:04 -04:00
carlushuang
d962a0044b
fix compile issue in transpose
2024-03-13 15:02:45 +00:00
carlushuang
a59e655eb2
remove wrong code in store_raw()
2024-03-13 14:30:55 +00:00
Po-Yen, Chen
8103048b99
Merge branch 'ck_tile/refactor' of github.com:ROCm/composable_kernel-internal into ck_tile/refactor
2024-03-13 01:53:43 -04:00
Po-Yen, Chen
2b4e54305b
Merge function templates
2024-03-13 01:52:49 -04:00
carlushuang
9f34bcb431
re-structure tuple/array to avoid spill
2024-03-11 15:32:21 +00:00
carlushuang
26a25eb4cd
unify as tuple_array
2024-03-06 18:36:45 +00:00
carlushuang
7df3947819
fix macro for exp2; fix warpgemm a/b in transposedC
2024-03-06 15:59:21 +00:00
carlushuang
0e7df1999f
wip fix
2024-03-06 14:31:36 +00:00
carlushuang
f549bb5d39
minor fix
2024-03-04 21:11:53 +00:00
carlushuang
a67473fff8
now can build
2024-03-04 20:45:51 +00:00
carlushuang
112d521b09
fix xx
2024-03-03 23:48:31 +00:00
carlushuang
fbd25cea35
fix build wip
2024-02-29 22:27:31 +00:00
carlushuang
f69356b1d7
add code
2024-02-28 22:57:19 +00:00
illsilin
e60bf36c9e
fix clang format
2024-02-14 16:16:38 -08:00
illsilin
d66da6bee9
initial enablement of gfx950
2024-02-14 15:33:50 -08:00
Lakhinder Walia
1f306024d0
fast_gelu: minor code reorg to enhance ref & gpu performance (#1162)
2024-02-07 19:24:51 -08:00
jakpiase
ba86eadce5
Add support for mixed-precision f16bf16_int8 gemm (#1127)
2024-02-07 15:54:13 +01:00
Bartlomiej Wroblewski
6951858221
Implement direct loads split-K GEMM kernel (#1137)
* WIP: Implement direct loads split-K GEMM kernel
* Clean the review
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-02-07 01:08:34 +01:00
Illia Silin
180f16f9ac
Add support for more Navi2x and Navi3x models. (#1152)
* add support for navi2x and navi3x models
* fix syntax
* use common macro for different mi300 architectures
2024-02-02 11:35:26 -08:00
Bartłomiej Kocot
171ca260b5
Extend gemm traits number for ck wrapper (#1153)
2024-02-02 11:25:54 -08:00
Bartłomiej Kocot
f3b6c23ac5
Add blockwise gemm to ck wrapper (#1139)
* Add blockwise gemm to ck wrapper
* Add blockwise gemm traits
* Disable test_gemm for non xdl devices
* Fixes
* Add c layout descriptions
2024-01-31 21:24:40 +01:00
Illia Silin
180e572076
Fixing most of the cppcheck errors. (#1142)
* fix cppcheck errors, first pass
* fix format
* fix returned value in examples
* add macro definitions for cppcheck
* fix the profile_gemm logic
* update the gemm profiler logic
* add more definitions to cppcheck, fix a couple more errors
* replace runtime error with message in device function
* fix a couple of int4 issues
* no return for fill function
* fix errors in data_types.hpp
* fix format
* fix few remaining errors
* fix errors in data_types.hpp
* fix last couple of errors in data_types.hpp
2024-01-24 13:47:48 -08:00
Haocong WANG
bb63b9732c
[GEMM] Optimization for MI200/300. (#1135)
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk instances
* clang format + typo fix
* Fix a bug
2024-01-19 07:02:22 -06:00
Bartłomiej Kocot
7e4eb4b800
Add optimized copy to ck wrapper (#1126)
* Add optimized copy to ck wrapper
* Example optimizations
* Fixes
* Move img2col test to client example
* Refactor example
* Fix docs
* Fixes
* Fix
* Fixes
* Fixes
* Fixes
* Fixes
* Fixes
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2024-01-19 11:29:00 +01:00
Illia Silin
e6d099c830
Add cppcheck to CK CI. (#1125)
* add cppcheck to the CK CI
* fix the path to CK source for cppcheck
* fix the path to CK source for cppcheck one more time
* fix the path to CK source for cppcheck third time
* change the path to ck_cppcheck.log
* install latest cppcheck from source
* fix bug in ck.hpp and use 20 threads for cppcheck
* create a switch to turn cppcheck on and off in CI
2024-01-15 09:11:45 -08:00
Illia Silin
886d9eeb99
Add an option to change the number of warm-up cycles and iterations. (#1124)
* allow setting the number of warmup cycles and iterations for profiler
* fix the gemm_splitk and grouped_gemm examples
2024-01-09 09:43:08 -08:00
raramakr
e699dbd8a3
SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints. (#1123)
* SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints.
Hiptensor library uses the header files from CK. A hard-coded ROCm path was getting embedded into the hiptensor library because the header file used the macro __FILE__. Replace the macro with the filename.
* fix syntax
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-01-09 08:21:47 -08:00
Bartłomiej Kocot
4234b3a691
Add tensor partition and generic copy for ck wrapper (#1108)
* Add tensor partition and generic copy for ck wrapper
* Update changelog
* Stylistic fixes
* Change shape/strides logic to descriptor transforms
* Fixes
* Fix client example
* Fix comments
2024-01-03 01:10:57 +01:00
Artur Wojcik
fb5bd51b42
enable compilation of INSTANCES_ONLY for Windows (#1082)
* enable compilation of INSTANCES_ONLY for Windows
* suppress ROCMChecks warnings on GoogleTests
* suppress -Wfloat-equal warning on GoogleTests
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2023-12-20 14:34:53 -08:00