rocking
29a0670744
Remove remove_cvref_t
2024-04-08 10:03:48 +00:00
rocking
5c3fdeb0b8
Remove f8 pipeline, we should share the same pipeline even in f8
2024-04-08 09:56:23 +00:00
rocking
f7d81364f3
To prevent compiler issue, remove the elementwise function we have not used.
2024-04-08 09:44:21 +00:00
rocking
68153dea0b
Let generate.py generate different elementwise functions
2024-04-04 03:59:38 +00:00
rocking
d6cb104d0f
Add some elementwise ops in preparation for quantization
2024-04-04 03:18:39 +00:00
rocking
d9323ea261
Fix bug of elementwise op, our elementwise op is not inout
2024-04-04 03:17:36 +00:00
rocking
bfcf550305
Adjust P elementwise function
2024-04-03 11:07:21 +00:00
rocking
cf57626c07
Merge branch 'ck_tile/refactor' into ck_tile/elementwise
2024-04-01 16:07:27 +08:00
carlushuang
42866940dc
remove mistake
2024-03-31 00:01:30 +00:00
carlushuang
855a264b72
remove ck_tile example from default cmake target like all/install/check
2024-03-30 23:58:48 +00:00
rocking
286c74468d
Add element function to fmha api
2024-03-29 18:05:36 -04:00
rocking
50c36f352a
Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline
2024-03-29 07:09:06 -04:00
carlushuang
13311f2e5a
fix clang-format
2024-03-26 18:53:10 +00:00
carlushuang
04ee01191a
fix merge from upstream
2024-03-26 14:09:54 +00:00
carlushuang
c94b545747
update some readme
2024-03-26 13:35:53 +00:00
carlushuang
200d2b22d4
fix scratch in fp8 kernel
2024-03-25 19:45:38 +00:00
Po-Yen, Chen
1cacb713c5
Use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode by default
2024-03-23 22:51:18 -04:00
carlushuang
bb1f6e48eb
fix fp8 duplicated move/shift/and/or problem
2024-03-19 23:29:57 +00:00
carlushuang
886d040a81
fix compile error, fp8 not ready now
2024-03-18 07:58:00 +00:00
carlushuang
f55c7629bc
not using custom data type by default, now we can have ISA-level same code as opt_padding
2024-03-17 23:23:32 +00:00
carlushuang
ee397d0ab2
temp fix buffer_store spill
2024-03-15 22:56:41 +00:00
carlushuang
04762d212b
make sure thread_buffer can be tuple/array
2024-03-13 22:03:42 +00:00
carlushuang
616932068d
let more integral_constant->constant, and formatting
2024-03-13 18:33:10 +00:00
Po-Yen, Chen
b1dbf64c91
Some minor changes
2024-03-13 03:55:07 -04:00
Po-Yen, Chen
8d1631adc9
Re-use function
2024-03-13 03:38:12 -04:00
Po-Yen, Chen
60221b89f8
Add constraint to array<> ctor
2024-03-13 03:32:05 -04:00
Po-Yen, Chen
5c433432fd
Fix format
2024-03-13 03:21:30 -04:00
Po-Yen, Chen
958218e9d0
Rename enum
Rename 'cood_transform_enum' to 'coord_transform_enum'
2024-03-13 03:15:04 -04:00
carlushuang
d962a0044b
fix compile issue in transpose
2024-03-13 15:02:45 +00:00
carlushuang
a59e655eb2
remove wrong code in store_raw()
2024-03-13 14:30:55 +00:00
Po-Yen, Chen
8103048b99
Merge branch 'ck_tile/refactor' of github.com:ROCm/composable_kernel-internal into ck_tile/refactor
2024-03-13 01:53:43 -04:00
Po-Yen, Chen
2b4e54305b
Merge function templates
2024-03-13 01:52:49 -04:00
carlushuang
9f34bcb431
re-structure tuple/array to avoid spill
2024-03-11 15:32:21 +00:00
carlushuang
26a25eb4cd
unify as tuple_array
2024-03-06 18:36:45 +00:00
carlushuang
7df3947819
fix macro for exp2; fix warpgemm a/b in transposedC
2024-03-06 15:59:21 +00:00
carlushuang
0e7df1999f
wip fix
2024-03-06 14:31:36 +00:00
carlushuang
f549bb5d39
minor fix
2024-03-04 21:11:53 +00:00
carlushuang
a67473fff8
now can build
2024-03-04 20:45:51 +00:00
carlushuang
112d521b09
fix xx
2024-03-03 23:48:31 +00:00
carlushuang
fbd25cea35
fix build wip
2024-02-29 22:27:31 +00:00
carlushuang
f69356b1d7
add code
2024-02-28 22:57:19 +00:00
illsilin
e60bf36c9e
fix clang format
2024-02-14 16:16:38 -08:00
illsilin
d66da6bee9
initial enablement of gfx950
2024-02-14 15:33:50 -08:00
Lakhinder Walia
1f306024d0
fast_gelu: minor code reorg to enhance ref & gpu performance (#1162)
2024-02-07 19:24:51 -08:00
jakpiase
ba86eadce5
Add support for mixed-precision f16bf16_int8 gemm (#1127)
2024-02-07 15:54:13 +01:00
Bartlomiej Wroblewski
6951858221
Implement direct loads split-K GEMM kernel (#1137)
* WIP: Implement direct loads split-K GEMM kernel
* Clean the review
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-02-07 01:08:34 +01:00
Illia Silin
180f16f9ac
Add support for more Navi2x and Navi3x models. (#1152)
* add support for navi2x and navi3x models
* fix syntax
* use common macro for different mi300 architectures
2024-02-02 11:35:26 -08:00
Bartłomiej Kocot
171ca260b5
Extend gemm traits number for ck wrapper (#1153)
2024-02-02 11:25:54 -08:00
Bartłomiej Kocot
f3b6c23ac5
Add blockwise gemm to ck wrapper (#1139)
* Add blockwise gemm to ck wrapper
* Add blockwise gemm traits
* Disable test_gemm for non xdl devices
* Fixes
* Add c layout descriptions
2024-01-31 21:24:40 +01:00
Illia Silin
180e572076
Fixing most of the cppcheck errors. (#1142)
* fix cppcheck errors, first pass
* fix format
* fix returned value in examples
* add macro definitions for cppcheck
* fix the profile_gemm logic
* update the gemm profiler logic
* add more definitions to cppcheck, fix a couple more errors
* replace runtime error with message in device function
* fix a couple of int4 issues
* no return for fill function
* fix errors in data_types.hpp
* fix format
* fix few remaining errors
* fix errors in data_types.hpp
* fix last couple of errors in data_types.hpp
2024-01-24 13:47:48 -08:00