Po Yen Chen
b71b9d2159
Add equal<float> & equal<double>
2024-04-09 13:49:31 +00:00
Po Yen Chen
54c28861fe
Reuse the existing template less_equal<> in check_err()
2024-04-09 13:39:18 +00:00
Po Yen Chen
3f57b3068a
Extend less_equal<>
2024-04-09 13:38:40 +00:00
Po Yen Chen
c6eac9746f
Fix type errors in composes<>
2024-04-09 13:18:17 +00:00
Po Yen Chen
35e2d18e5e
Merge branch 'ck_tile/elementwise' of github.com:ROCm/composable_kernel into ck_tile/elementwise
2024-04-09 12:50:40 +00:00
Po Yen Chen
a66409cfd9
Unify saturates<> implementation
2024-04-09 12:49:07 +00:00
rocking
83b8a99018
Merge branch 'ck_tile/refactor' into ck_tile/elementwise
2024-04-09 19:45:43 +08:00
carlushuang
89a75a97fa
fix some bugs in group-mode masking and codegen; update README
2024-04-09 19:01:25 +00:00
Po Yen Chen
ecc64bce12
Generalize the composes<> template
2024-04-09 10:14:56 +00:00
Po Yen Chen
6ed739f913
Fix wrong value produced by saturating
2024-04-09 09:27:58 +00:00
Po Yen Chen
5d0ebdbfe4
Re-use already-existing scales<> functor template
2024-04-09 08:06:38 +00:00
Po Yen Chen
ad45cf8613
Support heterogeneous argument for binary function types
2024-04-09 07:41:30 +00:00
Po Yen Chen
a9adfbe54a
Small refinements in C++ source files
2024-04-09 06:45:03 +00:00
Po Yen Chen
20fcd69687
Remove not-in-use elementwise function kargs
2024-04-09 06:03:35 +00:00
rocking
5860f3134a
Merge branch 'ck_tile/refactor' into ck_tile/elementwise
2024-04-09 02:37:42 +08:00
Po Yen Chen
87f3cd1ddd
Use CK_TILE_FLOAT_TO_FP8_STANDARD as default fp8 rounding mode
2024-04-08 12:39:58 +00:00
Po Yen Chen
641ae96215
Check fp8 rounding error in check_err()
2024-04-08 12:39:27 +00:00
Po Yen Chen
92d45d1681
Fix wrong fp8 QK/KV block gemm setting
2024-04-08 12:39:17 +00:00
rocking
4e005f2457
Avoid warning
2024-04-08 10:11:51 +00:00
rocking
29a0670744
Remove remove_cvref_t
2024-04-08 10:03:48 +00:00
rocking
5c3fdeb0b8
Remove f8 pipeline; we should share the same pipeline even for f8
2024-04-08 09:56:23 +00:00
rocking
f7d81364f3
To prevent a compiler issue, remove the elementwise functions we have not used.
2024-04-08 09:44:21 +00:00
carlushuang
42ebffe822
1) support receipe in generate.py; 2) use simplified mask type; 3) change left/right to be passed into karg
2024-04-07 23:30:34 +00:00
carlushuang
8050921512
Merge branch 'develop' into ck_tile/refactor
2024-04-05 20:49:13 +08:00
jakpiase
c701071666
Add Grouped Gemm Multiple D SplitK TwoStage (#1212)
* Support A/B/C elementwise ops.
* First part of GGEMM multiD splitk two stage.
* WIP - changes for debugging.
* tmp save
* working version
* added bf16@int8 version
* fixes
* add reviewers' suggestions
* pre-committed missing files
* switched to ifs from elseifs
---------
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2024-04-04 11:01:33 +02:00
rocking
68153dea0b
Let generate.py generate different elementwise functions
2024-04-04 03:59:38 +00:00
rocking
d6cb104d0f
Add some elementwise ops in preparation for quantization
2024-04-04 03:18:39 +00:00
rocking
d9323ea261
Fix bug in elementwise op; our elementwise op is not inout
2024-04-04 03:17:36 +00:00
Rostyslav Geyyer
a61e73bc56
Add instances for conv_scale with fp8@bf8->fp8 (#1220)
* Update device op api to support BComputeType
* Add example
* Add instances
* Add profiler mode
* Add client example
* Update copyright year
* Add BComputeType check
* Fix compute types
2024-04-03 09:08:08 -05:00
carlushuang
06f1cabd78
Merge branch 'develop' into ck_tile/refactor
2024-04-03 20:51:01 +08:00
rocking
bfcf550305
Adjust P elementwise function
2024-04-03 11:07:21 +00:00
Bartłomiej Kocot
9a194837af
Introduce combined elementwise ops (#1217)
* Introduce combined elementwise ops
* Introduce reference elementwise
2024-04-02 17:23:49 -05:00
Illia Silin
ae57e5938e
Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function
* fix the example 64 cmake file
* add xdl flag to the gemm_bias_softmax_gemm_permute example
* add filtering of tests based on architecture type
* enable test_grouped_gemm for gfx9 only
* enable test_transpose only for gfx9
* only link test_transpose if it gets built
* split the gemm instances by architectures
* split gemm_bilinear, grouped_conv_bwd_weight instances by targets
* split instances by architecture
* split grouped_conv instances by architecture
* fix clang format
* fix the if-else logic in group_conv headers
* small fix for grouped convolution instances
* fix the grouped conv bwd weight dl instances
* fix client examples
* only enable client examples 3 and 4 on gfx9
* set the gfx9 macro
* make sure the architecture macros are set by cmake
* use separate set of xdl/wmma flags for host code
* simplify the main cmake file
* add conv_fwd_bf8 instance declaration
2024-04-02 09:42:17 -07:00
zjing14
303d4594f4
improved zeroing (#1221)
2024-04-02 11:02:52 -05:00
rocking
cf57626c07
Merge branch 'ck_tile/refactor' into ck_tile/elementwise
2024-04-01 16:07:27 +08:00
carlushuang
42866940dc
remove mistake
2024-03-31 00:01:30 +00:00
carlushuang
855a264b72
remove ck_tile example from default cmake target like all/install/check
2024-03-30 23:58:48 +00:00
rocking
286c74468d
Add element function to fmha api
2024-03-29 18:05:36 -04:00
rocking
50c36f352a
Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline
2024-03-29 07:09:06 -04:00
carlushuang
13311f2e5a
fix clang-format
2024-03-26 18:53:10 +00:00
carlushuang
04ee01191a
fix merge from upstream
2024-03-26 14:09:54 +00:00
carlushuang
c94b545747
update some readme
2024-03-26 13:35:53 +00:00
carlushuang
200d2b22d4
fix scratch in fp8 kernel
2024-03-25 19:45:38 +00:00
Po-Yen, Chen
1cacb713c5
Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode
2024-03-23 22:51:18 -04:00
Bartłomiej Kocot
9c052804a7
Add elementwise with dynamic vector dim (#1198)
* Add elementwise with dynamic vector dim
* Reduce number of instances
* Fixes
* Fixes
2024-03-22 10:40:43 +01:00
carlushuang
bb1f6e48eb
fix fp8 duplicated move/shift/and/or problem
2024-03-19 23:29:57 +00:00
carlushuang
886d040a81
fix compile error, fp8 not ready now
2024-03-18 07:58:00 +00:00
carlushuang
f55c7629bc
not using custom data type by default; now we get ISA-level code identical to opt_padding
2024-03-17 23:23:32 +00:00
carlushuang
ee397d0ab2
temp fix buffer_store spill
2024-03-15 22:56:41 +00:00
Bartłomiej Kocot
285251768e
Add conv fwd/bwd data scale instances, extend bilinear instances (#1178)
* Add conv fwd/bwd data scale instances
* Fix cmake client example file
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-03-13 23:09:08 +01:00