zjing14
7efc5b518d
remove example 60 ( #963 )
...
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 570ff3ddbe ]
2023-10-05 09:41:01 -07:00
zjing14
ad4c658125
Grouped conv bwd data with fp16 input and bf8fp8 comp ( #962 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
* Update naming (#937)
* Add a client example
* Add computetypes to device and gridwise ops
* Add instances, update instance factory
* Format
* Fix a flag
* Add ckProfiler mode
* Fix typos
* Add an example
* Add bf8 generator
* add bf8 mfma; fixed type_convert for bf8
* move verification ahead of timing
* Update reference calculation
* Fix reference
* Narrow down float init range
* Fix bf8 bf8 mfma
* Add bf8 @ fp8 mfma
* Update example
* Update instances
* Update profiler api
* Update for compatibility
* Format
* Remove extra example
* Clean up
* workaround convert
* added instance of f16_bf8f8, and client example
* fixed mfma selector
* format
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com >
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com >
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 04f93aadb8 ]
2023-10-04 18:04:27 -05:00
Rostyslav Geyyer
80b2318991
Add conv bwd weight fp16 comp bf8 fp8 op, instances and example ( #945 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
* Update naming (#937)
* Add a client example
* Add computetypes to device and gridwise ops
* Add instances, update instance factory
* Format
* Fix a flag
* Add ckProfiler mode
* Fix typos
* Add an example
* Add bf8 generator
* add bf8 mfma; fixed type_convert for bf8
* move verification ahead of timing
* Update reference calculation
* Fix reference
* Narrow down float init range
* Fix bf8 bf8 mfma
* Add bf8 @ fp8 mfma
* Update example
* Update instances
* Update profiler api
* Update for compatibility
* Format
* Remove extra example
* Clean up
* workaround convert
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 42facfc6b7 ]
2023-10-04 08:19:08 -05:00
zjing14
b65829d9a2
3d grouped conv fwd with input/output fp16 and comp fp8 ( #931 )
...
* add f8 comp instance
* fixed
* fixed comments
* rename
* fixed dtype
* format
* fixed CI
* fixed ci
* add missing ComputeType
* fixed ci
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: e921e1f08d ]
2023-10-03 20:04:26 -05:00
zjing14
85457bf0b4
changed test for grouped_gemm to be random ( #959 )
...
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 5311d1b325 ]
2023-10-03 09:32:58 -05:00
zjing14
c8e83bab45
Fixed contraction issues ( #960 )
...
* add missing ComputeType
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: aa46039f2d ]
2023-10-03 09:32:44 -05:00
zjing14
dd53f66e0a
add generic instances ( #947 )
...
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: f477fca436 ]
2023-10-03 09:32:28 -05:00
Rostyslav Geyyer
e80e4bedba
Add fp8 @ bf8 gemm support and example ( #933 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
[ROCm/composable_kernel commit: bd09b5c538 ]
2023-10-02 16:39:03 -05:00
Illia Silin
c0282c59cc
get rid of gfx900/906, set rocm5.7 as default ( #958 )
...
[ROCm/composable_kernel commit: 59dbb01fd1 ]
2023-10-02 12:01:11 -07:00
zjing14
1939f1820f
Contraction multi abd ( #957 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 9d58c42103 ]
2023-10-02 09:18:36 -05:00
Illia Silin
95ef7f577b
add gfx942 target to the daily ckprofiler package ( #955 )
...
[ROCm/composable_kernel commit: 6b5f647371 ]
2023-09-29 08:55:25 -07:00
Bartlomiej Wroblewski
ba712aee9a
Add support for mixed precision in contraction scale and bilinear ( #936 )
...
* Extract common functionality to separate files
* Reference contraction: Remove incorrect consts from type_converts
* Reference contraction: Add missing type_convert for dst value
* Reference contraction: Fix incorrect order of B matrix dimensions
* Add support for mixed precision in contraction scale and bilinear
* Move using statements from instances to a common file
* Move using statements from examples to a common file
* Fix the order of B matrix dimensions across examples and profiler
* Fix the computation of error threshold
* Make ComputeDataType an optional argument
* Include possible DataType -> ComputeDataType casting error in the threshold
* Remove commented code
[ROCm/composable_kernel commit: f07485060e ]
2023-09-29 10:54:31 -05:00
Bartłomiej Kocot
8784ee2ca5
Add grouped conv bwd data wmma ( #950 )
...
* Add grouped conv bwd data wmma
* Fix copyrights
* Add instances with smaller NPerBlock
* Update interface test
* Minor stylistic fixes
* Minor stylistic fixes
[ROCm/composable_kernel commit: cb53874002 ]
2023-09-28 23:10:18 +02:00
Bartłomiej Kocot
20cb9eb4bb
Add grouped convolution changes to changelog ( #952 )
...
* Add grouped convolution changes to changelog
* Fix 0.2.0 ck release rocm version
* Suggested CHANGELOG.md edits
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
---------
Co-authored-by: Lisa <lisajdelaney@gmail.com >
[ROCm/composable_kernel commit: 271ef645ac ]
2023-09-28 18:18:32 +02:00
Illia Silin
f79efda8eb
Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. ( #951 )
...
* Added error check after kernel launch (#919)
Co-authored-by: Xiaodong Wang <xdwang@meta.com >
Co-authored-by: Xiaodong Wang <xw285@cornell.edu >
* remove M=0 test cases for test_gemm_splitk
---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com >
Co-authored-by: Xiaodong Wang <xw285@cornell.edu >
[ROCm/composable_kernel commit: bc1108bb3e ]
2023-09-27 15:19:33 -07:00
Bartlomiej Wroblewski
b50a087d91
Handle type conversions to a const datatype ( #944 )
...
* Handle type conversions to a const datatype
* Review: Handle X being const data type as well
* Review: Remove typo
[ROCm/composable_kernel commit: f4af5aed8b ]
2023-09-27 15:02:42 -05:00
Bartłomiej Kocot
9bc92adde3
Add column to image kernel ( #930 )
...
* Add column to image kernel
* Minor fixes for dtypes and client examples
* Disable tests for disabled dtypes
* Disable add instances functions for disabled data types
* Minor stylistic fixes
* Revert "Disable add instances functions for disabled data types"
This reverts commit 728b869563.
* Instances reduction
* Add comments in device_column_to_image_impl
* Update changelog and Copyrights
* Improve changelog
[ROCm/composable_kernel commit: e2243a4d1e ]
2023-09-27 17:19:06 +02:00
zjing14
e21f21022c
Add multiple A/B support ( #906 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 11676c7e49 ]
2023-09-26 21:16:23 -05:00
Illia Silin
832e7a6941
Use lower case for ckprofiler package. ( #948 )
...
* split ckProfiler gfx9 package into gfx90 and gfx94
* use lower case for package names
[ROCm/composable_kernel commit: 420b5a0382 ]
2023-09-26 17:43:09 -07:00
zjing14
6469878b49
Fixed Gemmv2r3 kpad ( #938 )
...
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
* fixed kpad
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 48ba6e8a69 ]
2023-09-26 18:40:00 -05:00
Rostyslav Geyyer
9f2256f867
Add fp8 gemm instances ( #920 )
...
* Add fp8 gemm instances
* Update instance naming
[ROCm/composable_kernel commit: 94bfa50256 ]
2023-09-26 14:59:33 -05:00
Illia Silin
f8ae1bc9b5
split ckProfiler gfx9 package into gfx90 and gfx94 ( #946 )
...
[ROCm/composable_kernel commit: 0b296a2722 ]
2023-09-26 11:22:31 -07:00
Illia Silin
8bcd830114
Resolve some data type issues and cmake policy. ( #940 )
...
* split the types in gemm_bilinear instances, add condition to cmake policy
* fix syntax
* split the data types in batchnorm examples
* fix the batchnorm_bwd test
* fix types in the batchnorm_bwd test
[ROCm/composable_kernel commit: 2ea75bd6d7 ]
2023-09-26 08:39:11 -07:00
Bartłomiej Kocot
a85bb57471
Add 3d grouped conv fwd wmma instances ( #935 )
...
* Add 3d grouped conv fwd wmma instances
* Refactor fwd conv tests
* Split wmma instances for each specialization
* Minor stylistic fixes
[ROCm/composable_kernel commit: c95538325b ]
2023-09-23 18:56:31 +02:00
Rostyslav Geyyer
c5a26b77ea
Update naming ( #937 )
...
[ROCm/composable_kernel commit: ede64ae9db ]
2023-09-22 10:08:45 -05:00
Illia Silin
43098d2a23
Refactoring cmake files to build data types separately. ( #932 )
...
* refactor cmake files for the tests
* refactor cmake files for examples
* fix cmake for gemm example
* fix the cmake file for all examples
* add splitting by data types in gemm_splitk instance header
* rename test to reflect only dl instances are used
* clean up CI workspace, update cmake for instances
* change the jenkinsfile syntax
* build all instances except DL on gfx11
* move workspace cleanup after stages
* clean up workspace after every stage
* isolate data types in grouped_conv_fwd header
* isolate dl instances for grouped_conv2d_fwd
* fix syntax
* fix cmake and batchnorm instances
* fix typo
* fix reduction instances
* fix grouped_conv headers
* fix syntax
* replace parsing logic for instances, replace bfp16 with bf16
* fix the client examples build
* clean up DTYPES from instances cmake files
* update the parsing logic in cmake files
* make an exception for reduction kernels
* update a few remaining cmake files to handle DTYPES
* fix syntax
* fix cmake conflicts
* replace f8 with fp8 test name
* resolve conflicts for dpp instances
[ROCm/composable_kernel commit: bba085d2b5 ]
2023-09-20 22:15:56 -07:00
Illia Silin
2f7d38e8bf
fix the building of the amd-stg-open compiler ( #927 )
...
[ROCm/composable_kernel commit: 58817bf967 ]
2023-09-19 18:50:58 -07:00
Illia Silin
828f8e5554
update to rocm5.7 by default ( #925 )
...
* update to rocm5.7 by default
* fix jenkinsfile syntax
[ROCm/composable_kernel commit: 718065ebd2 ]
2023-09-19 09:35:45 -07:00
Illia Silin
c469df8124
fix the ckprofiler package build in a loop ( #926 )
...
[ROCm/composable_kernel commit: 5a4416c8a7 ]
2023-09-19 09:17:39 -07:00
Bartlomiej Wroblewski
f26463b810
Fix DL GEMM instances with too large vector size ( #901 )
...
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
[ROCm/composable_kernel commit: 63cd459248 ]
2023-09-18 14:08:23 +02:00
Rostyslav Geyyer
ecc69920c3
Add native conversions fp8<->fp32 ( #908 )
...
* Add native conversions
* Add bf8 conversions
[ROCm/composable_kernel commit: f17af2e9ed ]
2023-09-17 20:56:27 -05:00
Bartlomiej Kocot
775ae67af3
Stylistic improvements for grouped convolution code
...
Remove unnecessary ignoring
Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
[ROCm/composable_kernel commit: bc2d0583d3 ]
2023-09-15 20:03:47 +02:00
zjing14
a49f4ff995
Add fp16/fp8 support into Grouped gemm FixedNK ( #874 )
...
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add an instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* instance and client
* removed AddBias
* format
* fixed CI
* fixed CI
* move 20_grouped_gemm to 21_grouped_gemm
* clean
* formatting
* clean
* clean
* fixed computeType
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: f9d0eddb90 ]
2023-09-14 21:04:10 -05:00
Illia Silin
5ee23dd30d
change the cmake update method ( #918 )
...
[ROCm/composable_kernel commit: 0d8efaa13d ]
2023-09-14 09:36:26 -07:00
Jun Liu
4b789f8934
[Cmake] Set cmake default build type Release and path to /opt/rocm ( #914 )
...
[ROCm/composable_kernel commit: 5fe687fa27 ]
2023-09-13 14:38:12 -07:00
Bartłomiej Kocot
a13a1cb6d3
Add grouped conv bwd weight dl instances and new layout ( #897 )
...
* Add grouped conv bwd weight dl instances and new layout
* Add M and N padding
* Remove todo comment
* Enable grouped conv fwd dl k,c=1 generic instance
* Comment fixes
[ROCm/composable_kernel commit: 475188ca2e ]
2023-09-13 10:14:31 -05:00
zjing14
8af8296c29
fixed fp8 issues ( #894 )
...
* fixed fp8 init; and reference gemm
* Update host_tensor_generator.hpp
* fixed convert
* fixed reference gemm
* fixed comments
* fixed comments
* fixed ci
* fixed computeType
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: a66d14edf2 ]
2023-09-12 22:17:56 -05:00
Illia Silin
5b9d3268d7
Add a switch to build DL kernels and build them with staging compiler. ( #907 )
...
* enable building DL kernels with the daily staging compiler
* move the DL_KERNELS flag to another function
[ROCm/composable_kernel commit: 74d32f0719 ]
2023-09-12 20:14:33 -05:00
Rostyslav Geyyer
2e227b8581
Refactor f8_t, add bf8_t ( #792 )
...
* Refactor f8_t to add bf8_t
* Add check_err impl for f8_t
* Update fp8 test
* Format
* Revert the fix
* Update vector_type implementation
* Add bf8 test
* Add bf8, use BitInt types
* Add bf8 conversion methods
* Update type_convert for fp8/bf8
* Add check_err fp8/bf8 support
* Add subnorm fp8 tests
* Add subnorm bf8 tests
* Fix conversion
* Add bf8 cmake bindings
* Add macros to enable build with disabled fp8/bf8
* Remove is_native method
* Update flag combination for mixed precision instances
* Add more flag checks
* Add another flag to a client example
* Add type traits, decouple f8/bf8 casting
* Clean up
* Decouple fp8 and bf8 flags
* Remove more redundant flags
* Remove leftover comments
[ROCm/composable_kernel commit: 62d4af7449 ]
2023-09-12 17:04:27 -05:00
Illia Silin
e885110f62
clean up the workspace after every stage ( #909 )
...
[ROCm/composable_kernel commit: 56c0279bbd ]
2023-09-12 08:57:12 -07:00
Bartlomiej Wroblewski
df01b7c45a
Add new instances and support for small cases in DPP8 GEMM ( #896 )
...
[ROCm/composable_kernel commit: 547dbcfbc2 ]
2023-09-12 10:05:23 -05:00
Sam Wu
a41a9fa3cd
Add codeowners for documentation ( #902 )
...
Co-authored-by: samjwu <samjwu@users.noreply.github.com >
[ROCm/composable_kernel commit: 85e2e1e2e2 ]
2023-09-11 11:01:36 -06:00
Bartlomiej Wroblewski
20d295b14b
Enable DPP8 GEMM on Navi3 ( #892 )
...
[ROCm/composable_kernel commit: 8f84a01237 ]
2023-09-08 11:14:57 -05:00
Haocong WANG
68d7430ec5
[Navi3x] Add fp16/int8 wmma conv forward instances ( #746 )
...
* fix wmma gemm int8; add grouped conv int8 example
* Add int8 gemm-bilinear instances
* compile sanity check unknown
* Sanity pass + clang-format
* add int8 conv profiler instances
* solve merge conflict
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 562b4cec48 ]
2023-09-07 21:59:26 -05:00
Bartlomiej Wroblewski
88bb9d5fac
Redesign the DPP8 GEMM kernel to use warp-wise component ( #863 )
...
* Redesign the DPP8 GEMM kernel to use warp-wise component
* Review: Improve error messages
* Review: Remove unnecessary empty lines
* Review: Fix M, N per thread names
* Review: Rename mfma_input_type to dpp_input_type
* Review: Fix tensor adaptor; remove unnecessary element
* Review: Remove calls to dpp_gemm's MakeCDescriptor
* Review: Add blockwise doc, change function names to include dimension names
* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
* Review: Add __restrict__ keywords
* Review: Use MatrixPadder for padding A, B, C matrices
* Review: Remove hardcoded datatypes
* Review: Change names from FloatX to XDataType
* Review: Introduce AK0 and BK0 instead of a single K0
* Review: Remove construction of dpp_datatypes object
* Review: Rename DppInstrRunner to DppLanegroupGemm
[ROCm/composable_kernel commit: 37a8c1f756 ]
2023-09-06 11:44:09 -05:00
zjing14
3446ff1e7d
added padding of K into gemm_v2r3 ( #887 )
...
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 3786bfe1cc ]
2023-09-06 10:15:52 -05:00
zjing14
88af65157c
Fixed fp8 gemm ( #882 )
...
* add generic instances; fixed init with fp8
* fixed comment
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: a61b8b785e ]
2023-09-06 09:59:20 -05:00
Illia Silin
a9d5a61e90
set warnings as errors in doxygen ( #864 )
...
[ROCm/composable_kernel commit: aae4df5596 ]
2023-09-05 14:29:37 -07:00
Bartlomiej Wroblewski
423aab15f6
Add contribution guidelines to the documentation ( #843 )
...
Add contribution guidelines to the documentation
[ROCm/composable_kernel commit: 1e1f82d9b0 ]
2023-09-05 21:25:28 +02:00
Illia Silin
3b3f0f225e
fix syntax ( #890 )
...
[ROCm/composable_kernel commit: 7dcb14d9d4 ]
2023-09-05 11:29:44 -07:00