zjing14
33859062bd
Fixed contraction issues ( #960 )
...
* add missing ComputeType
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: aa46039f2d ]
2023-10-03 09:32:44 -05:00
zjing14
498e886c85
add generic instances ( #947 )
...
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: f477fca436 ]
2023-10-03 09:32:28 -05:00
Rostyslav Geyyer
28a1199b62
Add fp8 @ bf8 gemm support and example ( #933 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
[ROCm/composable_kernel commit: bd09b5c538 ]
2023-10-02 16:39:03 -05:00
Illia Silin
6a8658812a
get rid of gfx900/906, set rocm5.7 as default ( #958 )
...
[ROCm/composable_kernel commit: 59dbb01fd1 ]
2023-10-02 12:01:11 -07:00
zjing14
50c12c6c43
Contraction multi abd ( #957 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 9d58c42103 ]
2023-10-02 09:18:36 -05:00
Illia Silin
24eac8e7b4
add gfx942 target to the daily ckprofiler package ( #955 )
...
[ROCm/composable_kernel commit: 6b5f647371 ]
2023-09-29 08:55:25 -07:00
Bartlomiej Wroblewski
ce003d6493
Add support for mixed precision in contraction scale and bilinear ( #936 )
...
* Extract common functionality to separate files
* Reference contraction: Remove incorrect consts from type_converts
* Reference contraction: Add missing type_convert for dst value
* Reference contraction: Fix incorrect order of B matrix dimensions
* Add support for mixed precision in contraction scale and bilinear
* Move using statements from instances to a common file
* Move using statements from examples to a common file
* Fix the order of B matrix dimensions across examples and profiler
* Fix the computation of error threshold
* Make ComputeDataType an optional argument
* Include possible DataType -> ComputeDataType casting error in the threshold
* Remove commented code
[ROCm/composable_kernel commit: f07485060e ]
2023-09-29 10:54:31 -05:00
Bartłomiej Kocot
612cbbdc54
Add grouped conv bwd data wmma ( #950 )
...
* Add grouped conv bwd data wmma
* Fix copyrights
* Add instances with smaller NPerBlock
* Update interface test
* Minor stylistic fixes
* Minor stylistic fixes
[ROCm/composable_kernel commit: cb53874002 ]
2023-09-28 23:10:18 +02:00
Bartłomiej Kocot
254844f582
Add grouped convolution changes to changelog ( #952 )
...
* Add grouped convolution changes to changelog
* Fix 0.2.0 ck release rocm version
* Suggested CHANGELOG.md edits
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
---------
Co-authored-by: Lisa <lisajdelaney@gmail.com >
[ROCm/composable_kernel commit: 271ef645ac ]
2023-09-28 18:18:32 +02:00
Illia Silin
96f752aba9
Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. ( #951 )
...
* Added error check after kernel launch (#919 )
Co-authored-by: Xiaodong Wang <xdwang@meta.com >
Co-authored-by: Xiaodong Wang <xw285@cornell.edu >
* remove M=0 test cases for test_gemm_splitk
---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com >
Co-authored-by: Xiaodong Wang <xw285@cornell.edu >
[ROCm/composable_kernel commit: bc1108bb3e ]
2023-09-27 15:19:33 -07:00
Bartlomiej Wroblewski
bf38d27453
Handle type conversions to a const datatype ( #944 )
...
* Handle type conversions to a const datatype
* Review: Handle X being const data type as well
* Review: Remove typo
[ROCm/composable_kernel commit: f4af5aed8b ]
2023-09-27 15:02:42 -05:00
Bartłomiej Kocot
be5cb244c0
Add column to image kernel ( #930 )
...
* Add column to image kernel
* Minor fixes for dtypes and client examples
* Disable tests for disabled dtypes
* Disable add instances functions for disabled data types
* Minor stylistic fixes
* Revert "Disable add instances functions for disabled data types"
This reverts commit 728b869563.
* Instances reduction
* Add comments in device_column_to_image_impl
* Update changelog and Copyrights
* Improve changelog
[ROCm/composable_kernel commit: e2243a4d1e ]
2023-09-27 17:19:06 +02:00
zjing14
fb513ac42b
Add multiple A/B support ( #906 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 11676c7e49 ]
2023-09-26 21:16:23 -05:00
Illia Silin
f9ce51a187
Use lower case for ckprofiler package. ( #948 )
...
* split ckProfiler gfx9 package into gfx90 and gfx94
* use lower case for package names
[ROCm/composable_kernel commit: 420b5a0382 ]
2023-09-26 17:43:09 -07:00
zjing14
3fe6761718
Fixed Gemmv2r3 kpad ( #938 )
...
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
* fixed kpad
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 48ba6e8a69 ]
2023-09-26 18:40:00 -05:00
Rostyslav Geyyer
b74d4f5fc6
Add fp8 gemm instances ( #920 )
...
* Add fp8 gemm instances
* Update instance naming
[ROCm/composable_kernel commit: 94bfa50256 ]
2023-09-26 14:59:33 -05:00
Illia Silin
37f4626e3e
split ckProfiler gfx9 package into gfx90 and gfx94 ( #946 )
...
[ROCm/composable_kernel commit: 0b296a2722 ]
2023-09-26 11:22:31 -07:00
Illia Silin
99024ff371
Resolve some data type issues and cmake policy. ( #940 )
...
* split the types in gemm_bilinear instances, add condition to cmake policy
* fix syntax
* split the data types in batchnorm examples
* fix the batchnorm_bwd test
* fix types in the batchnorm_bwd test
[ROCm/composable_kernel commit: 2ea75bd6d7 ]
2023-09-26 08:39:11 -07:00
Bartłomiej Kocot
e9ef4df3b2
Add 3d grouped conv fwd wmma instances ( #935 )
...
* Add 3d grouped conv fwd wmma instances
* Refactor fwd conv tests
* Split wmma instances for each specialization
* Minor stylistic fixes
[ROCm/composable_kernel commit: c95538325b ]
2023-09-23 18:56:31 +02:00
Rostyslav Geyyer
6fb3141b62
Update naming ( #937 )
...
[ROCm/composable_kernel commit: ede64ae9db ]
2023-09-22 10:08:45 -05:00
Illia Silin
3609ff10f7
Refactoring cmake files to build data types separately. ( #932 )
...
* refactor cmake files for the tests
* refactor cmake files for examples
* fix cmake for gemm example
* fix the cmake file for all examples
* add splitting by data types in gemm_splitk instance header
* rename test to reflect only dl instances are used
* clean up CI workspace, update cmake for instances
* change the jenkinsfile syntax
* build all instances except DL on gfx11
* move workspace cleanup after stages
* clean up workspace after every stage
* isolate data types in grouped_conv_fwd header
* isolate dl instances for grouped_conv2d_fwd
* fix syntax
* fix cmake and batchnorm instances
* fix typo
* fix reduction instances
* fix grouped_conv headers
* fix syntax
* replace parsing logic for instances, replace bfp16 with bf16
* fix the client examples build
* clean up DTYPES from instances cmake files
* update the parsing logic in cmake files
* make an exception for reduction kernels
* update few remaining cmake files to handle DTYPES
* fix syntax
* fix cmake conflicts
* replace f8 with fp8 test name
* resolve conflicts for dpp instances
[ROCm/composable_kernel commit: bba085d2b5 ]
2023-09-20 22:15:56 -07:00
Illia Silin
cc6ce6c2be
fix the building of the amd-stg-open compiler ( #927 )
...
[ROCm/composable_kernel commit: 58817bf967 ]
2023-09-19 18:50:58 -07:00
Illia Silin
8476969c2b
update to rocm5.7 by default ( #925 )
...
* update to rocm5.7 by default
* fix jenkinsfile syntax
[ROCm/composable_kernel commit: 718065ebd2 ]
2023-09-19 09:35:45 -07:00
Illia Silin
91cb870871
fix the ckprofiler package build in a loop ( #926 )
...
[ROCm/composable_kernel commit: 5a4416c8a7 ]
2023-09-19 09:17:39 -07:00
Bartlomiej Wroblewski
4497a8874f
Fix DL GEMM instances with too large vector size ( #901 )
...
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
[ROCm/composable_kernel commit: 63cd459248 ]
2023-09-18 14:08:23 +02:00
Rostyslav Geyyer
1a7a4a775e
Add native conversions fp8<->fp32 ( #908 )
...
* Add native conversions
* Add bf8 conversions
[ROCm/composable_kernel commit: f17af2e9ed ]
2023-09-17 20:56:27 -05:00
Bartlomiej Kocot
b287234d67
Stylistic improvements for grouped convolution code
...
Remove unnecessary ignoring
Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
[ROCm/composable_kernel commit: bc2d0583d3 ]
2023-09-15 20:03:47 +02:00
zjing14
2d384eaba7
Add fp16/fp8 support into Grouped gemm FixedNK ( #874 )
...
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add an instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* instance and client
* removed AddBias
* format
* fixed CI
* fixed CI
* move 20_grouped_gemm to 21_grouped_gemm
* clean
* formatting
* clean
* clean
* fixed computeType
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: f9d0eddb90 ]
2023-09-14 21:04:10 -05:00
Illia Silin
3564d74b6a
change the cmake update method ( #918 )
...
[ROCm/composable_kernel commit: 0d8efaa13d ]
2023-09-14 09:36:26 -07:00
Jun Liu
f9e7629556
[Cmake] Set cmake default build type Release and path to /opt/rocm ( #914 )
...
[ROCm/composable_kernel commit: 5fe687fa27 ]
2023-09-13 14:38:12 -07:00
Bartłomiej Kocot
f4999cd99a
Add grouped conv bwd weight dl instances and new layout ( #897 )
...
* Add grouped conv bwd weight dl instances and new layout
* Add M and N padding
* Remove todo comment
* Enable grouped conv fwd dl k,c=1 generic instance
* Comment fixes
[ROCm/composable_kernel commit: 475188ca2e ]
2023-09-13 10:14:31 -05:00
zjing14
5bb25a9688
fixed fp8 issues ( #894 )
...
* fixed fp8 init; and reference gemm
* Update host_tensor_generator.hpp
* fixed convert
* fixed reference gemm
* fixed comments
* fixed comments
* fixed ci
* fixed computeType
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: a66d14edf2 ]
2023-09-12 22:17:56 -05:00
Illia Silin
f295fc0629
Add a switch to build DL kernels and build them with staging compiler. ( #907 )
...
* enable building DL kernels with the daily staging compiler
* move the DL_KERNELS flag to another function
[ROCm/composable_kernel commit: 74d32f0719 ]
2023-09-12 20:14:33 -05:00
Rostyslav Geyyer
0752117077
Refactor f8_t, add bf8_t ( #792 )
...
* Refactor f8_t to add bf8_t
* Add check_err impl for f8_t
* Update fp8 test
* Format
* Revert the fix
* Update vector_type implementation
* Add bf8 test
* Add bf8, use BitInt types
* Add bf8 conversion methods
* Update type_convert for fp8/bf8
* Add check_err fp8/bf8 support
* Add subnorm fp8 tests
* Add subnorm bf8 tests
* Fix conversion
* Add bf8 cmake bindings
* Add macros to enable build with disabled fp8/bf8
* Remove is_native method
* Update flag combination for mixed precision instances
* Add more flag checks
* Add another flag to a client example
* Add type traits, decouple f8/bf8 casting
* Clean up
* Decouple fp8 and bf8 flags
* Remove more redundant flags
* Remove leftover comments
[ROCm/composable_kernel commit: 62d4af7449 ]
2023-09-12 17:04:27 -05:00
Illia Silin
b026f6fcfd
clean up the workspace after every stage ( #909 )
...
[ROCm/composable_kernel commit: 56c0279bbd ]
2023-09-12 08:57:12 -07:00
Bartlomiej Wroblewski
b4064d1401
Add new instances and support for small cases in DPP8 GEMM ( #896 )
...
[ROCm/composable_kernel commit: 547dbcfbc2 ]
2023-09-12 10:05:23 -05:00
Sam Wu
1b7c79fe03
Add codeowners for documentation ( #902 )
...
Co-authored-by: samjwu <samjwu@users.noreply.github.com >
[ROCm/composable_kernel commit: 85e2e1e2e2 ]
2023-09-11 11:01:36 -06:00
Bartlomiej Wroblewski
bf5b711799
Enable DPP8 GEMM on Navi3 ( #892 )
...
[ROCm/composable_kernel commit: 8f84a01237 ]
2023-09-08 11:14:57 -05:00
Haocong WANG
c2866bb432
[Navi3x] Add fp16/int8 wmma conv forward instances ( #746 )
...
* fix wmma gemm int8; add grouped conv int8 example
* Add int8 gemm-bilinear instances
* compile sanity check unknown
* Sanity pass + clang-format
* add int8 conv profiler instances
* solve merge conflict
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
[ROCm/composable_kernel commit: 562b4cec48 ]
2023-09-07 21:59:26 -05:00
Bartlomiej Wroblewski
02f8f707e8
Redesign the DPP8 GEMM kernel to use warp-wise component ( #863 )
...
* Redesign the DPP8 GEMM kernel to use warp-wise component
* Review: Improve error messages
* Review: Remove unnecessary empty lines
* Review: Fix M, N per thread names
* Review: Rename mfma_input_type to dpp_input_type
* Review: Fix tensor adaptor; remove unnecessary element
* Review: Remove calls to dpp_gemm's MakeCDescriptor
* Review: Add blockwise doc, change function names to include dimension names
* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
* Review: Add __restrict__ keywords
* Review: Use MatrixPadder for padding A, B, C matrices
* Review: Remove hardcoded datatypes
* Review: Change names from FloatX to XDataType
* Review: Introduce AK0 and BK0 instead of a single K0
* Review: Remove construction of dpp_datatypes object
* Review: Rename DppInstrRunner to DppLanegroupGemm
[ROCm/composable_kernel commit: 37a8c1f756 ]
2023-09-06 11:44:09 -05:00
zjing14
29daafc158
added padding of K into gemm_v2r3 ( #887 )
...
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 3786bfe1cc ]
2023-09-06 10:15:52 -05:00
zjing14
762d558a06
Fixed fp8 gemm ( #882 )
...
* add generic instances; fixed init with fp8
* fixed comment
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: a61b8b785e ]
2023-09-06 09:59:20 -05:00
Illia Silin
b79bf27996
set warnings as errors in doxygen ( #864 )
...
[ROCm/composable_kernel commit: aae4df5596 ]
2023-09-05 14:29:37 -07:00
Bartlomiej Wroblewski
d529e6de33
Add contribution guidelines to the documentation ( #843 )
...
Add contribution guidelines to the documentation
[ROCm/composable_kernel commit: 1e1f82d9b0 ]
2023-09-05 21:25:28 +02:00
Illia Silin
8860638f7c
fix syntax ( #890 )
...
[ROCm/composable_kernel commit: 7dcb14d9d4 ]
2023-09-05 11:29:44 -07:00
Bartłomiej Kocot
d79b1c5dd0
Add image to column kernel ( #867 )
...
* Add image to column kernel
* Add instances, tests, profiler, example
* Add client example
* Several fixes of image to column
* Fix variable name in device_image_to_column_impl
* Several fixes of image to column profiler
* Fix num_btype calculation
* Make new measurements for correct bytes calculation
[ROCm/composable_kernel commit: 0077eeb3be ]
2023-09-05 10:11:40 -05:00
Bartłomiej Kocot
562be55437
Add nhwgc dl generic instances for grouped conv fwd ( #879 )
...
[ROCm/composable_kernel commit: 0c9a1d25b3 ]
2023-09-05 10:07:56 -05:00
Bartłomiej Kocot
748899987a
Fix K padding calculation for grouped conv data ( #876 )
...
* Fix K padding calculation for grouped conv data
* Restore previous padding for 1x1 specialization
[ROCm/composable_kernel commit: c981f6d033 ]
2023-09-05 10:07:41 -05:00
Lauren Wrubleski
766b5dc9d7
Fix config header installation ( #880 )
...
[ROCm/composable_kernel commit: bd8024b84a ]
2023-09-04 09:49:40 -07:00
zjing14
c79ecbccfb
Grouped Gemm with Fixed K and N with SplitK ( #818 )
...
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add an instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* removed AddBias
* format
* fixed CI
* fixed CI
* move 20_grouped_gemm to 21_grouped_gemm
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: f5ec04f091 ]
2023-08-31 09:22:12 -05:00