Illia Silin
52b0bffec0
Support fp64 contraction on gfx94x. (#1029)
...
* enable contraction fp64 on gfx94*
* fix the logic
rocm-6.0.2
rocm-6.0.0
2023-11-08 15:03:57 -08:00
Po Yen Chen
ebcfdb3b40
Disable the SLP vectorizer to prevent unnecessary wait (#1008)
...
* Disable the SLP vectorizer to prevent unnecessary wait
* Add comment to the reason of adding flag
* Fix wording
2023-11-07 22:04:49 -08:00
Po Yen Chen
dcb013fcf2
Avoid force setting ENABLE_PIPELINE_V2_OPT to OFF (#961)
...
* Avoid force setting ENABLE_PIPELINE_V2_OPT to OFF
* Remove compilation option variable MAX_ILP_OPTS
2023-11-07 22:04:37 -08:00
Jun Liu
5032041365
Merge branch 'amd-develop' into amd-master
2023-10-11 12:26:02 -07:00
Jun Liu
91b414cdac
Merge commit 'ac9595a9f118a023e248eaffcfa5c324f36fd081' into amd-develop
2023-10-11 12:24:51 -07:00
zjing14
ac9595a9f1
Fixed f8_gemm NaN (#975)
...
* workaround nan problem by changing output to fp16
* enable f8/bf8 gemm tests on MI200
* workaround f16 to f8 conversion
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-10 10:30:26 -05:00
Jun Liu
0b70e1cd3c
Merge branch 'amd-develop' into amd-master
2023-10-05 15:46:50 -07:00
Jun Liu
082cf64310
Merge branch 'develop' into amd-develop
2023-10-05 15:46:27 -07:00
Lauren Wrubleski
5913609168
Replace CMake return from later CMake (#970)
2023-10-05 14:58:58 -07:00
Illia Silin
4daedf8ca5
Revert "Add support for mixed precision in contraction scale and bilinear" (#967)
...
* Revert "Add support for mixed precision in contraction scale and bilinear (#936)"
This reverts commit f07485060e.
* revert commits #957 and #960
2023-10-05 14:58:23 -07:00
zjing14
570ff3ddbe
remove example 60 (#963)
...
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-05 09:41:01 -07:00
zjing14
04f93aadb8
Grouped conv bwd data with fp16 input and bf8fp8 comp (#962)
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
* Update naming (#937)
* Add a client example
* Add computetypes to device and gridwise ops
* Add instances, update instance factory
* Format
* Fix a flag
* Add ckProfiler mode
* Fix typos
* Add an example
* Add bf8 generator
* add bf8 mfma; fixed type_convert for bf8
* move verification ahead of timing
* Update reference calculation
* Fix reference
* Narrow down float init range
* Fix bf8 bf8 mfma
* Add bf8 @ fp8 mfma
* Update example
* Update instances
* Update profiler api
* Update for compatibility
* Format
* Remove extra example
* Clean up
* workaround convert
* added instance of f16_bf8f8, and client example
* fixed mfma selector
* format
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-04 18:04:27 -05:00
Rostyslav Geyyer
42facfc6b7
Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945)
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
* Update naming (#937)
* Add a client example
* Add computetypes to device and gridwise ops
* Add instances, update instance factory
* Format
* Fix a flag
* Add ckProfiler mode
* Fix typos
* Add an example
* Add bf8 generator
* add bf8 mfma; fixed type_convert for bf8
* move verification ahead of timing
* Update reference calculation
* Fix reference
* Narrow down float init range
* Fix bf8 bf8 mfma
* Add bf8 @ fp8 mfma
* Update example
* Update instances
* Update profiler api
* Update for compatibility
* Format
* Remove extra example
* Clean up
* workaround convert
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-04 08:19:08 -05:00
zjing14
e921e1f08d
3d grouped conv fwd with input/output fp16 and comp fp8 (#931)
...
* add f8 comp instance
* fixed
* fixed comments
* rename
* fixed dtype
* format
* fixed CI
* fixed ci
* add missing ComputeType
* fixed ci
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-03 20:04:26 -05:00
zjing14
5311d1b325
changed test for grouped_gemm to be random (#959)
...
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-03 09:32:58 -05:00
zjing14
aa46039f2d
Fixed contraction issues (#960)
...
* add missing ComputeType
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-03 09:32:44 -05:00
zjing14
f477fca436
add generic instances (#947)
...
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-03 09:32:28 -05:00
Jun Liu
7b7a3978b5
Merge branch 'amd-develop' into amd-master
2023-10-02 17:09:58 -07:00
Jun Liu
7e8230daa3
Merge branch 'develop' into amd-develop
2023-10-02 17:08:42 -07:00
Rostyslav Geyyer
bd09b5c538
Add fp8 @ bf8 gemm support and example (#933)
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
2023-10-02 16:39:03 -05:00
Illia Silin
59dbb01fd1
get rid of gfx900/906, set rocm5.7 as default (#958)
2023-10-02 12:01:11 -07:00
zjing14
9d58c42103
Contraction multi abd (#957)
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-02 09:18:36 -05:00
Illia Silin
6b5f647371
add gfx942 target to the daily ckprofiler package (#955)
2023-09-29 08:55:25 -07:00
Bartlomiej Wroblewski
f07485060e
Add support for mixed precision in contraction scale and bilinear (#936)
...
* Extract common functionality to separate files
* Reference contraction: Remove incorrect consts from type_converts
* Reference contraction: Add missing type_convert for dst value
* Reference contraction: Fix incorrect order of B matrix dimensions
* Add support for mixed precision in contraction scale and bilinear
* Move using statements from instances to a common file
* Move using statements from examples to a common file
* Fix the order of B matrix dimensions across examples and profiler
* Fix the computation of error threshold
* Make ComputeDataType an optional argument
* Include possible DataType -> ComputeDataType casting error in the threshold
* Remove commented code
2023-09-29 10:54:31 -05:00
Bartłomiej Kocot
cb53874002
Add grouped conv bwd data wmma (#950)
...
* Add grouped conv bwd data wmma
* Fix copyrights
* Add instances with smaller NPerBlock
* Update interface test
* Minor stylistic fixes
* Minor stylistic fixes
2023-09-28 23:10:18 +02:00
Bartłomiej Kocot
271ef645ac
Add grouped convolution changes to changelog (#952)
...
* Add grouped convolution changes to changelog
* Fix 0.2.0 ck release rocm version
* Suggested CHANGELOG.md edits
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
---------
Co-authored-by: Lisa <lisajdelaney@gmail.com>
2023-09-28 18:18:32 +02:00
Jun Liu
b24d93a127
Merge branch 'amd-develop' into amd-master
2023-09-28 07:52:34 -07:00
Jun Liu
56c7203541
Merge branch 'develop' into amd-develop
2023-09-28 07:52:02 -07:00
Illia Silin
bc1108bb3e
Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. (#951)
...
* Added error check after kernel launch (#919)
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
* remove M=0 test cases for test_gemm_splitk
---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
2023-09-27 15:19:33 -07:00
Bartlomiej Wroblewski
f4af5aed8b
Handle type conversions to a const datatype (#944)
...
* Handle type conversions to a const datatype
* Review: Handle X being const data type as well
* Review: Remove typo
2023-09-27 15:02:42 -05:00
Bartłomiej Kocot
e2243a4d1e
Add column to image kernel (#930)
...
* Add column to image kernel
* Minor fixes for dtypes and client examples
* Disable tests for disabled dtypes
* Disable add instances functions for disabled data types
* Minor stylistic fixes
* Revert "Disable add instances functions for disabled data types"
This reverts commit 728b869563.
* Instances reduction
* Add comments in device_column_to_image_impl
* Update changelog and Copyrights
* Improve changelog
2023-09-27 17:19:06 +02:00
zjing14
11676c7e49
Add multiple A/B support (#906)
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-26 21:16:23 -05:00
Illia Silin
420b5a0382
Use lower case for ckprofiler package. (#948)
...
* split ckProfiler gfx9 package into gfx90 and gfx94
* use lower case for package names
2023-09-26 17:43:09 -07:00
zjing14
48ba6e8a69
Fixed Gemmv2r3 kpad (#938)
...
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
* fixed kpad
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-09-26 18:40:00 -05:00
Rostyslav Geyyer
94bfa50256
Add fp8 gemm instances (#920)
...
* Add fp8 gemm instances
* Update instance naming
2023-09-26 14:59:33 -05:00
Jun Liu
742dd3aa32
Merge branch 'amd-develop' into amd-master
2023-09-26 12:00:18 -07:00
Jun Liu
1f02eaef56
Merge branch 'develop' into amd-develop
2023-09-26 11:59:54 -07:00
Illia Silin
0b296a2722
split ckProfiler gfx9 package into gfx90 and gfx94 (#946)
2023-09-26 11:22:31 -07:00
Illia Silin
2ea75bd6d7
Resolve some data type issues and cmake policy. (#940)
...
* split the types in gemm_bilinear instances, add condition to cmake policy
* fix syntax
* split the data types in batchnorm examples
* fix the batchnorm_bwd test
* fix types in the batchnorm_bwd test
2023-09-26 08:39:11 -07:00
Jun Liu
c9013009a0
Merge branch 'amd-develop' into amd-master
2023-09-25 14:32:03 -07:00
Jun Liu
84dcf5d043
Merge branch 'develop' into amd-develop
2023-09-23 18:10:33 -07:00
Bartłomiej Kocot
c95538325b
Add 3d grouped conv fwd wmma instances (#935)
...
* Add 3d grouped conv fwd wmma instances
* Refactor fwd conv tests
* Split wmma instances for each specialization
* Minor stylistic fixes
2023-09-23 18:56:31 +02:00
Rostyslav Geyyer
ede64ae9db
Update naming (#937)
2023-09-22 10:08:45 -05:00
Illia Silin
bba085d2b5
Refactoring cmake files to build data types separately. (#932)
...
* refactor cmake files for the tests
* refactor cmake files for examples
* fix cmake for gemm example
* fix the cmake file for all examples
* add splitting by data types in gemm_splitk instance header
* rename test to reflect only dl instances are used
* clean up CI workspace, update cmake for instances
* change the jenkinsfile syntax
* build all instances except DL on gfx11
* move workspace cleanup after stages
* clean up workspace after every stage
* isolate data types in grouped_conv_fwd header
* isolate dl instances for grouped_conv2d_fwd
* fix syntax
* fix cmake and batchnorm instances
* fix typo
* fix reduction instances
* fix grouped_conv headers
* fix syntax
* replace parsing logic for instances, replace bfp16 with bf16
* fix the client examples build
* clean up DTYPES from instances cmake files
* update the parsing logic in cmake files
* make an exception for reduction kernels
* update few remaining cmake files to handle DTYPES
* fix syntax
* fix cmake conflicts
* replace f8 with fp8 test name
* resolve conflicts for dpp instances
2023-09-20 22:15:56 -07:00
Illia Silin
58817bf967
fix the building of the amd-stg-open compiler (#927)
2023-09-19 18:50:58 -07:00
Illia Silin
718065ebd2
update to rocm5.7 by default (#925)
...
* update to rocm5.7 by default
* fix jenkinsfile syntax
2023-09-19 09:35:45 -07:00
Illia Silin
5a4416c8a7
fix the ckprofiler package build in a loop (#926)
2023-09-19 09:17:39 -07:00
Bartlomiej Wroblewski
63cd459248
Fix DL GEMM instances with too large vector size (#901)
...
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
2023-09-18 14:08:23 +02:00
Rostyslav Geyyer
f17af2e9ed
Add native conversions fp8<->fp32 (#908)
...
* Add native conversions
* Add bf8 conversions
2023-09-17 20:56:27 -05:00
Bartlomiej Kocot
bc2d0583d3
Stylistic improvements for grouped convolution code
...
Remove unnecessary ignoring
Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
2023-09-15 20:03:47 +02:00