zjing14
570ff3ddbe
remove example 60 ( #963 )
...
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-05 09:41:01 -07:00
zjing14
04f93aadb8
Grouped conv bwd data with fp16 input and bf8fp8 comp ( #962 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
* Update naming (#937 )
* Add a client example
* Add computetypes to device and gridwise ops
* Add instances, update instance factory
* Format
* Fix a flag
* Add ckProfiler mode
* Fix typos
* Add an example
* Add bf8 generator
* add bf8 mfma; fixed type_convert for bf8
* move verification ahead of timing
* Update reference calculation
* Fix reference
* Narrow down float init range
* Fix bf8 bf8 mfma
* Add bf8 @ fp8 mfma
* Update example
* Update instances
* Update profiler api
* Update for compatibility
* Format
* Remove extra example
* Clean up
* workaround convert
* added instance of f16_bf8f8, and client example
* fixed mfma selector
* format
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com >
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com >
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-04 18:04:27 -05:00
Rostyslav Geyyer
42facfc6b7
Add conv bwd weight fp16 comp bf8 fp8 op, instances and example ( #945 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
* Update naming (#937 )
* Add a client example
* Add computetypes to device and gridwise ops
* Add instances, update instance factory
* Format
* Fix a flag
* Add ckProfiler mode
* Fix typos
* Add an example
* Add bf8 generator
* add bf8 mfma; fixed type_convert for bf8
* move verification ahead of timing
* Update reference calculation
* Fix reference
* Narrow down float init range
* Fix bf8 bf8 mfma
* Add bf8 @ fp8 mfma
* Update example
* Update instances
* Update profiler api
* Update for compatibility
* Format
* Remove extra example
* Clean up
* workaround convert
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-04 08:19:08 -05:00
zjing14
e921e1f08d
3d grouped conv fwd with input/output fp16 and comp fp8 ( #931 )
...
* add f8 comp instance
* fixed
* fixed comments
* rename
* fixed dtype
* format
* fixed CI
* fixed ci
* add missing ComputeType
* fixed cit
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-03 20:04:26 -05:00
zjing14
5311d1b325
changed test for grouped_gemm to be random ( #959 )
...
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-03 09:32:58 -05:00
zjing14
aa46039f2d
Fixed contraction issues ( #960 )
...
* add missing ComputeType
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-03 09:32:44 -05:00
zjing14
f477fca436
add generic instances ( #947 )
...
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-03 09:32:28 -05:00
Rostyslav Geyyer
bd09b5c538
Add fp8 @ bf8 gemm support and example ( #933 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Update naming to f8->fp8
* Update naming
* Format
2023-10-02 16:39:03 -05:00
Illia Silin
59dbb01fd1
get rid of gfx900/906, set rocm5.7 as default ( #958 )
2023-10-02 12:01:11 -07:00
zjing14
9d58c42103
Contraction multi abd ( #957 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-02 09:18:36 -05:00
Illia Silin
6b5f647371
add gfx942 target to the daily ckprofiler package ( #955 )
2023-09-29 08:55:25 -07:00
Bartlomiej Wroblewski
f07485060e
Add support for mixed precision in contraction scale and bilinear ( #936 )
...
* Extract common functionality to separate files
* Reference contraction: Remove incorrect consts from type_converts
* Reference contraction: Add missing type_convert for dst value
* Reference contraction: Fix incorrect order of B matrix dimensions
* Add support for mixed precision in contraction scale and bilinear
* Move using statements from instances to a common file
* Move using statements from examples to a common file
* Fix the order of B matrix dimensions across examples and profiler
* Fix the computation of error threshold
* Make ComputeDataType an optional argument
* Include possible DataType -> ComputeDataType casting error in the threshold
* Remove commented code
2023-09-29 10:54:31 -05:00
Bartłomiej Kocot
cb53874002
Add grouped conv bwd data wmma ( #950 )
...
* Add grouped conv bwd data wmma
* Fix copyrights
* Add instances with smaller NPerBlock
* Update interface test
* Minor stylistic fixes
* Minor stylistic fixes
2023-09-28 23:10:18 +02:00
Bartłomiej Kocot
271ef645ac
Add grouped convolution changes to changelog ( #952 )
...
* Add grouped convolution changes to changelog
* Fix 0.2.0 ck release rocm version
* Suggested CHANGELOG.md edits
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
---------
Co-authored-by: Lisa <lisajdelaney@gmail.com >
2023-09-28 18:18:32 +02:00
Illia Silin
bc1108bb3e
Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. ( #951 )
...
* Added error check after kernel launch (#919 )
Co-authored-by: Xiaodong Wang <xdwang@meta.com >
Co-authored-by: Xiaodong Wang <xw285@cornell.edu >
* remove M=0 test cases for test_gemm_splitk
---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com >
Co-authored-by: Xiaodong Wang <xw285@cornell.edu >
2023-09-27 15:19:33 -07:00
Bartlomiej Wroblewski
f4af5aed8b
Handle type conversions to a const datatype ( #944 )
...
* Handle type conversions to a const datatype
* Review: Handle X being const data type as well
* Review: Remove typo
2023-09-27 15:02:42 -05:00
Bartłomiej Kocot
e2243a4d1e
Add column to image kernel ( #930 )
...
* Add column to image kernel
* Minor fixes for dtypes and client examples
* Disable tests for disabled dtypes
* Disable add instances functions for disabled data types
* Minor stylistic fixes
* Revert "Disable add instances functions for disabled data types"
This reverts commit 728b869563.
* Instances reduction
* Add comments in device_column_to_image_impl
* Update changelog and Copyrights
* Improve changelog
2023-09-27 17:19:06 +02:00
zjing14
11676c7e49
Add multiple A/B support ( #906 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-09-26 21:16:23 -05:00
Illia Silin
420b5a0382
Use lower case for ckprofiler package. ( #948 )
...
* split ckProfiler gfx9 package into gfx90 and gfx94
* use lower case for package names
2023-09-26 17:43:09 -07:00
zjing14
48ba6e8a69
Fixed Gemmv2r3 kpad ( #938 )
...
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
* fixed kpad
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-09-26 18:40:00 -05:00
Rostyslav Geyyer
94bfa50256
Add fp8 gemm instances ( #920 )
...
* Add fp8 gemm instances
* Update instance naming
2023-09-26 14:59:33 -05:00
Illia Silin
0b296a2722
split ckProfiler gfx9 package into gfx90 and gfx94 ( #946 )
2023-09-26 11:22:31 -07:00
Illia Silin
2ea75bd6d7
Resolve some data type issues and cmake policy. ( #940 )
...
* split the types in gemm_bilinear instances, add condition to cmake policy
* fix syntax
* split the data types in batchnorm examples
* fix the batchnorm_bwd test
* fix types in the batchnorm_bwd test
2023-09-26 08:39:11 -07:00
Bartłomiej Kocot
c95538325b
Add 3d grouped conv fwd wmma instances ( #935 )
...
* Add 3d grouped conv fwd wmma instances
* Refactor fwd conv tests
* Split wmma instances for each specialization
* Minor stylistic fixes
2023-09-23 18:56:31 +02:00
Rostyslav Geyyer
ede64ae9db
Update naming ( #937 )
2023-09-22 10:08:45 -05:00
Illia Silin
bba085d2b5
Refactoring cmake files to build data types separately. ( #932 )
...
* refactor cmake files for the tests
* refactor cmake files for examples
* fix cmake for gemm example
* fix the cmake file for all examples
* add splitting by data types in gemm_splitk instance header
* rename test to reflect only dl instances are used
* clean up CI workspace, update cmake for instances
* change the jenkinsfile syntax
* build all instances except DL on gfx11
* move workspace cleanup after stages
* clean up workspace after every stage
* isolate data types in grouped_conv_fwd header
* isolate dl instances for grouped_conv2d_fwd
* fix syntax
* fix cmake and batchnorm instances
* fix typo
* fix reduction instances
* fix grouped_conv headers
* fix syntax
* replace parsing logic for instances, replace bfp16 with bf16
* fix the client examples build
* clean up DTYPES from instances cmake files
* update the parsing logic in cmake files
* make an exception for reduction kernels
* update few remaining cmake files to handle DTYPES
* fix syntax
* fix cmake conflicts
* replace f8 with fp8 test name
* resolve conflicts for dpp instances
2023-09-20 22:15:56 -07:00
Illia Silin
58817bf967
fix the building of the amd-stg-open compiler ( #927 )
2023-09-19 18:50:58 -07:00
Illia Silin
718065ebd2
update to rocm5.7 by default ( #925 )
...
* update to rocm5.7 by default
* fix jenkinsfile syntax
2023-09-19 09:35:45 -07:00
Illia Silin
5a4416c8a7
fix the ckprofiler package build in a loop ( #926 )
2023-09-19 09:17:39 -07:00
Bartlomiej Wroblewski
63cd459248
Fix DL GEMM instances with too large vector size ( #901 )
...
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
2023-09-18 14:08:23 +02:00
Rostyslav Geyyer
f17af2e9ed
Add native conversions fp8<->fp32 ( #908 )
...
* Add native conversions
* Add bf8 conversions
2023-09-17 20:56:27 -05:00
Bartlomiej Kocot
bc2d0583d3
Stylistic improvements for grouped convolution code
...
Remove unnecessary ignoring
Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
2023-09-15 20:03:47 +02:00
zjing14
f9d0eddb90
Add fp16/fp8 support into Grouped gemm FixedNK ( #874 )
...
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add a instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* instance and client
* removed AddBias
* format
* fixed CI
* fixed CI
* move 20_grouped_gemm to 21_grouped_gemm
* clean
* formatting
* clean
* clean
* fixed computeType
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-09-14 21:04:10 -05:00
Illia Silin
0d8efaa13d
change the cmake update method ( #918 )
2023-09-14 09:36:26 -07:00
Jun Liu
5fe687fa27
[Cmake] Set cmake default build type Release and path to /opt/rocm ( #914 )
2023-09-13 14:38:12 -07:00
Bartłomiej Kocot
475188ca2e
Add grouped conv bwd weight dl instances and new layout ( #897 )
...
* Add grouped conv bwd weight dl instances and new layout
* Add M and N padding
* Remove todo comment
* Enable grouped conv fwd dl k,c=1 generic instance
* Comment fixes
2023-09-13 10:14:31 -05:00
zjing14
a66d14edf2
fixed fp8 issues ( #894 )
...
* fixed fp8 init; and reference gemm
* Update host_tensor_generator.hpp
* fixed convert
* fixed reference gemm
* fixed comments
* fixed comments
* fixed ci
* fixed computeType
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-09-12 22:17:56 -05:00
Illia Silin
74d32f0719
Add a switch to build DL kernels and build them with staging compiler. ( #907 )
...
* enable building DL kernels with the daily staging compiler
* move the DL_KERNELS flag to another function
2023-09-12 20:14:33 -05:00
Rostyslav Geyyer
62d4af7449
Refactor f8_t, add bf8_t ( #792 )
...
* Refactor f8_t to add bf8_t
* Add check_err impl for f8_t
* Update fp8 test
* Format
* Revert the fix
* Update vector_type implementation
* Add bf8 test
* Add bf8, use BitInt types
* Add bf8 conversion methods
* Update type_convert for fp8/bf8
* Add check_err fp8/bf8 support
* Add subnorm fp8 tests
* Add subnorm bf8 tests
* Fix conversion
* Add bf8 cmake bindings
* Add macros to enable build with disabled fp8/bf8
* Remove is_native method
* Update flag combination for mixed precision instances
* Add more flag checks
* Add another flag to a client example
* Add type traits, decouple f8/bf8 casting
* Clean up
* Decouple fp8 and bf8 flags
* Remove more redundant flags
* Remove leftover comments
2023-09-12 17:04:27 -05:00
Illia Silin
56c0279bbd
clean up the workspace after every stage ( #909 )
2023-09-12 08:57:12 -07:00
Bartlomiej Wroblewski
547dbcfbc2
Add new instances and support for small cases in DPP8 GEMM ( #896 )
2023-09-12 10:05:23 -05:00
Sam Wu
85e2e1e2e2
Add codeowners for documentation ( #902 )
...
Co-authored-by: samjwu <samjwu@users.noreply.github.com >
2023-09-11 11:01:36 -06:00
Bartlomiej Wroblewski
8f84a01237
Enable DPP8 GEMM on Navi3 ( #892 )
2023-09-08 11:14:57 -05:00
Haocong WANG
562b4cec48
[Navi3x] Add fp16/int8 wmma conv forward instances ( #746 )
...
* fix wmma gemm int8; add grouped conv int8 example
* Add int8 gemm-bilinear instances
* compile sanity check unknown
* Sanity pass + clang-format
* add int8 conv profiler instances
* solve merge conflict
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Chao Liu <chao.liu2@amd.com >
2023-09-07 21:59:26 -05:00
Bartlomiej Wroblewski
37a8c1f756
Redesign the DPP8 GEMM kernel to use warp-wise component ( #863 )
...
* Redesign the DPP8 GEMM kernel to use warp-wise component
* Review: Improve error messages
* Review: Remove unnecessary empty lines
* Review: Fix M, N per thread names
* Review: Rename mfma_input_type to dpp_input_type
* Review: Fix tensor adaptor; remove unnecessary element
* Review: Remove calls to dpp_gemm's MakeCDescriptor
* Review: Add blockwise doc, change function names to include dimension names
* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
* Review: Add __restrict__ keywords
* Review: Use MatrixPadder for padding A, B, C matrices
* Review: Remove hardcoded datatypes
* Review: Change names from FloatX to XDataType
* Review: Introduce AK0 and BK0 instead of a single K0
* Review: Remove construction of dpp_datatypes object
* Review: Rename DppInstrRunner to DppLanegroupGemm
2023-09-06 11:44:09 -05:00
zjing14
3786bfe1cc
added padding of K into gemm_v2r3 ( #887 )
...
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-09-06 10:15:52 -05:00
zjing14
a61b8b785e
Fixed fp8 gemm ( #882 )
...
* add generic instances; fixed init with fp8
* fixed comment
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-09-06 09:59:20 -05:00
Illia Silin
aae4df5596
set warnings as errors in doxygen ( #864 )
2023-09-05 14:29:37 -07:00
Bartlomiej Wroblewski
1e1f82d9b0
Add contribution guidelines to the documentation ( #843 )
...
Add contribution guidelines to the documentation
2023-09-05 21:25:28 +02:00
Illia Silin
7dcb14d9d4
fix syntax ( #890 )
2023-09-05 11:29:44 -07:00