zjing14
c79ecbccfb
Grouped Gemm with Fixed K and N with SplitK (#818)
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add an instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* removed AddBias
* format
* fixed CI
* fixed CI
* move 20_grouped_gemm to 21_grouped_gemm
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
[ROCm/composable_kernel commit: f5ec04f091]
2023-08-31 09:22:12 -05:00
Illia Silin
65eccfd426
do not build gfx941/942 targets during daily QA runs (#758)
[ROCm/composable_kernel commit: d140bdc9fa]
2023-06-16 12:13:16 -07:00
Illia Silin
48347d8653
Enable gfx941 and gfx942 architectures. (#752)
* enable gfx941/942 targets
* fix clang format
* fix the cmake logic for multiple targets
* fix cmake syntax for looping over targets
* add gfx941/942 support for gemm_xdl instances
[ROCm/composable_kernel commit: 027e46ee82]
2023-06-15 08:20:59 -07:00
Illia Silin
dda83a196e
Syncing up from internal repo to enable MI300. (#690)
* enable gfx940
* switch between intrinsic mfma routines on mi100/200 and mi300
* fix mfma_int8 on MI300
* disable 2 int8 examples on MI300
* Update cmake-ck-dev.sh
* restore gitignore file
* modify Jenkinsfile to the internal repo
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
[ROCm/composable_kernel commit: 4feebedd41]
2023-04-28 18:22:59 -05:00
Haocong WANG
ec634a3d32
Add CMake Option "USE_OPT_NAVI3X" (#647)
* Add CMake Option "USE_OPT_NAVI3X"
* remove navi3x opt compile option from cmake script
[ROCm/composable_kernel commit: 4e097ad283]
2023-03-29 14:07:33 -05:00
Rostyslav Geyyer
81187d3553
Update cmake-ck-dev.sh script (#641)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
[ROCm/composable_kernel commit: fa998675fc]
2023-03-15 18:38:11 -05:00
zjing14
84a4731c15
disable tensor contraction f64 on MI100 (#602)
[ROCm/composable_kernel commit: 209baee299]
2023-02-23 16:59:37 -08:00
zjing14
af49d3cc89
Add contraction_fp64 example (#570)
* add contraction_bilinear
* add contraction_scale_xdl_fp64
* reduce tile size to avoid register spill
---------
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
[ROCm/composable_kernel commit: 24c9ee1d22]
2023-02-15 12:00:58 -06:00
rocking5566
9052e8501c
Conv perlayer int8 quantization (#471)
* Add conv2d requant example
* Fix bash error
* Rename example
* 1. Rename gemm quantization
  2. Share the requantization lambda function with conv
* Refine declare type
* Add conv bias relu quantization example
* clang format
* Fix compile error due to merge develop
* Fix CI error
* Extract quantization post operation into another file
* Support quantization for non piecewise linear function
* Add instance for conv quantization
* Add convolution quantization factory
* Add convolution quantization client example
* Add more instances with different template parameters
* clang format
* Sync the naming with the develop
[ROCm/composable_kernel commit: 226bc02b73]
2022-11-02 13:56:26 -06:00
Chao Liu
34f18d8e24
update document: Readme, contributors, citation, (#463)
* update cmake script
* update readme
* Update README.md
* add citation
* add images
* Update README.md
* update
* Update README.md
* Update CONTRIBUTORS.md
* Update README.md
* Update CITATION.cff
* Update README.md
* Update CITATION.cff
[ROCm/composable_kernel commit: 473ba5bc4a]
2022-10-03 00:48:24 -05:00