zjing14
f5ec04f091
Grouped Gemm with Fixed K and N with SplitK ( #818 )
...
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add an instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* removed AddBias
* format
* fixed CI
* fixed CI
* move 20_grouped_gemm to 21_grouped_gemm
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-31 09:22:12 -05:00
Illia Silin
d140bdc9fa
do not build gfx941/942 targets during daily QA runs ( #758 )
2023-06-16 12:13:16 -07:00
Illia Silin
027e46ee82
Enable gfx941 and gfx942 architectures. ( #752 )
...
* enable gfx941/942 targets
* fix clang format
* fix the cmake logic for multiple targets
* fix cmake syntax for looping over targets
* add gfx941/942 support for gemm_xdl instances
2023-06-15 08:20:59 -07:00
Illia Silin
4feebedd41
Syncing up from internal repo to enable MI300. ( #690 )
...
* enable gfx940
* switch between intrinsic mfma routines on mi100/200 and mi300
* fix mfma_int8 on MI300
* disable 2 int8 examples on MI300
* Update cmake-ck-dev.sh
* restore gitignore file
* modify Jenkinsfile to the internal repo
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-04-28 18:22:59 -05:00
Haocong WANG
4e097ad283
Add CMake Option "USE_OPT_NAVI3X" ( #647 )
...
* Add CMake Option "USE_OPT_NAVI3X"
* remove navi3x opt compile option from cmake script
2023-03-29 14:07:33 -05:00
Rostyslav Geyyer
fa998675fc
Update cmake-ck-dev.sh script ( #641 )
...
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
2023-03-15 18:38:11 -05:00
zjing14
209baee299
disable tensor contraction f64 on MI100 ( #602 )
2023-02-23 16:59:37 -08:00
zjing14
24c9ee1d22
Add contraction_fp64 example ( #570 )
...
* add contraction_bilinear
* add contraction_scale_xdl_fp64
* reduce tile size to avoid register spill
---------
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
2023-02-15 12:00:58 -06:00
rocking5566
226bc02b73
Conv perlayer int8 quantization ( #471 )
...
* Add conv2d requant example
* Fix bash error
* Rename example
* 1. Rename gemm quantization
  2. Share the requantization lambda function with conv
* Refine declare type
* Add conv bias relu quantization example
* clang format
* Fix compile error caused by merging develop
* Fix CI error
* Extract quantization post operation into another file
* Support quantization for non piecewise linear function
* Add instance for conv quantization
* Add convolution quantization factory
* Add convolution quantization client example
* Add more instances with different template parameters
* clang format
* Sync the naming with the develop
2022-11-02 13:56:26 -06:00
Chao Liu
473ba5bc4a
update document: Readme, contributors, citation, ( #463 )
...
* update cmake script
* update readme
* Update README.md
* add citation
* add images
* Update README.md
* update
* Update README.md
* Update CONTRIBUTORS.md
* Update README.md
* Update CITATION.cff
* Update README.md
* Update CITATION.cff
2022-10-03 00:48:24 -05:00