Aviral Goel
d85f065b15
chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory
* chore(copyright): update copyright header for example directory
2025-11-24 18:02:41 -08:00
Muhammed Emin Ozturk
6fad1c4874
Stream-K Reduction option as Runtime parameter and Compilation Error Fix (SK-Reduction) (#2145)
* reduction is passed as runtime parameter
* clang
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_streamk_v3.hpp
Co-authored-by: John Afaganis <john.afaganis@amd.com>
* Update include/ck/tensor_operation/gpu/grid/block_to_ctile_map.hpp
* remove comment
---------
2025-06-11 10:59:44 -07:00
Aviral Goel
11f6c14e03
Add 0 as an acceptable argument for strides in CK GEMM example (Issue 2037) (#2268)
* add 0 as valid default argument for strides
* add 0 as valid default argument for strides
# Conflicts:
# example/01_gemm/common.hpp
2025-06-03 07:26:58 -07:00
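The stride handling described in the commit above (0 accepted as a default, alongside the -1 default introduced by the earlier stride commits) can be sketched as follows. This is a minimal illustration, not the code in example/01_gemm/common.hpp: the `Layout` enum and the `resolve_stride` helper are hypothetical names, and the sketch assumes that "default" means the packed leading dimension of the matrix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout tag for this sketch; not the CK layout types.
enum class Layout
{
    RowMajor,
    ColumnMajor
};

// Assumption: a stride of -1 or 0 means "use the packed default".
// Any other value is taken as an explicit, user-provided stride.
inline std::int64_t
resolve_stride(std::int64_t rows, std::int64_t cols, std::int64_t stride, Layout layout)
{
    assert(rows > 0 && cols > 0);

    if(stride == -1 || stride == 0)
    {
        // Packed row-major: consecutive rows are `cols` elements apart.
        // Packed column-major: consecutive columns are `rows` elements apart.
        return layout == Layout::RowMajor ? cols : rows;
    }
    return stride;
}
```

With this helper, a 4 x 8 row-major matrix passed with stride 0 or -1 resolves to a stride of 8, while an explicit stride such as 16 is returned unchanged.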
Muhammed Emin Ozturk
9e95d54cd2
BF16 GEMM Stream-K (#1541)
* initial
* Cmake file
* successful compilation but validation failed
* Cmake
* update
* gpu validation
* gemm universal
* gemm universal sk update
* sk bf16 universal instance
* gemm_universal_streamk.hpp
* only build for gfx94
* Cmakelist
* profiler update, bf16 sk only works at gfx42
* clang
* clang
* clang all
* no need flags
* cmake script
* delete comment
* gemm universal sk fix
* clang
* profiler fix
* clang
* update
* update
* delete comment
* code formatting
* cmake
* fix instance
* clang
* argument supported
* argument supported and clang
* update
* fix
* removing unnecessary comments
* clang formatting
* Update library/src/tensor_operation_instance/gpu/CMakeLists.txt
Co-authored-by: afagaj <john.afaganis@gmail.com>
* Copyright comment 2025
* clang reformatting
* copyright 2025
---------
Co-authored-by: Emin Ozturk <ozturk.27@osu.edu>
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-008.hpcfund>
Co-authored-by: root <root@splinter-126-wr-d3.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t006-001.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@login1.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-004.hpcfund>
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t008-001.hpcfund>
Co-authored-by: afagaj <john.afaganis@gmail.com>
2025-01-02 10:30:04 -08:00
Adam Osewski
1d8e4ec2ce
Jing's contribution: prototype of mixed-precision FP16/BF16 x int4 GEMM (#1762)
* add a prototype of int4
* clean
* debug
* clean
* clean
* move packed into dynamic_buffer
* fixed coord reset
* add fast pki4 to half conversion
* fix
* fixed reference and host_tensor
* fixed tensor init
* format
* debug i4_to_f16_convert
* format
* fixed splitk
* weight permute
* add b tile permute
* clean
* weight permute with splitki
* format
* improve weight layout
* add and_or_b32
* fixed splitk crash
* add permute switch as a template
* recover v3r1
* clean
* failure with intrawave v2
* fixed
* fixed
* add ckProfiler
* add bf16 support
* add bf16 example
* fixed int4 to bhalf_t conversion
* format
* fixed int4 to bf16 conversion
* clean
* add instances for mem
* clean
* fixed host tensor size
* fixed
* debug
* fixed
* add pk_i4_t as a struct
* fix
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* revert
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* fixed comments
* revert
* clean
* revert
* revert
* fixed
* Update CMakeLists.txt
* Update script/cmake-ck-dev.sh
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update include/ck/tensor_operation/gpu/element/unary_element_wise_operation.hpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update CMakeLists.txt
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* fixed
* fixed
* fixed
* revert
* revert
* add comments
* format
* fixed assert
* fixed
* Fix I4 define in ckProfiler
* Fixed example_gemm_xdl_bf16_pk_i4_v3 test failure
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
2025-01-02 11:48:06 +08:00
Harisankar Sadasivan
d6d4c2788b
universal streamk fp8 changes (#1665)
* universal streamk fp8 changes & ckprofiler instances
* revert strides to -1 and verification options
* fp8 exclusion on pre-gfx94 for universal_streamk
* PR review based revisions: permissions reverted, removed hip err checks
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-11-21 08:21:37 -08:00
Rostyslav Geyyer
7d576f1748
Update GPU verification (#1596)
* Update inits
* Update static_cast to type_convert
* Add verification option selection
2024-10-25 08:13:46 -07:00
Rostyslav Geyyer
3f710930f6
Update default stride (#1576)
* Update default stride value to -1
* Fix format
* Revert "Fix format"
This reverts commit ae0c3649ec.
---------
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>
2024-10-21 08:45:22 -07:00
Rostyslav Geyyer
d18fc0797f
Fix default stride value (#1559)
2024-10-10 07:37:09 -07:00
Harisankar Sadasivan
75e622f02f
Universal streamk with atomics (#1360)
* universal streamk with atomics, with ckprofiler support. grid_size and the streamk strategy are tunable. A grid_size of -1 sets #WGs = maximum occupancy × num_CUs. The implementation supports several streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. A streamk strategy of -1 selects the default streamk policy (4-tile).
* Update README.md
* fixing clang-format issues
* removed conflicts in struct members between streamk and universal streamk
* corrected arg parsing for streamk and universal streamk
* added stream-k policies for 3 tile and 4 tile
* fixed argument type issue with parsing cmd args
* made the changes suggested in PR review: removed comments and corrected copyright
* file permissions updated
* added default value support for grid_size and streamk-policy selection set to -1
* print messages for arguments
* print messages for arguments
* print messages for arguments1
2024-07-05 21:40:30 -07:00
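The default handling described in the first bullet of the "Universal streamk with atomics" commit above can be sketched as follows. This is an illustration under stated assumptions, not the CK implementation: `StreamKLaunch` and `resolve_streamk_args` are hypothetical names, the strategy is assumed to be encoded as the tile count (1 to 4), and `max_occupancy` and `num_cus` are assumed to be queried from the device by the caller.

```cpp
#include <cassert>

// Hypothetical result type for this sketch.
struct StreamKLaunch
{
    int grid_size;   // number of workgroups to launch
    int tile_policy; // assumed encoding: 1, 2, 3 or 4 (n-tile policy)
};

// Resolve the -1 sentinels described in the commit message:
//   grid_size == -1 -> #WGs = maximum occupancy x num_CUs
//   strategy  == -1 -> default streamk policy (4-tile)
inline StreamKLaunch
resolve_streamk_args(int grid_size, int strategy, int max_occupancy, int num_cus)
{
    assert(max_occupancy > 0 && num_cus > 0);

    StreamKLaunch launch{grid_size, strategy};
    if(grid_size == -1)
    {
        launch.grid_size = max_occupancy * num_cus;
    }
    if(strategy == -1)
    {
        launch.tile_policy = 4;
    }
    return launch;
}
```

Explicit values pass through untouched, so a caller that tunes grid_size or the policy by hand is not affected by the defaults.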