Adam Osewski
|
1d8e4ec2ce
|
Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762)
* add a prototype of int4
* clean
* debug
* clean
* clean
* move packed into dynamic_buffer
* fixed coord reset
* add fast pki4 to half conversion
* fix
* fixed reference and host_tensor
* fixed tensor init
* format
* debug i4_to_f16_convert
* format
* fixed splitk
* weight permute
* add b tile permute
* clean
* weight permute with splitki
* format
* improve weight layout
* add and_or_b32
* fixed splitk crush
* add permute switch as a template
* recover v3r1
* clean
* failure with intrawave v2
* fixed
* fixed
* add ckProfiler
* add bfp16 support
* add bf16 example
* fixed int4 to bhalf_t conversion
* format
* fixed int4 to bf16 conversion
* clean
* add instances for mem
* clean
* fixed host tensor size
* fixed
* debug
* fixed
* add pk_i4_t as a struct
* fix
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* revert
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* fixed comments
* revert
* clean
* revert
* revert
* fixed
* Update CMakeLists.txt
* Update script/cmake-ck-dev.sh
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update include/ck/tensor_operation/gpu/element/unary_element_wise_operation.hpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update CMakeLists.txt
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* fixed
* fixed
* fixed
* revert
* revert
* add comments
* format
* fixed assert
* fixed
* Fix I4 define in ckProfiler
* Fixed example_gemm_xdl_bf16_pk_i4_v3 test failed issue
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
|
2025-01-02 11:48:06 +08:00 |
|
Illia Silin
|
b94fd0b227
|
update copyright headers (#726)
|
2023-05-31 18:46:57 -05:00 |
|
Chao Liu
|
d3051d7517
|
add license in file (#303)
|
2022-06-24 23:32:43 -05:00 |
|
Chao Liu
|
cd167e492a
|
Compile for gfx908 and gfx90a (#130)
* adding compilation for multiple targets
* fix build
* clean
* update Jekinsfile
* update readme
* update Jenkins
* use ck::half_t instead of ushort for bf16
* rename enum classes
* clean
* rename
* clean
|
2022-03-31 12:33:34 -05:00 |
|
Chao Liu
|
5d37d7bff4
|
Reorganize files, Part 1 (#119)
* delete obselete files
* move files
* build
* update cmake
* update cmake
* fix build
* reorg examples
* update cmake for example and test
|
2022-03-08 21:46:36 -06:00 |
|