Aviral Goel
f5ac3ee359
chore(copyright): update copyright header for include directory (#3224)
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
* chore(copyright): update copyright header for test_data directory
* chore(copyright): update copyright header for python directory
* chore(copyright): update copyright header for profiler directory
* chore(copyright): update copyright header for library directory
* chore(copyright): update copyright header for include directory
2025-11-18 10:17:18 -08:00
Max Podkorytov
f39626fcf7
[CK][host] limit the rotating count to prevent oom (#3089)
* [CK][host] limit the rotating count to prevent oom
* add numeric header for accumulate
2025-10-24 08:55:54 -07:00
Enrico Degregori
3d29bff2f0
Wmma support for multiple ABD GEMM (#2803)
* multi_abd wmma support:
- Add multiple A and B support to multiple D implementation (gridwise level)
- Add multi_abd GEMM (device level)
- Add instances (xdl parity)
- Add tests (both xdl and wmma)
- Add examples
- Add ckProfiler support (both xdl and wmma)
* Fix bug in device print function
* Fix unused template parameter
* Fix batched gemm for multiABD gridwise implementation
* Fix gemm_universal_reduce with multiABDs gridwise implementation
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-09-22 18:49:06 -07:00
Illia Silin
572cd820ce
Split env.hpp header from the ck.hpp header. (#2049)
* split env.hpp out of main headers
* fix namespace logic
2025-04-03 15:30:21 -07:00
valarLip
37f7afed1e
add int8 gemm multiply multiply a8w8 (#1591)
* add int8 gemm multiply multiply a8w8
* uncomment
* clang-format-12
* Add example_gemm_multiply_multiply_xdl_int8
* Remove shell scripts
* update preprocess number for mi308; bring back printout in ckprofiler
* format
---------
Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-10-26 16:39:34 +08:00
zjing14
105bd708c7
Add rotating buff for gemm_multi_d (#1411)
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
2024-07-25 23:21:21 +08:00
Bartłomiej Kocot
fd72380aeb
Optimize grouped conv bwd weight for small M and N (#1303)
* Optimize grouped conv bwd weight for small M and N
* Fixes
2024-05-22 21:01:01 +02:00
Illia Silin
1274861a9d
replace the ENV macro with CK_ENV (#1296)
2024-05-17 10:42:51 -07:00
Illia Silin
fdbf8ccbd7
fix the output formatting (#1282)
2024-05-08 16:11:54 -07:00
Illia Silin
bf42097646
Enable logging in CK with environment variable. (#1278)
* enable logging using environment variable
* update ck.hpp header
* fix typo
* fix clang format
* Update include/ck/utility/env.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-05-07 16:26:43 -07:00
Illia Silin
08d51d9bc4
add missing vector header (#1275)
2024-05-02 11:27:59 -07:00
ltqin
f448d179b7
Universal gemm flush cache (#1251)
* add flush cache to device op
* add flush cache parameter to ckProfiler
* change calculate size a and b method
* change evaluation time method from AVERAGE to MEDIAN
* format code
* adjust some code
* fix core dumped
* remove loop call flush icache in kernel
* remove loop(outer) call flush icache
---------
Co-authored-by: letaoqin <letaoqin@amd.com>
2024-04-25 15:07:14 -05:00