Aviral Goel
f5ac3ee359
chore(copyright): update copyright header for include directory (#3224)
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
* chore(copyright): update copyright header for test_data directory
* chore(copyright): update copyright header for python directory
* chore(copyright): update copyright header for profiler directory
* chore(copyright): update copyright header for library directory
* chore(copyright): update copyright header for include directory
2025-11-18 10:17:18 -08:00
Illia Silin
572cd820ce
Split env.hpp header from the ck.hpp header. (#2049)
* split env.hpp out of main headers
* fix namespace logic
2025-04-03 15:30:21 -07:00
Illia Silin
68a08c872e
Rebase the PR #1520 to ROCm repo. (#1574)
* Implement hiprtc for codegen tests
* Introduce gemm_softmax_gemm to codegen.
* Fix codegen build issues.
* Address PR comments.
* Separate ck_host lib and gemm_softmax_gemm into different PR.
* Fix cmake.
* Replace ENV variable with CMake option for toggling hipRTC in codegen
tests.
* Address PR comments.
* fix clang format
* Add missing header in magic_division.hpp
* - Workaround for hipRTC content wrapper
- Move descriptor for gemm_softmax_gemm to different branch
* Fix formatting.
* Revert "Fix formatting."
This reverts commit b5209eaef4.
* formatting fix
* fixed header guard issues
* updated header guards
* updated data_type for new types
* fixed redefinition error
* Add codegen test for batched_gemm_softmax_gemm.
Signed-off-by: Mirza Halilcevic <mirza.halilcevic@amd.com>
* formatting fix
---------
Signed-off-by: Mirza Halilcevic <mirza.halilcevic@amd.com>
Co-authored-by: Dino Musić <dino.music@htecgroup.com>
Co-authored-by: Mirza Halilcevic <mirza.halilcevic@htecgroup.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: arai713 <67439843+arai713@users.noreply.github.com>
Co-authored-by: Astha Rai <astha.rai713@gmail.com>
Co-authored-by: Mirza Halilcevic <mirza.halilcevic@amd.com>
2025-02-20 18:58:14 -08:00
Christopher Millette
ceaed8e097
Fixes small memory leak from missing hipEventDestroy (#1554)
2024-10-09 09:41:35 +02:00
Illia Silin
1274861a9d
replace the ENV macro with CK_ENV (#1296)
2024-05-17 10:42:51 -07:00
Illia Silin
fdbf8ccbd7
fix the output formatting (#1282)
2024-05-08 16:11:54 -07:00
Illia Silin
bf42097646
Enable logging in CK with environment variable. (#1278)
* enable logging using environment variable
* update ck.hpp header
* fix typo
* fix clang format
* Update include/ck/utility/env.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-05-07 16:26:43 -07:00
Illia Silin
886d9eeb99
Add an option to change the number of warm-up cycles and iterations. (#1124)
* allow setting the number of warmup cycles and iterations for profiler
* fix the gemm_splitk and grouped_gemm examples
2024-01-09 09:43:08 -08:00
zjing14
e8cddfdc3b
Improve 4k gemm perf (#1047)
* improve 4k gemm perf
* add f8 instances
* format
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-11-17 07:06:24 -06:00
Illia Silin
bc1108bb3e
Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. (#951)
* Added error check after kernel launch (#919)
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
* remove M=0 test cases for test_gemm_splitk
---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
2023-09-27 15:19:33 -07:00
carlushuang
e7dca79d27
initial stream-k implementation with example (#699)
* initial stream-k implementation with example
* fix unexpected change in err
* improve a little bit performance by reorganize pipeline.
* improve perf a little bit by swizzle block idx
* add profiler
* update example
* fix spelling
* shrink karg for streamk
* support dynamic buffer using memory coherence glc_slc bit from template
* control memory coherence while construct dynamic buffer
* update reduction for streamk(not ready yet)
* Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
* fix build issue
* fix several bug
* now result is correct, everything works (but has scratch)
* remove scratch by manually reset coordinate
* update device code
* fix a bug in final reduce
* fix something in example
* update async memset
* fix enum as camel case
* modify coherence enum name
* clean code and use atomic streamk by default
* remove unused var
* throw exception if have empty pointer
* fix format
* fix CI warning
* fix type in init
* modify CI error
* filter out on gfx10+
* restore changed example code
---------
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2023-07-26 14:18:15 -05:00
Illia Silin
b94fd0b227
update copyright headers (#726)
2023-05-31 18:46:57 -05:00
Illia Silin
19490ac4f7
Clean up kernel launch output (#569)
* clean up output from kernel_launch
* set RUN_WARMUP to 0 by default
* split the warm-up into a separate issue
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-02-15 12:07:21 -06:00
Chao Liu
500fa99512
Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example
* update example
* update example
* update instance
* updating reference conv
* update reference conv
* update conv fwd profiler
* update conv 1d and 3d instance
* update include path
* clean
* update profiler for conv bwd data and weight
* update conv bwd weight
* clean
* update conv example
* update profiler for conv bwd weight
* update ckprofiler for conv bwd data
* fix reference conv bwd data bug; update conv bwd data test
* update examples
* fix initialization issue
* update test for conv fwd
* clean
* clean
* remove test case too sensitive to error threshold
* fix test
* clean
* fix build
* adding conv multiple d
* adding conv multiple D
* add matrix padder
* add gemm padding to convnd
* adding group conv
* update gemm multi-d
* refactor
* refactor
* refactor
* clean
* clean
* refactor
* refactor
* reorg
* add ds
* add bias
* clean
* add G
* adding group
* adding group
* adding group
* update Tensor
* clean
* update example
* update DeviceGemmMultipleD_Xdl_CShuffle
* update conv bwd-data and bwd-weight
* update contraction example
* update gemm and batch gemm with e permute
* fix example build
* instance for grouped conv1d
* update example
* adding group conv instance
* update gemm bilinear instance
* update gemm+add+add+fastgelu instance
* update profiler
* update profiler
* update test
* update test and client example
* clean
* add grouped conv into profiler
* update profiler
* clean
* add test grouped conv, update all conv test to gtest
* update test
2022-07-29 18:19:25 -05:00