ROCm/composable_kernel
mirror of https://github.com/ROCm/composable_kernel.git, synced 2026-05-11 17:00:18 +00:00
87d8740bf5d8030f9e4e54c9b7e64f353a6f944e
composable_kernel/driver
Chao Liu
87d8740bf5
added lds double buffer (on C dimension) for implicit gemm v1r3, as a result, it should achieve 90% of peak for all filter sizes, on CHWN format
2019-04-19 17:49:25 -05:00
CMakeLists.txt
update build
2019-02-15 02:06:34 -06:00
device_convolution_implicit_gemm_v1_chwn_cyxk_khwn.hpp
added implicit gemm v1r3, refactored decomposition of wei tensor (loop over y, x first, and C second) to allow easy lds double buffer on C
2019-04-19 16:46:29 -05:00
device_convolution_implicit_gemm_v1_nchw_cyxk_khwn.hpp
added implicit gemm v1r3, refactored decomposition of wei tensor (loop over y, x first, and C second) to allow easy lds double buffer on C
2019-04-19 16:46:29 -05:00
device_convolution_implicit_gemm_v2_chwn_cyxk_khwn.hpp
refactor ConstantTensorDescriptor and functional
2019-04-16 17:36:18 -05:00
device_direct_convolution_1.hpp
experimenting
2019-03-24 12:09:57 -05:00
device_direct_convolution_2_nchw_kcyx_nkhw.hpp
experimenting
2019-03-24 12:09:57 -05:00
device_direct_convolution_2_vectorized_nchw_kcyx_nkhw.hpp
refactor ConstantTensorDescriptor and functional
2019-04-16 17:36:18 -05:00
device_implicit_gemm_convolution_1_chwn_cyxk_khwn_padded.hpp
experimenting
2019-03-24 12:09:57 -05:00
driver.cu
update build
2019-02-15 02:06:34 -06:00
driver.hip.cpp
added lds double buffer (on C dimension) for implicit gemm v1r3, as a result, it should achieve 90% of peak for all filter sizes, on CHWN format
2019-04-19 17:49:25 -05:00