Illia Silin
9ce65cae0e
[gfx110x] support Navi3x architectures. ( #628 )
...
* enable building on Navi31
* fix syntax
* replace GPU_TARGETS with offload-arch
* add gfx1102 architecture
* fix typo
* update changelog
[ROCm/composable_kernel commit: 0ccecc7c31 ]
2023-03-09 07:56:40 -06:00
Haocong WANG
789c15d703
[Navi3x] Add Device Operations ( #567 )
...
* wmma_op + unit test
* add arch limitation to wmma test
* change arch limitation
* Refactor + Add all type unit tests (int4 compile failed)
* Add f32_16x16x16_bf16 unit test
* tempsave
* tempsave
* tempsave
* runtime bug, cannot find symbol
* workaround for incorrect HIP warpSize return value
* debugging
* tempsave
* Correctness OK, waiting for optimization
* Tidy up + format
* temp save
* temp save, reproduce the v_bfi_b32 issue
* add inline asm for wmmaop test
* tidy up
* clean some debug purpose code
* discard some codes
* clang format
* clang format
* compiler issue fixed + increase tile size
* navi3x_multipleD+example
* temp save
* workable
* batchedgemm[OK], groupconv[debug]
* groupconv: Sanity check[OK], Performance[Bad]
* navi3x_groupconv_need_optimization
* format
* Add arch limitation to all wmma examples
* fix bug: example30 input conv args
[ROCm/composable_kernel commit: 0cfda84d05 ]
2023-02-15 11:50:51 -06:00
Po Yen Chen
a4776782a5
Rangify constructor of HostTensorDescriptor & Tensor<> ( #445 )
...
* Rangify STL algorithms
This commit adopts rangified std::copy(), std::fill() & std::transform()
* Rangify check_err()
By rangifying check_err(), we can compare values not only between
std::vector<>s but also between any ranges that have the same value
type.
* Allow constructing Tensor<> like a HostTensorDescriptor
* Simplify Tensor<> object construction logic
* Remove more unnecessary 'HostTensorDescriptor' objects
* Re-format example code
* Re-write more HostTensorDescriptor ctor calls
[ROCm/composable_kernel commit: 4a2a56c22f ]
2022-11-11 11:36:01 -06:00
Adam Osewski
c747be612f
Refactor device op implementations into impl subdirectory. ( #420 )
...
* Move kernel implementation files under impl directory.
* Update examples paths.
* Update device kernel impl include paths.
* Update tensor operation instances include paths.
* Update profiler and tests include paths.
* Clang-format
* Update include paths for batched gemm reduce
* Refactor UnitTest ConvNDBwdWeight.
* Refactor fwd and bwd data convND UT.
* Fix used test macro.
* Fix include path.
* Fix include paths.
* Fix include paths in profiler and tests.
* Fix include paths.
Co-authored-by: Adam Osewski <aosewski@amd.com>
[ROCm/composable_kernel commit: 3048028897 ]
2022-10-13 09:05:08 -05:00
Chao Liu
236f946292
Clean up conv example, Instances, profiler and test ( #324 )
...
* convnd_fwd fp16 example
* update example
* update example
* update instance
* updating reference conv
* update reference conv
* update conv fwd profiler
* update conv 1d and 3d instance
* update include path
* clean
* update profiler for conv bwd data and weight
* update conv bwd weight
* clean
* update conv example
* update profiler for conv bwd weight
* update ckprofiler for conv bwd data
* fix reference conv bwd data bug; update conv bwd data test
* update examples
* fix initialization issue
* update test for conv fwd
* clean
* clean
* remove test case too sensitive to error threshold
* fix test
* clean
* fix build
* adding conv multiple d
* adding conv multiple D
* add matrix padder
* add gemm padding to convnd
* adding group conv
* update gemm multi-d
* refactor
* refactor
* refactor
* clean
* clean
* refactor
* refactor
* reorg
* add ds
* add bias
* clean
* add G
* adding group
* adding group
* adding group
* update Tensor
* clean
* update example
* update DeviceGemmMultipleD_Xdl_CShuffle
* update conv bwd-data and bwd-weight
* update contraction example
* update gemm and batch gemm with e permute
* fix example build
* instance for grouped conv1d
* update example
* adding group conv instance
* update gemm bilinear instance
* update gemm+add+add+fastgelu instance
* update profiler
* update profiler
* update test
* update test and client example
* clean
* add grouped conv into profiler
* update profiler
* clean
* add test grouped conv, update all conv test to gtest
* update test
[ROCm/composable_kernel commit: 500fa99512 ]
2022-07-29 18:19:25 -05:00
Chao Liu
7a98e9fa34
N-D Tensor Contraction example, instance, and client example ( #270 )
...
* adding contraction
* add contraction example
* update example
* update example
* format
* update readme
* clean header
* clean header
* contraction with multiple D
* rename
* fix naming issue; add instances for contraction+bilinear
* change assumed virtual layout of contraction; add client example
* update example
* update
* contraction+scale
* use type_convert
* rename
[ROCm/composable_kernel commit: 4fe9c393b8 ]
2022-07-07 14:31:11 -05:00
Chao Liu
aca6de2e5a
Gemm+Bilinear ( #316 )
...
* refactor
* update example
* update example
* gemm bilinear
* clean
* update
[ROCm/composable_kernel commit: 9e4429f9c3 ]
2022-07-02 09:15:38 -05:00