Bartlomiej Wroblewski
627054b941
Add basic support for direct loads from global to LDS (#999)
* Add basic support for direct loads from global to LDS
* Clean the code and comments
* Add support for fp16
* Add comments
* Add check for thread cluster lengths
* Align non-direct-load fp16 example
* Small fixes
* Extend IsSupported to check for supported GPU gens
* Build examples only on the supported HW
* Do not throw when instance not supported in 04 example
* Review: Apply review suggestions
* Review: small fix
* Review: small fix
2023-11-25 13:35:22 +01:00
Illia Silin
b94fd0b227
update copyright headers (#726)
2023-05-31 18:46:57 -05:00
Raman R jana
1cfa87608a
Wavelet (inter-wave consumer-producer) GEMM (#310)
* wavelet gemm programming model support for CK
* GEMM pipeline update for wavelet programming model
* Updated wavelet programming pipeline
* fixes for global-write for math-wave
* fixed bug in global writes
* Updated comments for better readability
* fixed clang format errors
* added block_lds without barrier sync
* clean
* clean
* clean
* clean
* refactor
* prototype
4 layouts
fix default stride
all problem sizes
tidy
move file
update build script
restore old file
fix build
* refactor standalone test to use gemm test harness
* simplify gemm test
* update build script
* remove redundant
* early return when cmd arg doesn't match
* tidy
* report failure when result not validated
* tidy
* Add comment depicting B2C mapping pattern.
* Formatting & comments.
* Comparison with custom B2C mapping pattern.
* Example for wavelet gemm.
* Add wavelet to Gemm standalone test.
* Remove debug code.
* Remove dangling #endif directive.
Co-authored-by: root <Raman Jana>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2023-01-18 12:00:02 -06:00
ltqin
10b3278b05
Skip lds of b matrix (#326)
* start
* read for gridwise gemm
* add MakeBGridDescriptor_K0_N0_N1_N2_N3_K1
* add thread copy desc and register buffer
* add K0PerBlock dim
* add read global data
* finish gridwise gemm
* finish blockwise gemm
* add print data
* add smallest config
* add compare code for gridwise gemm
* fix NXdlPerWave
* fix k0perthread and gridwise gemm main loop
* remove b matrix lds alloc
* fix name
* add test code
* create b_grid_desc_k0_k1_k2_n0_n1_n2_n3_k3 from parameter
* add double register
* modify b_thread_desc_
* add float
* fp16 tag
* add tail for pipeline
* finish main loop
* optimize main loop
* start clear gridwise gemm
* clear code
* clear redundant code
* change file name
* change file name
* fix bug after merge develop
* fix input parameters
* using MultiK0 control b load data loop
* fix some config
* 4 buffer
* fix bug
* one can use
* change read order
* change buffer array to tuple
* change to 8 buffer
* interleave buffer load
* change to 16
* read 8 buffer
* add data buffer to template
* fix after merge develop (header file)
* format
* change to 4 buffer
* remove unnecessary lambda fun
2022-08-13 01:35:49 -05:00
Chao Liu
d3051d7517
add license in file (#303)
2022-06-24 23:32:43 -05:00
Chao Liu
d1db6a0c3e
Absolute include path (#281)
* add gelu and fast_gelu
* added GeLU and fast GeLU
* clean up
* add gemm+fastgelu example
* add gemm+gelu instances
* update profiler
* clean up
* clean up
* adding gemm+bias+activation
* clean
* adding bias
* clean
* adding gemm multiple d
* debugging
* add gemm bias add fastgelu
* rename, clean
* refactoring; add readme
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* fix
* fix
* update example
* update example
* rename
* update example
* add ckProfiler
* clean
* clean
* clean
* clean
* add client app example
* update readme
* delete obsolete files
* remove old client app
* delete old file
* cleaning
* clean
* remove half
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path for all examples
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* revert client app example
* clean build
* fix build
* temporarily disable client test on Jenkins
* clean
* clean
* clean
2022-06-24 20:51:04 -05:00
Chao Liu
cd167e492a
Compile for gfx908 and gfx90a (#130)
* adding compilation for multiple targets
* fix build
* clean
* update Jenkinsfile
* update readme
* update Jenkins
* use ck::half_t instead of ushort for bf16
* rename enum classes
* clean
* rename
* clean
2022-03-31 12:33:34 -05:00
Chao Liu
5d37d7bff4
Reorganize files, Part 1 (#119)
* delete obsolete files
* move files
* build
* update cmake
* update cmake
* fix build
* reorg examples
* update cmake for example and test
2022-03-08 21:46:36 -06:00