Illia Silin
504b101da3
upgrade from clang-format-12 to clang-format-18 ( #2568 )
...
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
2025-07-28 11:34:07 -07:00
Bartłomiej Kocot
c2e4898b4b
Grouped conv bwd data NGCHW ( #1967 )
...
* Grouped conv bwd data NGCHW
* fixes
* fix
* Improvements
* Fix
* Fix
* add client example
2025-03-17 13:32:00 +01:00
Bartłomiej Kocot
85d6fcd30a
Add Grouped Convolution and GEMM documentation ( #1719 )
...
* Add Grouped Convolution docs
* Add gemm docs
* Update docs
* fix
2025-02-04 16:41:49 +01:00
Haocong WANG
3049b5467c
[GEMM] gemm_universal related optimization ( #1453 )
...
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
Co-authored-by: Jing Zhang <jizhan@meta.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
2024-08-14 10:42:30 +08:00
amoskvic
a776978cbe
Style improvement: improving type alias usage consistency in gemm-related client examples. Also copyright year update for all client examples. ( #1180 )
...
Co-authored-by: Arseny Moskvichev <amoskvic@amd.com >
2024-02-28 16:39:03 -08:00
Illia Silin
7965d66a81
Split the static library into several files. ( #1044 )
...
* spolit the static library into several
* update lib paths and fix client example
* do not use device_mha_operarions for client examples
* use appropriate libs to link to client examples
* remove the gpu/transpose path from the list
* try fixing clinet examples 3,4,9
* add necessary libs for client examples
* fix the layernorm client example
* fix the client examples 23 and 24
* fix typo
* add interface library and refresh clang format
2023-11-28 11:17:37 -08:00
zjing14
04f93aadb8
Grouped conv bwd data with fp16 input and bf8fp8 comp ( #962 )
...
* Add f8 bf8 gemm example
* Add element-wise ops
* Add intrinsics
* Update reference calculation
* Add an additional type option for xdlops gemm
* Fix build process
* Add bf8 to buffer addressing
* Update blockwise op, split typeA and typeB
* Update for compatibility
* Uppdate naming to f8->fp8
* Update naming
* Format
* Update naming (#937 )
* Add a client example
* Add computetypes to device and gridwise ops
* Add instances, update instance factory
* Format
* Fix a flag
* Add ckProfiler mode
* Fix typos
* Add an example
* Add bf8 generator
* add bf8 mfma; fixed type_convert for bf8
* move verfication ahead of timing
* Update reference calculation
* Fix reference
* Narrow down float init range
* Fix bf8 bf8 mfma
* Add bf8 @ fp8 mfma
* Update example
* Update instances
* Update profiler api
* Update for compatibility
* Format
* Remove extra example
* Clean up
* workaround convert
* added instance of f16_bf8f8, and client example
* fixed mfma selector
* format
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com >
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com >
Co-authored-by: Jing Zhang <jizha@amd.com >
2023-10-04 18:04:27 -05:00