Haocong WANG
3049b5467c
[GEMM] gemm_universal related optimization (#1453)
...
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
Rostyslav Geyyer
204da9c522
Move grouped conv fwd client examples (#1299)
...
* Move grouped conv fwd client examples
* Update existing examples
* Format
2024-05-21 09:52:41 -05:00
Rostyslav Geyyer
bbefc12a26
Add instances for conv_scale with bf8@fp8->fp8 (#1231)
...
* Add instances
* Add example
* Add profiler mode
* Add client example
2024-04-11 10:35:00 -05:00
Rostyslav Geyyer
a61e73bc56
Add instances for conv_scale with fp8@bf8->fp8 (#1220)
...
* Update device op api to support BComputeType
* Add example
* Add instances
* Add profiler mode
* Add client example
* Update copyright year
* Add BComputeType check
* Fix compute types
2024-04-03 09:08:08 -05:00
Rostyslav Geyyer
fd0d093e78
Add instances for conv_scale with bf8 in / fp8 out (#1200)
...
* Add bf8 conv fwd instances
* Add example
* Add profiler mode
* Add client example
* Fix copyright headers
* Format
2024-03-21 13:57:34 -05:00
Rostyslav Geyyer
e626d5202a
Add instances for conv_scale with fp8 in/out (#1193)
...
* Add fp8 conv instances and client example
* Format
* Add example
* Update cmakelists
* Add profiler mode
* Format
* Fix copyright headers
2024-03-15 09:50:03 -07:00
amoskvic
a776978cbe
Style improvement: improving type alias usage consistency in gemm-related client examples. Also copyright year update for all client examples. (#1180)
...
Co-authored-by: Arseny Moskvichev <amoskvic@amd.com>
2024-02-28 16:39:03 -08:00
Illia Silin
7965d66a81
Split the static library into several files. (#1044)
...
* split the static library into several
* update lib paths and fix client example
* do not use device_mha_operarions for client examples
* use appropriate libs to link to client examples
* remove the gpu/transpose path from the list
* try fixing client examples 3,4,9
* add necessary libs for client examples
* fix the layernorm client example
* fix the client examples 23 and 24
* fix typo
* add interface library and refresh clang format
2023-11-28 11:17:37 -08:00
Bartłomiej Kocot
f2398f612d
Introduce multiABD api and deprecate multiD (#1035)
...
* Introduce multiABD api and deprecate multiD
* Replace multiD with multiABD
* Mark structures as deprecated
* Change doxygen deprecated to note to avoid warnings
2023-11-14 17:00:40 +01:00
zjing14
e921e1f08d
3d grouped conv fwd with input/output fp16 and comp fp8 (#931)
...
* add f8 comp instance
* fixed
* fixed comments
* rename
* fixed dtype
* format
* fixed CI
* fixed ci
* add missing ComputeType
* fixed cit
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-03 20:04:26 -05:00
zjing14
309b1c6461
Fixed Weight layout of grouped_conv 3d fwd (#743)
...
* Changed wei layout
* changed layout for examples
* fixed client example
---------
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
2023-06-15 10:19:33 -05:00
Adam Osewski
e9fd122889
Conv3D FWD BWD WRW fp16 fp32 client examples (#559)
...
* Conv3d bwd weight client example.
* Update year in license
* Convolution bwd data 3D fp16/fp32 client example.
* Client example for convnd fwd fp16 fp32
* clang-format
* Review remarks.
* Fix compiler err.
* Update data layout to standard one.
* Add conv 3d fwd NDHWGC instances
* clang-format
* Conv3d fwd NDHWGC instances.
---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-02-15 11:16:47 -06:00