9 Commits

Author SHA1 Message Date
Bartłomiej Kocot
4ec5c52a0c Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove not needed ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes
2024-08-06 10:06:10 +02:00
Rostyslav Geyyer
bbefc12a26 Add instances for conv_scale with bf8@fp8->fp8 (#1231)
* Add instances

* Add example

* Add profiler mode

* Add client example
2024-04-11 10:35:00 -05:00
Rostyslav Geyyer
a61e73bc56 Add instances for conv_scale with fp8@bf8->fp8 (#1220)
* Update device op api to support BComputeType

* Add example

* Add instances

* Add profiler mode

* Add client example

* Update copyright year

* Add BComputeType check

* Fix compute types
2024-04-03 09:08:08 -05:00
Rostyslav Geyyer
fd0d093e78 Add instances for conv_scale with bf8 in / fp8 out (#1200)
* Add bf8 conv fwd instances

* Add example

* Add profiler mode

* Add client example

* Fix copyright headers

* Format
2024-03-21 13:57:34 -05:00
Rostyslav Geyyer
e626d5202a Add instances for conv_scale with fp8 in/out (#1193)
* Add fp8 conv instances and client example

* Format

* Add example

* Update cmakelists

* Add profiler mode

* Format

* Fix copyright headers
2024-03-15 09:50:03 -07:00
Illia Silin
b94fd0b227 update copyright headers (#726) 2023-05-31 18:46:57 -05:00
Po Yen Chen
8784a72e23 Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files

* Rename profiler.cpp to main.cpp

* Modularize ckProfiler operations

* Add description for profiler operations

* Use longer name to avoid name collision

* Use macro to delay expansion

* Use std::move() to avoid object copying

* Prohibit users from calling dtor

* Use macro to eliminate redundant code

* Make friend function hidden

* Add missing include directive <iostream>

* Fix wrong include directives

* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2022-12-01 15:15:02 -06:00
Chao Liu
146972f447 fix bug in gemm profiler (#344) 2022-08-07 12:23:32 -05:00
Chao Liu
500fa99512 Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating reference conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* update contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test
2022-07-29 18:19:25 -05:00