yinglu
cec66a4b18
ck: add tf32 in DTYPES to control instances build( #3317 )
...
[ROCm/composable_kernel commit: 8fec8054b2 ]
2025-12-08 16:24:20 +08:00
Aviral Goel
4cbde83cb6
chore(copyright): update copyright header for profiler directory ( #3205 )
...
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
* chore(copyright): update copyright header for test_data directory
* chore(copyright): update copyright header for python directory
* chore(copyright): update copyright header for profiler directory
[ROCm/composable_kernel commit: 0aadb4b2c4 ]
2025-11-14 11:19:25 -08:00
yinglu
126a2a4cf4
Simulate TF32 with BF16x3 ( #3142 )
...
* tf32:bf16x3:use bf16x3 to emulate tf32 gemm
* change blockwiseGemm to demo bf16x3
* temp push
* self review
* self review
* fix multi-device compile error
* bug fix
* code refactor
* limit to gfx950
* enhance gemm gfx942 threshold
* lower change from blockwise to warpwise
* refactor code
* refactor code
* error fix
* change threshold
* bug fix
* fix threshold error
* change host reference implementation to match the device
* bug fix
* bug fix
* code refactor
* fix clang-format failure
* code refine
[ROCm/composable_kernel commit: 2a73eb3bc0 ]
2025-11-13 16:21:09 -08:00
yinglu
7fd5de4ec4
Conv:TF32: add more instances - 2 ( #2879 )
...
* add instances of device_grouped_conv_fwd_xdl_f32_comp_instances
* add instances of device_grouped_conv_fwd_xdl_f32_tf32_mem_instances
* add instances of device_grouped_conv_fwd_xdl_large_tensor_f32_tf32_instances
* tf32:conv:add instances for base class DeviceConvFwd
* tf32:conv:add instances for base class DeviceGroupedConvBwdDataMultipleD
* tf32:conv:add instances for base class DeviceGroupedConvBwdWeight
* add tf32 in profiler
* remove gnhwc/ngchw/ngcdhw instances
* remove non-ndhwgc/nhwgc/nhwc instances
* add check in IsSupportedArgument()
[ROCm/composable_kernel commit: fada1a3cae ]
2025-10-10 15:28:17 +08:00
yinglu
19463895a8
TF32 POC in Conv3d on MI30x platform #2763 (second attempt) ( #2852 )
...
* Revert "Revert "feature:tf32:add initial conv3d fwd kernel support (#2763 )" (#2848 )"
This reverts commit 954db22b39 .
* fix compile error on gfx12x
* only run tf32 example on gfx942
* only build tf32 instance on gfx942
* ckProfiler:only support tf32 in gfx942
* delete unneeded messages
[ROCm/composable_kernel commit: dd7af118d7 ]
2025-09-17 14:50:15 -07:00
Illia Silin
954db22b39
Revert "feature:tf32:add initial conv3d fwd kernel support ( #2763 )" ( #2848 )
...
This reverts commit d4dbf93119 .
[ROCm/composable_kernel commit: 03b59f8c76 ]
2025-09-15 08:27:04 -07:00
lym
d4dbf93119
feature:tf32:add initial conv3d fwd kernel support ( #2763 )
...
[ROCm/composable_kernel commit: c51102144f ]
2025-09-15 21:03:00 +08:00
aledudek
c26c2b1fdc
Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 ( #2069 )
...
* Part1
* Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16
* Add missing comma
* Add missing cpp instance files
* Fix 3d layout
* Add missing closing bracket
* Add missing comp x2 and part2 instances
* Fix typo in instance name
* fix
* Fix
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
[ROCm/composable_kernel commit: 7c32652e03 ]
2025-04-16 11:00:55 +02:00
Bartłomiej Kocot
6ccfb817e4
Add support for GKCYX grouped conv fwd ( #2015 )
...
* Add support for GKCYX grouped conv fwd
* fixes
* fix
* changelog
* Fixes
[ROCm/composable_kernel commit: 54c81a1fcf ]
2025-03-26 21:13:38 +01:00
Bartłomiej Kocot
e4f4e04add
Add support for NGCHW in grouped conv fwd ( #1499 )
...
* Support NGCHW in grouped conv fwd
* Remove unneeded variable
* Fixes
[ROCm/composable_kernel commit: 4ba52b35dc ]
2024-09-20 10:45:46 +02:00
Bartłomiej Kocot
458d8bef26
Add Grouped Conv Fwd Large Tensor kernel ( #1432 )
...
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove unneeded instances
* examples fixes
* Remove unneeded ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
[ROCm/composable_kernel commit: 4ec5c52a0c ]
2024-08-06 10:06:10 +02:00
Rostyslav Geyyer
36e4675cc5
Add instances for conv_scale with bf8@fp8->fp8 ( #1231 )
...
* Add instances
* Add example
* Add profiler mode
* Add client example
[ROCm/composable_kernel commit: bbefc12a26 ]
2024-04-11 10:35:00 -05:00
Rostyslav Geyyer
0b8e766e55
Add instances for conv_scale with fp8@bf8->fp8 ( #1220 )
...
* Update device op api to support BComputeType
* Add example
* Add instances
* Add profiler mode
* Add client example
* Update copyright year
* Add BComputeType check
* Fix compute types
[ROCm/composable_kernel commit: a61e73bc56 ]
2024-04-03 09:08:08 -05:00
Rostyslav Geyyer
4fe9769987
Add instances for conv_scale with bf8 in / fp8 out ( #1200 )
...
* Add bf8 conv fwd instances
* Add example
* Add profiler mode
* Add client example
* Fix copyright headers
* Format
[ROCm/composable_kernel commit: fd0d093e78 ]
2024-03-21 13:57:34 -05:00
Rostyslav Geyyer
17be33ccf9
Add instances for conv_scale with fp8 in/out ( #1193 )
...
* Add fp8 conv instances and client example
* Format
* Add example
* Update cmakelists
* Add profiler mode
* Format
* Fix copyright headers
[ROCm/composable_kernel commit: e626d5202a ]
2024-03-15 09:50:03 -07:00
Illia Silin
b57fbee2f1
update copyright headers ( #726 )
...
[ROCm/composable_kernel commit: b94fd0b227 ]
2023-05-31 18:46:57 -05:00
Po Yen Chen
02db748e74
Modularize ckProfiler operations ( #514 )
...
* Re-structure ckProfiler source files
* Rename profiler.cpp to main.cpp
* Modularize ckProfiler operations
* Add description for profiler operations
* Use longer name to avoid name collision
* Use macro to delay expansion
* Use std::move() to avoid object copying
* Prohibit users from calling dtor
* Use macro to eliminate redundant code
* Make friend function hidden
* Add missing include directive <iostream>
* Fix wrong include directives
* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
[ROCm/composable_kernel commit: 8784a72e23 ]
2022-12-01 15:15:02 -06:00
Chao Liu
be8f189a9e
fix bug in gemm profiler ( #344 )
...
[ROCm/composable_kernel commit: 146972f447 ]
2022-08-07 12:23:32 -05:00
Chao Liu
236f946292
Clean up conv example, Instances, profiler and test ( #324 )
...
* convnd_fwd fp16 example
* update example
* update example
* update instance
* updating reference conv
* update reference conv
* update conv fwd profiler
* update conv 1d and 3d instance
* update include path
* clean
* update profiler for conv bwd data and weight
* update conv bwd weight
* clean
* update conv example
* update profiler for conv bwd weight
* update ckprofiler for conv bwd data
* fix reference conv bwd data bug; update conv bwd data test
* update examples
* fix initialization issue
* update test for conv fwd
* clean
* clean
* remove test case too sensitive to error threshold
* fix test
* clean
* fix build
* adding conv multiple d
* adding conv multiple D
* add matrix padder
* add gemm padding to convnd
* adding group conv
* update gemm multi-d
* refactor
* refactor
* refactor
* clean
* clean
* refactor
* refactor
* reorg
* add ds
* add bias
* clean
* add G
* adding group
* adding group
* adding group
* update Tensor
* clean
* update example
* update DeviceGemmMultipleD_Xdl_CShuffle
* update conv bwd-data and bwd-weight
* update contraction example
* update gemm and batch gemm with e permute
* fix example build
* instance for grouped conv1d
* update example
* adding group conv instance
* update gemm bilinear instance
* update gemm+add+add+fastgelu instance
* update profiler
* update profiler
* update test
* update test and client example
* clean
* add grouped conv into profiler
* update profiler
* clean
* add test grouped conv, update all conv test to gtest
* update test
[ROCm/composable_kernel commit: 500fa99512 ]
2022-07-29 18:19:25 -05:00