Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-14 02:02:46 +00:00)
Change 1d,2d,... to 1D,2D,... (#997)
[ROCm/composable_kernel commit: 0abc0f87db]
Committed by: GitHub
Parent: 815ed3a1f9
Commit: d2049bc4e7

@@ -17,7 +17,7 @@ None
 - Support for 3D grouped convolution on RDNA 3 GPUs (#935, #950, #985)
 - Grouped convolution support for small K and C (#822 #879 #897)
 - Support for NHWGC (2D and 3D) grouped convolution backward weight (#769 #804)
-- Support for bf16/f32/f16 and NHWGC (2D and 3d) grouped convolution backward data (#757 #799)
+- Support for bf16/f32/f16 and NHWGC (2D and 3D) grouped convolution backward data (#757 #799)
 - Support for Batched Gemm DL (#732)

 ### Changes

@@ -2,7 +2,7 @@

 ## Run ```example_reduce_blockwise```
 ```bash
-# -D <xxx> : input 3d/4d/5d tensor lengths
+# -D <xxx> : input 3D/4D/5D tensor lengths
 # -R <xxx> : reduce dimension ids
 # -v <x> : verification (0=no, 1=yes)
 #arg1: data type (0: fp16, 1: fp32, 3: int8, 5: bp16, 6: fp64, 7: int4)
@@ -22,7 +22,7 @@ Perf: 0.238063 ms, 264.285 GB/s, DeviceReduceBlockWise<256,M_C4_S1,K_C64_S1,InSr

 ## Run ```example_reduce_multiblock_atomic_add```
 ```bash
-# -D <xxx> : input 3d/4d/5d tensor lengths
+# -D <xxx> : input 3D/4D/5D tensor lengths
 # -R <xxx> : reduce dimension ids
 # -v <x> : verification (0=no, 1=yes)
 #arg1: data type (0: fp32, 1: fp64)

@@ -4,7 +4,7 @@ arg1: verification (0=no, 1=yes)
 arg2: initialization (0=no init, 1=integer value, 2=decimal value)
 arg3: time kernel (0=no, 1=yes)
 Following arguments (depending on number of spatial dims):
-Number of spatial dimensions (1=Conv1d, 2=Conv2d, 3=Conv3d)
+Number of spatial dimensions (1=Conv1D, 2=Conv2D, 3=Conv3D)
 G, N, K, C,
 <filter spatial dimensions>, (ie Y, X for 2D)
 <input image spatial dimensions>, (ie Hi, Wi for 2D)

@@ -22,7 +22,7 @@ c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
 Best Perf: 1.1933 ms, 107.977 TFlops, 79.0848 GB/s
 ```

-## Profile 2d forward convolution kernels
+## Profile 2D forward convolution kernels
 ```bash
 #arg1: tensor operation (conv=Convolution)
 #arg2: data type (0=fp32, 1=fp16)
@@ -115,7 +115,7 @@ Best Perf: 58.0306 ms, 37.8942 TFlops, 27.7545 GB/s
 # arg6: print tensor value (0: no; 1: yes)
 # arg7: time kernel (0: no, 1: yes)
 # Following arguments (depending on number of spatial dims):
-# Number of spatial dimensions (1=Conv1d, 2=Conv2d, 3=Conv3d)
+# Number of spatial dimensions (1=Conv1D, 2=Conv2D, 3=Conv3D)
 # G, N, K, C,
 # <filter spatial dimensions>, (ie Y, X for 2D)
 # <input image spatial dimensions>, (ie Hi, Wi for 2D)
@@ -158,7 +158,7 @@ GB/s: 127.947
 # arg6: print tensor value (0: no; 1: yes)
 # arg7: time kernel (0: no, 1: yes)
 # Following arguments (depending on number of spatial dims):
-# Number of spatial dimensions (1=Conv1d, 2=Conv2d, 3=Conv3d)
+# Number of spatial dimensions (1=Conv1D, 2=Conv2D, 3=Conv3D)
 # G, N, K, C,
 # <filter spatial dimensions>, (ie Y, X for 2D)
 # <input image spatial dimensions>, (ie Hi, Wi for 2D)
@@ -201,7 +201,7 @@ Note: This kernel use atomic add, this will cause output buffer to be accumulate
 # arg7: time kernel (0: no, 1: yes)
 # arg8: operation type (0: ImageToColumn, 1: ColumnToImage)
 # Following arguments (depending on number of spatial dims):
-# Number of spatial dimensions (1=Conv1d, 2=Conv2d, 3=Conv3d)
+# Number of spatial dimensions (1=Conv1D, 2=Conv2D, 3=Conv3D)
 # G, N, K, C,
 # <filter spatial dimensions>, (ie Y, X for 2D)
 # <input image spatial dimensions>, (ie Hi, Wi for 2D)