Clean up conv example, Instances, profiler and test (#324)

* convnd_fwd fp16 example * update example * update example * update instance * updating refernce conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshhold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * upate contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test [ROCm/composable_kernel commit: 500fa99512]
2026-05-20 12:59:49 +00:00 · 2022-07-29 18:19:25 -05:00
parent 1450273dc5
commit 236f946292
373 changed files with 17544 additions and 17013 deletions
--- a/profiler/src/profile_batched_gemm_reduce.cpp
+++ b/profiler/src/profile_batched_gemm_reduce.cpp
@@ -24,9 +24,9 @@ int profile_batched_gemm_reduce(int argc, char* argv[])
        F16_F16_F16_F32_F32, // 1
    };

-    if(!(argc == 15 || argc == 16))
+    if(argc != 15)
    {
-        printf("arg1: tensor operation (batched_gemm: BatchedGEMM+Reduce)\n");
+        printf("arg1: tensor operation (batched_gemm_reduce: BatchedGEMM+Reduce)\n");
        printf("arg2: data type (0: fp32; 1: fp16)\n");
        printf("arg3: matrix layout (0: A[m, k] * B[k, n] = C[m, n];\n");
        printf("                     1: A[m, k] * B[n, k] = C[m, n];\n");
@@ -37,7 +37,6 @@ int profile_batched_gemm_reduce(int argc, char* argv[])
        printf("arg6: print tensor value (0: no; 1: yes)\n");
        printf("arg7: time kernel (0=n0, 1=yes)\n");
        printf("arg8 to 14: M, N, K, StrideA, StrideB, StrideC, BatchCount\n");
-        printf("arg15: split k into  mulitiple batch\n");
        exit(1);
    }