changed long_index_t to index_t when computing memory offset
uncommented other ops in profiler
added test for batched_gemm

[ROCm/composable_kernel commit: cb87b049de]
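
A minimal sketch of what the first change implies for a batched GEMM memory offset. This is not the code from the commit: the alias widths (`index_t` as 32-bit, `long_index_t` as 64-bit) are an assumption, and `batch_offset`, `batch_offset_long`, `fits_in_index_t`, and `offsets_fit_in_index_t` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed aliases that mirror the names in the commit message.
// The exact widths are an assumption made for this sketch.
using index_t      = std::int32_t;
using long_index_t = std::int64_t;

// Hypothetical helper: 64-bit offset of a batch, i.e. batch_id * batch_stride
// computed in long_index_t so the product cannot wrap for 32-bit inputs.
constexpr long_index_t batch_offset_long(index_t batch_id, index_t batch_stride)
{
    return static_cast<long_index_t>(batch_id) * static_cast<long_index_t>(batch_stride);
}

// Hypothetical helper: true when a 64-bit offset is representable as index_t.
constexpr bool fits_in_index_t(long_index_t offset)
{
    return offset >= 0 && offset <= std::numeric_limits<index_t>::max();
}

// Hypothetical helper: the same offset computed in index_t.
// Only valid when the product fits in 32 bits; the assert checks that
// precondition in debug builds.
inline index_t batch_offset(index_t batch_id, index_t batch_stride)
{
    assert(fits_in_index_t(batch_offset_long(batch_id, batch_stride)));
    return batch_id * batch_stride;
}

// Hypothetical host-side check: true when the offset of the last batch
// (and therefore of every batch, for a non-negative stride) fits in index_t.
constexpr bool offsets_fit_in_index_t(index_t batch_count, index_t batch_stride)
{
    return batch_count <= 0 ||
           fits_in_index_t(batch_offset_long(batch_count - 1, batch_stride));
}
```

Under these assumptions, a caller could run `offsets_fit_in_index_t(batch_count, batch_stride)` before launching and use the `index_t` path only when it returns true.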