composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-21 13:29:20 +00:00

Author	SHA1	Message	Date
Anthony Chang	8bb6c6e120	use single threaded tensor generator (#161) [ROCm/composable_kernel commit: `f015c77687`]	2022-03-30 22:28:30 -05:00
Jianfeng Yan	297ef9795d	batched_gemm: use profiler in ctest (#163) [ROCm/composable_kernel commit: `c8f3acf9c0`]	2022-03-30 21:32:49 -05:00
Jianfeng Yan	cb97ce68d8	Batched gemm and reduction (#156) * adding batched_gemm_and_reduction * batched_gemm_reduce works with bactch_count=1 * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1 * adding profiler for batched_gemm_fp16 * fixed a bug in declaration of d1 and d0; both example and profiler work * clang-format * cleanup * batched_gemm_reduce: add test * minor change * fixed some typo in function names [ROCm/composable_kernel commit: `34c661e71c`]	2022-03-30 11:21:18 -05:00
Jianfeng Yan	0d02cb3dfe	Batched gemm bf16 (#142) * add bf16 for batched gemm * batched_gemm_bf16 works * recover accidently changed files [ROCm/composable_kernel commit: `d91f9f119c`]	2022-03-22 18:18:43 -05:00
Jianfeng Yan	4ddc016c60	refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120); changed long_index_t to index_t when computing memory offset; uncomment other ops in profiler; added test for batched_gemm [ROCm/composable_kernel commit: `cb87b049de`]	2022-03-21 16:45:14 -05:00
zjing14	e57c9a886f	Batched GEMM for fp16 (#79) * prepare host for batched_gemm * init commit of batched kernels * fixed * refine transform with freeze * m/n padding * fixed a bug; clean * add small tiles * clean * clean code * clean code * add nt, tn, tt layout * add missing file * use StaticBufferTupleOfVector instead * add reference_batched_gemm * fixed a macro [ROCm/composable_kernel commit: `b53e9d08ed`]	2022-02-11 09:36:52 -06:00

6 Commits