mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-20 06:49:15 +00:00
Add a new gemm pipeline based on ComputeV4 which utilizes async copy API (#2949)
* check in pipeline and policy for async load in mi350, need to make sure TileAccessPattern is warp_raked or block_raked solve merge conflicts
* fix cmakelists
* make it build
* fix? buffer async fence
* relax fences; it appears it only is needed between pairs of ping-pongs
* remove fences
* remove fences
* cleanup and reformat
* add steps annotations
* comment all pipeline steps / remove unexplainable syncs
* clang-format
* add comment
* cleanup kernel types for test
* fix comment
* fix hardcoded warp size
* faithfully copy block gemm from compute v4 policy to async policy
* make async test gfx950 only
* fix cmake logic
* set separate compile options for async
* refine comment in policy
* try update hotloop scheduler
* cleanup comments
* test more K block sizes
* unhardcode Ks, sort of
* add large odd test case
* fix build for quant
* add comment to hot loop scheduler and rename enum
* reformat
* reword the pipeline description
* reformat
* address review / add static asserts / typo fix
* update changelog
@@ -275,4 +275,20 @@ CK_TILE_DEVICE static constexpr auto get_device_arch()
     return gfx12_t{};
 #endif
 }
+
+enum LLVMSchedGroupMask : int32_t
+{
+    NONE = 0,
+    ALU = 1 << 0,
+    VALU = 1 << 1,
+    SALU = 1 << 2,
+    MFMA = 1 << 3,
+    VMEM = 1 << 4,
+    VMEM_READ = 1 << 5,
+    VMEM_WRITE = 1 << 6,
+    DS = 1 << 7,
+    DS_READ = 1 << 8,
+    DS_WRITE = 1 << 9,
+    ALL = (DS_WRITE << 1) - 1,
+};
 } // namespace ck_tile
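
The enum added above is a set of one-hot bit flags, so groups can be OR-ed together and `ALL` works out to the union of every bit from `ALU` through `DS_WRITE`. The following host-side sketch checks that arithmetic. It assumes a local copy of the enum (outside `ck_tile`) so it compiles without the library; `combine` and `contains` are hypothetical helpers for illustration and are not part of this commit. The bit layout appears to match the mask values LLVM documents for the AMDGPU `sched_group_barrier` builtin, which is presumably how the hot loop scheduler mentioned in the commit message consumes it, but that use is not shown in this hunk.

```cpp
#include <cstdint>

// Local copy of the enum from the diff, so this sketch builds on the host.
enum LLVMSchedGroupMask : int32_t
{
    NONE       = 0,
    ALU        = 1 << 0,
    VALU       = 1 << 1,
    SALU       = 1 << 2,
    MFMA       = 1 << 3,
    VMEM       = 1 << 4,
    VMEM_READ  = 1 << 5,
    VMEM_WRITE = 1 << 6,
    DS         = 1 << 7,
    DS_READ    = 1 << 8,
    DS_WRITE   = 1 << 9,
    ALL        = (DS_WRITE << 1) - 1, // (1 << 10) - 1: every bit below bit 10
};

// Hypothetical helper (not in ck_tile): OR two groups into one mask.
constexpr int32_t combine(LLVMSchedGroupMask a, LLVMSchedGroupMask b)
{
    return static_cast<int32_t>(a) | static_cast<int32_t>(b);
}

// Hypothetical helper (not in ck_tile): true if `mask` has every bit of `group`.
constexpr bool contains(int32_t mask, LLVMSchedGroupMask group)
{
    return (mask & static_cast<int32_t>(group)) == static_cast<int32_t>(group);
}

// ALL covers the ten defined group bits and nothing above them.
static_assert(ALL == 0x3FF, "ALL should be the low ten bits");
static_assert(combine(DS_READ, DS_WRITE) == 0x300, "DS read|write mask");
static_assert(contains(ALL, MFMA), "ALL includes MFMA");
static_assert(!contains(combine(VALU, SALU), MFMA), "VALU|SALU excludes MFMA");

// On device, a mask like this would typically be passed as the first argument
// of the clang builtin __builtin_amdgcn_sched_group_barrier(mask, size, sync_id);
// that call is an assumption about usage and is not exercised here.
```

Writing `ALL` as `(DS_WRITE << 1) - 1` ties it to the last enumerator, so it has to be revisited only if a new group bit is appended after `DS_WRITE`.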