composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-12 17:26:00 +00:00

Author	SHA1	Message	Date
Chao Liu	5c7cec1115	Code clean up (#20) * tuning para, * testing on v100 * add fp16 * remove deprecated tensor descriptor * sync with miopen * update build script Co-authored-by: Jing Zhang <jizhan@amd.com>	2020-06-23 20:31:27 -05:00
Chao Liu	c5da0377fb	Added bwd data v3r1 v4r1, tweaking v1 (#10) * Added bwd data v3r1: breaking down compute into a series of load balanced GEMM, and launch in a single kernel * Added bwd data v4r1: like v3r1, but launch GEMMs in multiple kernels * Tweaked v1r1 and v1r2 (atomic) on AMD GPU	2020-01-20 10:20:03 -06:00
Chao Liu	efd419ecbe	refactored implicit gemm v1r3	2019-07-29 15:01:01 -05:00
Chao Liu	c82b833d8e	change build	2019-06-12 10:47:25 -05:00
Chao Liu	e6c86f81b5	add cuda extract_asm script	2019-04-02 20:26:58 -05:00
Chao Liu	bdbc0eaad1	cleaning up dead code	2019-04-02 17:58:44 -05:00