composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-11 08:50:17 +00:00

Author	SHA1	Message	Date
Chao Liu	c234741045	add a missing file	2019-04-02 20:19:24 -05:00
Chao Liu	6290e0b080	putting gridwise convolution into its own class	2019-04-02 20:18:01 -05:00
Chao Liu	0b41ca2d9e	putting gridwise convolution into its own class	2019-04-02 19:37:02 -05:00
Chao Liu	bdbc0eaad1	cleaning up dead code	2019-04-02 17:58:44 -05:00
Chao Liu	7c098ddc0e	refactor	2019-04-01 15:24:27 -05:00
Chao Liu	e43d7bc63c	refactor	2019-04-01 15:17:22 -05:00
Jing Zhang	d058d16407	merged ds_read and gemm, but register allocation is mess	2019-03-29 14:37:36 -05:00
Jing Zhang	d700ce86d3	in press	2019-03-29 14:27:49 -05:00
Jing Zhang	1d9b970186	inline gemm 8x8	2019-03-29 11:09:03 -05:00
Jing Zhang	57a8ccf3ba	in progress	2019-03-28 21:53:24 -05:00
Jing Zhang	66d5e5b344	in progress	2019-03-28 21:08:29 -05:00
Jing Zhang	f7498d66f9	fixed conflict	2019-03-28 20:01:53 -05:00
Jing Zhang	5fbf4f33d3	inline	2019-03-28 20:00:31 -05:00
Chao Liu	d6d9a8e4ce	Jing's ds_read inline asm	2019-03-28 19:46:29 -05:00
Jing Zhang	2058bec8cf	fused functions	2019-03-28 18:47:32 -05:00
Chao Liu	766b0a9eaf	experimenting	2019-03-24 12:09:57 -05:00
Chao Liu	f35c64eb78	experimenting	2019-03-23 19:35:31 -05:00
Chao Liu	22114959da	adding inline asm for 16x4 gemm	2019-03-22 19:42:38 -05:00
Chao Liu	68ae0731f1	experiment with hip compiler	2019-03-22 19:40:01 -05:00
Chao Liu	52ae168b17	adding inline asm	2019-03-22 17:05:23 -05:00
Chao Liu	fdaaaa500c	Merge branch 'direct_fp16'	2019-03-22 16:46:41 -05:00
Chao Liu	18a81e356e	adding assembly	2019-03-22 16:33:04 -05:00
Chao Liu	8c923db423	hip build	2019-03-22 14:22:58 -05:00
Chao Liu	e72eece8fc	added int8x4	2019-03-21 09:59:40 -05:00
Chao Liu	02d72160dc	adding int8 direct that reads pre-vectorized data	2019-03-19 01:30:28 -05:00
Chao Liu	050a1a6890	adding int8 direct that reads pre-vectorized data	2019-03-19 00:05:41 -05:00
Chao Liu	18ffbd6802	adding fp16 direct that reads pre-vectorized data	2019-03-18 18:16:16 -05:00
Chao Liu	79d9b1084b	adding fp16 direct that reads pre-vectorized data	2019-03-18 18:16:02 -05:00
Chao Liu	2832520418	adding fp16 direct that reads pre-vectorized data	2019-03-18 15:09:52 -05:00
Chao Liu	4f0fc72e91	adding fp16 direct that reads pre-vectorized data	2019-03-18 15:03:17 -05:00
Chao Liu	7faf269c99	refactor	2019-03-17 21:48:46 -05:00
Chao Liu	03eef73c5b	refactoring block copy	2019-03-17 15:36:38 -05:00
Chao Liu	a0584426ff	refactoring ConstantTensorDescriptor	2019-03-17 03:22:41 -05:00
Chao Liu	fd8de38417	refactor	2019-03-16 10:50:46 -05:00
Chao Liu	2c9b8c2432	update hip build	2019-03-12 17:20:11 -05:00
Chao Liu	0c88a3d891	update	2019-03-09 13:51:08 -06:00
Chao Liu	ce0182ce05	Merge branch 'master' into implicit_gemm_fp16	2019-03-09 13:46:47 -06:00
Chao Liu	f54cad7d4f	refactor	2019-03-09 13:39:24 -06:00
Chao Liu	7a97087713	refactor	2019-03-09 12:59:47 -06:00
Chao Liu	43cd8529c2	refactor	2019-03-09 12:52:16 -06:00
Chao Liu	8edbc659b8	refactor	2019-03-06 12:34:31 -06:00
Chao Liu	04c5527d07	refactor	2019-03-04 17:09:20 -06:00
Chao Liu	5fd40ad768	clean up	2019-03-02 17:27:37 -06:00
Chao Liu	4543d17a71	refactor	2019-02-19 22:07:15 -06:00
Chao Liu	b2b622e8b2	refactor	2019-02-19 20:34:21 -06:00
Chao Liu	a65ef90308	device_implicit_gemm_convolution_1_chwn_csrk_khwn: use tensor copy (instead of pointwise) for writing output, 3x3 increased from 78% to 84%, 5x5 from 80% to 84%	2019-02-19 11:47:46 -06:00
Chao Liu	50b96745c6	gridwise_implicit_gemm_convolution_1_chwn_csrk_khwn use khwn for thread C data now	2019-02-17 02:28:20 -06:00
Chao Liu	1cb9885058	add another version of batch gemm	2019-02-17 01:50:57 -06:00
Chao Liu	9f2e8f8bb4	2-type implicit gemm using chwn	2019-02-15 22:51:51 -06:00
Chao Liu	d7c84daf66	delete useless code	2019-02-15 22:24:18 -06:00

139 Commits