Commit Graph

122 Commits

Author SHA1 Message Date
Chao Liu 7a251a0922 debugged: CUDA should use its own float4 definition 2019-04-06 15:44:53 -05:00
Chao Liu f6cb5b846d debugging 2019-04-06 15:10:40 -05:00
Chao Liu 0983d205ad debugging 2019-04-05 18:54:26 -05:00
Chao Liu dabfa77fc6 clipboard float4 copy and paste C++ code 2019-04-05 02:13:29 -05:00
Chao Liu 605afd0fb6 Merge branch 'master' into inline_asm_v2 2019-04-04 18:40:23 -05:00
Chao Liu 19b4179798 unroll even-odd loop 2019-04-04 17:36:02 -05:00
Chao Liu fbc7817bbb add asm into lds_double_buffer version 2019-04-04 10:38:49 -05:00
Chao Liu 05d7a0875c enable 128x128 block gemm 2019-04-03 19:04:17 -05:00
Jing Zhang 6a3f3f951d add 2019-04-03 17:08:14 -05:00
Jing Zhang 0d6aa311e9 inline asm 2019-04-03 14:34:41 -05:00
Chao Liu 6290e0b080 puting gridwise convolution into its own class 2019-04-02 20:18:01 -05:00
Chao Liu 0b41ca2d9e puting gridwise convolution into its own class 2019-04-02 19:37:02 -05:00
Chao Liu bdbc0eaad1 cleaning up dead code 2019-04-02 17:58:44 -05:00
Jing Zhang 114fdb58af 4x4 2019-04-01 17:02:02 -05:00
Chao Liu 85c1ff1cea change perf config for debuggging 2019-04-01 16:04:02 -05:00
Chao Liu 23c626a941 changed to dynamics lds allocation 2019-04-01 16:01:43 -05:00
Chao Liu 7c098ddc0e refactori 2019-04-01 15:24:27 -05:00
Chao Liu e43d7bc63c refactor 2019-04-01 15:17:22 -05:00
Chao Liu d6d9a8e4ce Jing's ds_read inline asm 2019-03-28 19:46:29 -05:00
Chao Liu 766b0a9eaf experimenting 2019-03-24 12:09:57 -05:00
Chao Liu f35c64eb78 experimenting 2019-03-23 19:35:31 -05:00
Chao Liu 68ae0731f1 experiment with hip compiler 2019-03-22 19:40:01 -05:00
Chao Liu 52ae168b17 adding inline asm 2019-03-22 17:05:23 -05:00
Chao Liu fdaaaa500c Merge branch 'direct_fp16' 2019-03-22 16:46:41 -05:00
Chao Liu 8c923db423 hip build 2019-03-22 14:22:58 -05:00
Chao Liu e72eece8fc added int8x4 2019-03-21 09:59:40 -05:00
Chao Liu 02d72160dc adding int8 direct that reads pre-vectorized data 2019-03-19 01:30:28 -05:00
Chao Liu 050a1a6890 adding int8 direct that reads pre-vectorized data 2019-03-19 00:05:41 -05:00
Chao Liu 79d9b1084b adding fp16 direct that reads pre-vectorized data 2019-03-18 18:16:02 -05:00
Chao Liu 4f0fc72e91 adding fp16 direct that reads pre-vectorized data 2019-03-18 15:03:17 -05:00
Chao Liu 03eef73c5b refactoring block copy 2019-03-17 15:36:38 -05:00
Chao Liu a0584426ff refactoring ConstantTensorDescriptor 2019-03-17 03:22:41 -05:00
Chao Liu fd8de38417 refactor 2019-03-16 10:50:46 -05:00
Chao Liu 2c9b8c2432 update hip build 2019-03-12 17:20:11 -05:00
Chao Liu 0c88a3d891 update 2019-03-09 13:51:08 -06:00
Chao Liu ce0182ce05 Merge branch 'master' into implicit_gemm_fp16 2019-03-09 13:46:47 -06:00
Chao Liu f54cad7d4f refactor 2019-03-09 13:39:24 -06:00
Chao Liu 7a97087713 refactor 2019-03-09 12:59:47 -06:00
Chao Liu 43cd8529c2 refactor 2019-03-09 12:52:16 -06:00
Chao Liu 8edbc659b8 refactor 2019-03-06 12:34:31 -06:00
Chao Liu 04c5527d07 refactor 2019-03-04 17:09:20 -06:00
Chao Liu 5fd40ad768 clean up 2019-03-02 17:27:37 -06:00
Chao Liu 4543d17a71 refactor 2019-02-19 22:07:15 -06:00
Chao Liu b2b622e8b2 refactor 2019-02-19 20:34:21 -06:00
Chao Liu a65ef90308 device_implicit_gemm_convolution_1_chwn_csrk_khwn: use tensor copy (instead of pointwise) for writing output, 3x3 increased from 78% to 84%, 5x5 from 80% to 84% 2019-02-19 11:47:46 -06:00
Chao Liu 50b96745c6 gridwise_implicit_gemm_convolution_1_chwn_csrk_khwn use khwn for thread C data now 2019-02-17 02:28:20 -06:00
Chao Liu 1cb9885058 add anther verision of batch gemm 2019-02-17 01:50:57 -06:00
Chao Liu 9f2e8f8bb4 2-type implicit gemm using chwn 2019-02-15 22:51:51 -06:00
Chao Liu d7c84daf66 delete useless code 2019-02-15 22:24:18 -06:00
Chao Liu b2888adfbe change file extension to hip.hpp and hip.cpp 2019-02-15 02:13:21 -06:00