Chao Liu | 87d8740bf5 | added lds double buffer (on C dimension) for implicit gemm v1r3, as a result, it should achieve 90% of peak for all filter sizes, on CHWN format | 2019-04-19 17:49:25 -05:00
Chao Liu | 6d066ede00 | added implicit gemm v1r3, refactored decomposition of wei tensor (loop over y, x first, and C second) to allow easy lds double buffer on C | 2019-04-19 16:46:29 -05:00
Chao Liu | 5ce19234a4 | added GridwiseConvolutionImplicitGemm_v1r2_nchw_cyxk_khwn | 2019-04-19 14:22:02 -05:00
Chao Liu | 19f17df47a | implicit gemm v1r2: adding support for nchw | 2019-04-18 11:49:09 -05:00
Chao Liu | 17f3d2d4bc | refactor ConstantTensorDescriptor and functional | 2019-04-16 17:36:18 -05:00
Chao Liu | 3ffb824ed3 | refactor | 2019-04-13 16:21:44 -05:00
Chao Liu | 7d8daba741 | tuning | 2019-04-13 15:53:03 -05:00
Chao Liu | 00899f191b | implicit gemm v1r2: only load 1d filter | 2019-04-13 11:19:17 -05:00
Chao Liu | 96ee9571e2 | tuned implicit gemm v1 for 3x3 on AMD to 82%. Fixed a bug in 4d tensor blockwise copy. | 2019-04-10 18:10:18 -05:00
Chao Liu | 71434918bf | add 3x3 28x28 case | 2019-04-10 11:33:57 -05:00
Chao Liu | e624df922d | enabled ds_read_b128 and ds_write_b128 on hip c++ | 2019-04-09 19:05:44 -05:00
Chao Liu | 1bd880a674 | refactor | 2019-04-09 16:13:14 -05:00
Chao Liu | 796f72e26e | load smaller weight tensor | 2019-04-08 23:57:13 -05:00
Chao Liu | 5b36aeadeb | refactor | 2019-04-08 16:13:17 -05:00
Chao Liu | cc0fa73acd | added implicit_gemm_v1 lds double_buffer | 2019-04-08 14:11:55 -05:00
Chao Liu | c075d3f7d9 | add more assertions | 2019-04-08 12:02:56 -05:00
Chao Liu | 268d1c717c | tidy up | 2019-04-08 10:48:29 -05:00
Chao Liu | c9fa46af0b | debugging implicit gemm v1: use 10d tensor output | 2019-04-08 10:27:32 -05:00
Chao Liu | b57d60c0b7 | refactor | 2019-04-06 19:01:59 -05:00
Chao Liu | 5245a0162b | clean up | 2019-04-06 16:27:07 -05:00
Chao Liu | 7a251a0922 | debugged: CUDA should use its own float4 definition | 2019-04-06 15:44:53 -05:00
Chao Liu | 0983d205ad | debugging | 2019-04-05 18:54:26 -05:00
Chao Liu | 605afd0fb6 | Merge branch 'master' into inline_asm_v2 | 2019-04-04 18:40:23 -05:00
Jing Zhang | 6a3f3f951d | add | 2019-04-03 17:08:14 -05:00
Chao Liu | bdbc0eaad1 | cleaning up dead code | 2019-04-02 17:58:44 -05:00
Jing Zhang | 114fdb58af | 4x4 | 2019-04-01 17:02:02 -05:00
Chao Liu | e43d7bc63c | refactor | 2019-04-01 15:17:22 -05:00
Chao Liu | 766b0a9eaf | experimenting | 2019-03-24 12:09:57 -05:00
Chao Liu | f35c64eb78 | experimenting | 2019-03-23 19:35:31 -05:00
Chao Liu | 68ae0731f1 | experiment with hip compiler | 2019-03-22 19:40:01 -05:00
Chao Liu | 52ae168b17 | adding inline asm | 2019-03-22 17:05:23 -05:00
Chao Liu | fdaaaa500c | Merge branch 'direct_fp16' | 2019-03-22 16:46:41 -05:00
Chao Liu | 8c923db423 | hip build | 2019-03-22 14:22:58 -05:00
Chao Liu | e72eece8fc | added int8x4 | 2019-03-21 09:59:40 -05:00
Chao Liu | 050a1a6890 | adding int8 direct that reads pre-vectorized data | 2019-03-19 00:05:41 -05:00
Chao Liu | 4f0fc72e91 | adding fp16 direct that reads pre-vectorized data | 2019-03-18 15:03:17 -05:00
Chao Liu | 03eef73c5b | refactoring block copy | 2019-03-17 15:36:38 -05:00
Chao Liu | fd8de38417 | refactor | 2019-03-16 10:50:46 -05:00
Chao Liu | 2c9b8c2432 | update hip build | 2019-03-12 17:20:11 -05:00
Chao Liu | 0c88a3d891 | update | 2019-03-09 13:51:08 -06:00
Chao Liu | ce0182ce05 | Merge branch 'master' into implicit_gemm_fp16 | 2019-03-09 13:46:47 -06:00
Chao Liu | 7a97087713 | refactor | 2019-03-09 12:59:47 -06:00
Chao Liu | 8edbc659b8 | refactor | 2019-03-06 12:34:31 -06:00
Chao Liu | 04c5527d07 | refactor | 2019-03-04 17:09:20 -06:00
Chao Liu | 5fd40ad768 | clean up | 2019-03-02 17:27:37 -06:00
Chao Liu | 4543d17a71 | refactor | 2019-02-19 22:07:15 -06:00
Chao Liu | b2b622e8b2 | refactor | 2019-02-19 20:34:21 -06:00
Chao Liu | a65ef90308 | device_implicit_gemm_convolution_1_chwn_csrk_khwn: use tensor copy (instead of pointwise) for writing output, 3x3 increased from 78% to 84%, 5x5 from 80% to 84% | 2019-02-19 11:47:46 -06:00
Chao Liu | 1cb9885058 | add another version of batch gemm | 2019-02-17 01:50:57 -06:00
Chao Liu | 9f2e8f8bb4 | 2-type implicit gemm using chwn | 2019-02-15 22:51:51 -06:00