composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-18 09:38:17 +00:00

Author	SHA1	Message	Date
Chao Liu	5b7a18c506	experimenting global and buffer load/store	2019-09-18 02:05:42 -05:00
Chao Liu	c7a6545ec4	experimenting global and buffer load/store	2019-09-18 01:37:28 -05:00
Chao Liu	9f46cdf5fa	experimenting global and buffer load/store	2019-09-18 00:15:57 -05:00
Chao Liu	e1a67b693e	refactor	2019-09-17 11:19:15 -05:00
Chao Liu	bf97542846	add lds double buffer to nchw padded v4r1 and v4r4	2019-09-15 16:58:16 -05:00
Chao Liu	d4878d99f9	initial padding support for nchw	2019-09-13 23:30:48 -05:00
Chao Liu	bd7a230006	clean up	2019-09-12 14:55:46 -05:00
Chao Liu	1f70524471	padding for chwn is functional	2019-09-12 01:12:08 -05:00
Chao Liu	724e984bff	enabling padding for chwn format	2019-09-11 01:13:13 -05:00
Chao Liu	238d58c2f5	adding tensor_view	2019-08-20 17:29:54 -05:00
Chao Liu	08bf57b01c	bug fix: BlockwiseGenericTensorSliceCopy_v2::MoveDstSlicingWindow	2019-08-15 15:12:13 -05:00
Chao Liu	740149fcf1	clean up	2019-08-13 17:26:00 -05:00
Chao Liu	8bdaba51f8	clean up	2019-08-13 00:37:23 -05:00
Chao Liu	fab2f10a55	clean up	2019-08-12 15:48:35 -05:00
Chao Liu	4908fe3fdc	tweak on amd	2019-08-08 12:14:06 -05:00
Chao Liu	a9b2b1dcd7	added ThreadwiseGenericTensorSliceCopy_v2r1	2019-08-08 02:42:52 -05:00
Chao Liu	701b7341f0	clean up	2019-08-07 19:25:54 -05:00
Chao Liu	bc9ea646f8	use ford/for instead of static_ford/static_for in threadwise copy, somehow register spill is greatly reduced on AMD	2019-08-07 19:09:13 -05:00
Chao Liu	5636576f9b	bug fix in ford, forgot to reorder lengths	2019-08-07 18:27:10 -05:00
Chao Liu	9d99a58072	adding ThreadwiseGenericTensorSliceCopy_v1r2	2019-08-07 16:51:14 -05:00
Chao Liu	1b3c2e4035	reworked ThreadwiseGenericTensorSliceCopy_v1	2019-08-07 00:52:13 -05:00
Chao Liu	41cdde99e5	add looping Orders into ford and static_ford	2019-08-06 20:23:11 -05:00
Chao Liu	0271338ed4	added ReorderGiveOld2New() in Sequence and ConstantTensorDescriptor	2019-08-06 18:48:05 -05:00
Chao Liu	fdcfae3a62	reimplement threadwise copy	2019-08-06 17:41:58 -05:00
Chao Liu	adc1008836	tweak	2019-08-03 15:05:25 -05:00
Chao Liu	4a1e97cf86	tweak	2019-08-03 14:33:39 -05:00
Chao Liu	c01af89928	added new tensor copy operator	2019-08-03 00:02:24 -05:00
Chao Liu	b9663356ff	experimenting new merged tensor copy	2019-08-02 01:57:01 -05:00
Chao Liu	08cbac98cc	added (1x4)x(2x4) threadwise gemm	2019-07-30 18:20:55 -05:00
Chao Liu	cd8de11218	experimenting new merged tensor copy	2019-07-30 09:35:54 -05:00
Chao Liu	efd419ecbe	refactored implicit gemm v1r3	2019-07-29 15:01:01 -05:00
Chao Liu	9ba3b49158	adding implicit gemm v4r4	2019-07-28 19:39:57 -05:00
Chao Liu	ce4ec7dcaa	update build	2019-07-05 16:33:48 -05:00
Chao Liu	96d73c2154	Merge remote-tracking branch 'origin/build_0705' into implicit_gemm_v4r2	2019-07-05 16:29:20 -05:00
Chao Liu	df29a7e097	enabling vector load on merged dim	2019-06-24 11:20:19 -05:00
Chao Liu	37b82b7e54	refactor	2019-06-19 22:26:45 -05:00
Chao Liu	21f7e9f103	refactor	2019-06-19 17:43:56 -05:00
Chao Liu	23f633cdc5	clean up for miopen	2019-06-17 20:14:18 -05:00
Chao Liu	33d1e0e2e5	refactoring for miopen	2019-06-17 14:58:44 -05:00
Chao Liu	1566b31736	reorganized files	2019-06-13 15:12:12 -05:00

40 Commits