composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 19:28:33 +00:00

Author	SHA1	Message	Date
Alan Turner	3adf36beaa	Move error throwing to default case	2024-11-18 18:10:11 +00:00
Alan Turner	57cdd70b7c	Add gfx942 to supported architectures	2023-10-23 17:13:33 +00:00
Alan Turner	70eefcf4f2	Change config_header to literal ""	2023-10-10 23:15:03 +00:00
Alan Turner	37c3bc1a44	Add empty config.h to headers	2023-10-10 23:06:58 +00:00
Alan Turner	7d602254ff	Add build_interface to ck_headers	2023-10-10 22:30:40 +00:00
Alan Turner	d01af027c1	Update embed.cmake	2023-10-10 20:05:46 +00:00
Alan Turner	36674bdc8a	Changes to embed.cmake	2023-10-10 16:39:22 +00:00
Umang Yadav	ba251e4a11	Formatting and put find_package(hip) behind JIT_LIB flag	2023-09-29 15:14:55 +00:00
Umang Yadav	000c8bcf79	Merge branch 'migx-jit-lib' into migraphx	2023-09-28 14:06:00 +00:00
Alan Turner	8f9c0243c7	Merge branch 'develop' into migx-jit-lib	2023-09-22 23:27:30 +00:00
Alan Turner	181ea79a3d	Avoid pipeline version 2 when k % kpb != 0	2023-09-22 20:09:41 +00:00
Alan Turner	8a5e3fb02b	Fix sequence regex	2023-09-21 22:14:10 +00:00
Alan Turner	d967621546	Add Descriptor and Run to device op	2023-09-20 00:52:47 +00:00
Alan Turner	611196d598	Fix cmake, constexpr issupported	2023-08-29 17:26:15 -07:00
Alan Turner	45ff21e156	Add jit lib for batched_gemm_softmax_gemm	2023-08-29 12:48:41 -07:00
Alan Turner	e8b54cb376	Update parse_instance_strings	2023-08-29 11:04:29 -07:00
Alan Turner	4f7d9bbed8	Add descriptor class and run method	2023-08-28 16:12:50 -07:00
Alan Turner	4100d1d821	Merge remote-tracking branch 'origin/develop' into migx-flash-attn	2023-08-23 13:28:55 -07:00
Jun Liu	c8a8385fdd	[HotFix] add config and version files to pass on build info (#856 ) * experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file	2023-08-23 11:36:17 -07:00
zjing14	8ebea3a56e	add generic instances (#858 ) Co-authored-by: Jing Zhang <jizha@amd.com>	2023-08-23 09:18:10 -05:00
zjing14	ca3115e7e8	Ck profiler splitk (#857 ) * updated regular gemm * update ckProfiler * fixed gtests --------- Co-authored-by: Jing Zhang <jizha@amd.com>	2023-08-22 16:54:34 -07:00
Rostyslav Geyyer	eac50708d9	Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853 ) * Add ComputeType arg to splitk device and gridwise ops * Update for gridwise op compatibility * Update bf16 and int8 splitk gemm examples with ComputeType * Add instances * Update ckProfiler for mixed precision cases * Add a mixed precision splitK gemm client example --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-08-22 09:34:49 -05:00
Bartlomiej Wroblewski	d4c84256f7	Implement DPP8 based GEMM for Navi21 (#826 )	2023-08-14 15:46:27 -05:00
rocking	f60f0a5e03	Refactor pool fwd (#815 ) * Do not hardcode stride * devicePool2DFwd Inherit devicePool3DFwd * Move instance declaration out of common * Add dilation * use the pool3d rank, because pool2d inherit pooo3d * calculate Do Ho Wo for the dilation * Fix header name * Modify ckProfiler * Remove pool2d instance * Remove pool2d in profiler * Remove pool2d and add dilation * In to client example, this commit revise following: 1. Add dilation. 2. Use pool3d to implement pool2d * Refine naming and IsSupportedArgument() * Add dilation to maxpool bwd example * clang format * 1. Remove useless header 2. Fix copyright 3. Refine naming * Add layout parameter to pool fwd * clang format * Fix merge error * Fix compile error * Remove layout parameter in derived class * Refine changlog * Fix compile error * Fix compiler error * Add layout to external api and profiler	2023-08-15 02:25:28 +08:00
rocking	03b8119e2e	Add Normalization splitk instances (#829 ) * Add normalization splitK to layernorm and groupnorm instances * Fix bug of GetKPerThread() * Refine naming * clang format	2023-08-12 01:31:31 +08:00
Bartłomiej Kocot	472fa029ba	Enable grouped conv with small K or C (#822 ) * Enable grouped conv with small K or C * Add missing instances * Refactor grouped conv fwd instances * Fix fp16 instances since it supports src_per_vec %2 = 0 * Add generic instances	2023-08-09 10:40:55 -05:00
Illia Silin	08eb176929	Allow building CK for specific data types and split off last remaining DL instances. (#830 ) * properly split conv_nd_bwd_data instances * split conv2d_fwd instance data types * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm * split the tests by data types where possible * filter examples by DTYPES * split few remaining examples by DTYPES * filter most instances by DTYPES * add new lines at end of headers, fix grouped_gemm profiler * fix syntax * split the ckprofiler instances by DTYPES * split the conv2d and quantization DL and XDL instances * fix the splitting of conv2d DL instances * split softmax and pool_fwd tests for fp16 and fp32 types * fix syntax * fix the dl_int8 quantization instances isolation	2023-08-07 14:56:10 -07:00
Po Yen Chen	f7cc8c3b03	Update tuning parameter & compilation options of DeviceGemmXdl<> instance (layout=TT) (#819 ) * Enable pipeline v2 opt for layout=TT instance * Use better thread mapping for reading A tile * Conditionally enable pipeline v2 opt * Allow enabling only fp16 gemm instances in profiler * Fix formatting error * Fix compilation error if we enable fp32 in profiler	2023-08-02 10:32:22 -05:00
carlushuang	e7dca79d27	initial stream-k implementation with example (#699 ) * initial stream-k implementation with example * fix unexpected change in err * improve a little bit performance by reorganize pipeline. * improve perf a little bit by swizzle block idx * add profiler * update example * fix spelling * shrink karg for streamk * support dynamic buffer using memory coherence glc_slc bit from template * control memory coherence while construct dynamic buffer * update reduction for streamk(not ready yet) * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting * fix build issue * fix several bug * now result is correct, everything works (but has scratch) * remove scratch by manually reset coordinate * update device code * fix a bug in final reduce * fix something in example * update async memset * fix enum as camel case * modify coherence enum name * clean code and use atomic streamk by default * remove unused var * throw exception if have empty pointer * fix format * fix CI warning * fix type in init * modify CI error * filter out on gfx10+ * restore changed example code --------- Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>	2023-07-26 14:18:15 -05:00
Illia Silin	9195435c77	Disable DL kernels by default. (#816 )	2023-07-26 11:06:45 -05:00
Po Yen Chen	f4ea560112	Speed-up global memory reading for GEMM instances (#813 ) * Use better ThreadClusterLengths to speed up * Update B tile reading pattern for layout=NN instance	2023-07-25 18:54:47 -05:00
ltqin	50643dd555	Add bias scalar vectorload = 1 for gemm bias gemm (#791 ) * first change bias load * add bias dim and scalervector parameter * make CDE0BlockTransferSrcVectorDim not work * changse toinstance * add limit for CDE0BlockTransferSrcScalarPerVector	2023-07-24 20:08:15 -05:00
Bartłomiej Kocot	10732847e7	Grouped conv bwd wei NDHWGC/NDHWGK (#804 )	2023-07-21 12:00:55 -05:00
Bartłomiej Kocot	49180fd60b	Grouped 3d conv backward data support (#799 ) * Grouped 3d conv backward data support * Fix comments	2023-07-18 11:01:33 -05:00
Illia Silin	189ea3b9aa	Add mechanism to build CK for select data types, add Navi3x CI. (#790 ) * allow building CK for specific data types * add CI build and test stage on Naiv3x without some int8 instances * add missing gemm fp16 instances * add the changes to the missed cmake file * add empty lines at end of source files * Do not build quantization client example on navi3 in CI * disable batched_gemm_multi_d_int8 instances with DTYPES * disable device_conv2d_bwd_data_instance with DTYPES * fix ckprofiler for conv_bwd_data for int8 * properly isolate the conv_bwd_data int8 instances * remove empty line	2023-07-17 18:02:42 -07:00
Bartłomiej Kocot	1ee99dcaa6	Support NHWGC conv2d_bwd_weight (#769 ) * Support NHWGC conv2d_bwd_weight * Fix client example * Fix client example * Fix comments * Redesign grouped_conv_bwd_weight instances * Clang format fix --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-07-12 08:25:02 -05:00
Po Yen Chen	850144a0d3	Split GEMM instance library & enable pipeline v2 optimization (#783 ) * Move source file into sub-directories * Add missing include directive * Split DeviceGemmXdl<> fp16 instances * Fix format * Remove unnecessary CMakeLists.txt * Add macros to toggle new features * Remove debug message * Turn off GEMM v2 pipeline optimization by default * Fix format * Extract duplicated string as list * Enlarge indent in CMakeLists.txt	2023-07-06 10:59:35 -05:00
Adam Osewski	f4dfc060b7	Move Device Ops implementations into impl directory. (#777 ) Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-07-06 16:15:51 +02:00
Bartlomiej Kocot	2b0b6d9f46	Fix copyrights for DeviceBatchedGemmMultipleD_Dl	2023-07-06 15:50:27 +02:00
Bartłomiej Kocot	63388e84ab	Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757 ) * Support bf16/f32/f16 and NHWGC conv2d_bwd_data * Add interface test * clang format * Comment fixes * Add more friendly error message	2023-06-21 08:20:31 -05:00
Qianfeng	0d9118226b	Padded Generic Kernel Instance (#730 ) * Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting * Move the generic kernel instance to be the first of the instance list for elementwise op of normalization * Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax * Add testing of GetGenericInstance() in client_example of Softmax * Revert "Add testing of GetGenericInstance() in client_example of Softmax" This reverts commit `f629cd9a93`. * Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax" This reverts commit `a9f0d000eb`. * Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm * Move generic kernel instance to separate tuple for elementwise op of normalization * Remove un-used files for softmax instance * Store generic kernel instance to separate tuple for softmax * Add IsSupported checking for generic instance to client example of softmax * Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization * clang-format fix * Remove int8 from softmax instances --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-06-16 23:43:11 -05:00
Alan Turner	ac580f77a8	Merge remote-tracking branch 'origin/develop' into migx-jit-lib	2023-06-16 08:29:24 -07:00
Alan Turner	707d626161	Add int32 to fused types with lowered scalarspervector	2023-06-16 08:14:30 -07:00
zjing14	309b1c6461	Fixed Weight layout of grouped_conv 3d fwd (#743 ) * Changed wei layout * changed layout for examples * fixed client example --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com>	2023-06-15 10:19:33 -05:00
Rostyslav Geyyer	54b68eb343	Add generic kernel instances for ck::tensor_operation::device::DeviceGemmMultipleD (#741 ) * Add generic instance gemm_add_add_fastgelu * Add a client example for generic gemm_add_add_fastgelu * Update CMakeLists * Format * Format * Add generic instance gemm_add_fastgelu * Format * Add a gemm_add_fastgelu client example * Format * Add generic instance gemm_fastgelu * Format * Fix argument order * Add gemm_fastgelu client example * Add exceptions if argument is not supported	2023-06-14 16:06:56 -05:00
Bartłomiej Kocot	fc9f97568f	Add DeviceBatchedGemmMultipleD_Dl (#732 ) * Add DeviceBatchedGemmMultipleD_Dl * Fix batched_gemm tests * Fix comments * test_batched_gemm_multi_d fixes * Fix args for isSupported batchedGemmMultipleDDl * Disable tests for gfx90a	2023-06-12 08:37:15 -05:00
ltqin	0ede66de54	Fix flash attn mask bug (#733 ) * add check input parameter * add instance for vector load = 1 * move gerneral instance to first pos * fix read bias code * regular code for bias load --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>	2023-06-12 08:35:31 -05:00
Alan Turner	d7173bc68f	Add missing xdlop archs to supported set	2023-06-09 08:29:27 -07:00
Alan Turner	84c5bec1d6	Reduce scalars per vector for non-int8 return type	2023-06-07 16:11:54 -07:00
Alan Turner	b9e4254b42	Add system to include directories	2023-06-07 11:17:11 -07:00


201 Commits