Commit Graph

212 Commits

Author SHA1 Message Date
Bartłomiej Kocot
f4999cd99a Add grouped conv bwd weight dl instances and new layout (#897)
* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes

[ROCm/composable_kernel commit: 475188ca2e]
2023-09-13 10:14:31 -05:00
zjing14
5bb25a9688 fixed fp8 issues (#894)
* fixed fp8 init; and reference gemm

* Update host_tensor_generator.hpp

* fixed convert

* fixed reference gemm

* fixed comments

* fixed comments

* fixed ci

* fixed computeType

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: a66d14edf2]
2023-09-12 22:17:56 -05:00
Rostyslav Geyyer
0752117077 Refactor f8_t, add bf8_t (#792)
* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments

[ROCm/composable_kernel commit: 62d4af7449]
2023-09-12 17:04:27 -05:00
Bartlomiej Wroblewski
b4064d1401 Add new instances and support for small cases in DPP8 GEMM (#896)
[ROCm/composable_kernel commit: 547dbcfbc2]
2023-09-12 10:05:23 -05:00
Haocong WANG
c2866bb432 [Navi3x] Add fp16/int8 wmma conv forward instances (#746)
* fix wmma gemm int8; add grouped conv int8 example

* Add int8 gemm-bilinear instances

* compile sanity check unknown

* Sanity pass + clang-format

* add int8 conv profiler instances

* solve merge conflict

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: 562b4cec48]
2023-09-07 21:59:26 -05:00
Bartlomiej Wroblewski
02f8f707e8 Redesign the DPP8 GEMM kernel to use warp-wise component (#863)
* Redesign the DPP8 GEMM kernel to use warp-wise component

* Review: Improve error messages

* Review: Remove unnecessary empty lines

* Review: Fix M, N per thread names

* Review: Rename mfma_input_type to dpp_input_type

* Review: Fix tensor adaptor; remove unnecessary element

* Review: Remove calls to dpp_gemm's MakeCDescriptor

* Review: Add blockwise doc, change function names to include dimension names

* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file

* Review: Add __restrict__ keywords

* Review: Use MatrixPadder for padding A, B, C matrices

* Review: Remove hardcoded datatypes

* Review: Change names from FloatX to XDataType

* Review: Introduce AK0 and BK0 instead of a single K0

* Review: Remove construction of dpp_datatypes object

* Review: Rename DppInstrRunner to DppLanegroupGemm

[ROCm/composable_kernel commit: 37a8c1f756]
2023-09-06 11:44:09 -05:00
zjing14
29daafc158 added padding of K into gemm_v2r3 (#887)
* added kpad support into v2r3

* add generic instances

* fixed comments

* fixed mnk padding

* Update device_batched_gemm_xdl.hpp

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: 3786bfe1cc]
2023-09-06 10:15:52 -05:00
zjing14
762d558a06 Fixed fp8 gemm (#882)
* add generic instances; fixed initi with fp8

* fixed comment

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: a61b8b785e]
2023-09-06 09:59:20 -05:00
Bartłomiej Kocot
d79b1c5dd0 Add image to column kernel (#867)
* Add image to column kernel

* Add instances, tests, profiler, example

* Add client example

* Several fixes of image to column

* Fix variable name in device_image_to_column_impl

* Several fixes of image to column profiler

* Fix num_btype calculation

* Make new mesaurements for correct bytes calculation

[ROCm/composable_kernel commit: 0077eeb3be]
2023-09-05 10:11:40 -05:00
Bartłomiej Kocot
562be55437 Add nhwgc dl generic instances for grouped conv fwd (#879)
[ROCm/composable_kernel commit: 0c9a1d25b3]
2023-09-05 10:07:56 -05:00
zjing14
c79ecbccfb Grouped Gemm with Fixed K and N with SplitK (#818)
* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add a instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: f5ec04f091]
2023-08-31 09:22:12 -05:00
rocking
0b07461518 MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861)
* Add maxpool instances

* Rename index pool to max pool.

* Add maxpool bwd bf16 instances

* Add avg pool bwd instances

* Rename avgpool and maxpool to avg_pool3d and max_pool

* Add bf16 pool fwd instances

* Add max pool bwd to ckProfiler

* Add avg pool3d bwd to ckProfiler

* Add avg pool bwd test

* Fix bug of reference pool fwd (dilation)

* Fix bug of max pool bwd  (dilation and initZero)

* Support bf16 compute data type

* Force compute type be f32. Because atomicAdd only support f32

* Add max pool bwd test

* Rename folder

* Rename pool

* Add max pool bwd client example

* Add avg pool bwd client example

* Add missing workspace

* clang format

* Rename macro

* remove useless header

* remove useless layout

[ROCm/composable_kernel commit: 866377de18]
2023-08-31 21:01:50 +08:00
Bartłomiej Kocot
63466279a6 Add number of error when fail (#868)
[ROCm/composable_kernel commit: 9e86ebd62d]
2023-08-30 10:33:11 -05:00
zjing14
f8bcfe60ac add an example of customized type convert - bfp16_rtn (#869)
* add an example of customized bfp16_rtn

* fixed threadwise_copy

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: 38ada109ea]
2023-08-29 12:31:24 -05:00
zjing14
90c7e28a8d Fp16/fp8 mixed-precision Gemm with multiply+add fusion (#865)
* add compute_type

* add multiply_add ckProfiler

* add f8_fp16 support

* clean

* clean

* fixed lds size calc

* format

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: 31ea132aa2]
2023-08-28 16:27:32 -05:00
Jun Liu
2fb9a37881 [HotFix] add config and version files to pass on build info (#856)
* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file

[ROCm/composable_kernel commit: c8a8385fdd]
2023-08-23 11:36:17 -07:00
zjing14
d0632afcee add generic instances (#858)
Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: 8ebea3a56e]
2023-08-23 09:18:10 -05:00
zjing14
af751a689c Ck profiler splitk (#857)
* updated regular gemm

* update ckProfiler

* fixed gtests

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: ca3115e7e8]
2023-08-22 16:54:34 -07:00
Bartłomiej Kocot
87b3186c09 Fix transform and instances for grouped conv bwd data (#848)
* Fix transform and instances for grouped conv bwd data

* Add instances for small K and small C

* Remove workaround after fix

* Fix interface tests

[ROCm/composable_kernel commit: 595d23be14]
2023-08-22 11:25:41 -05:00
Rostyslav Geyyer
9ef85a359d Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853)
* Add ComputeType arg to splitk device and gridwise ops

* Update for gridwise op compatibility

* Update bf16 and int8 splitk gemm examples with ComputeType

* Add instances

* Update ckProfiler for mixed precision cases

* Add a mixed precision splitK gemm client example

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: eac50708d9]
2023-08-22 09:34:49 -05:00
Bartlomiej Wroblewski
d4888118a5 Implement DPP8 based GEMM for Navi21 (#826)
[ROCm/composable_kernel commit: d4c84256f7]
2023-08-14 15:46:27 -05:00
rocking
4ee825732f Refactor pool fwd (#815)
* Do not hardcode stride

* devicePool2DFwd Inherit devicePool3DFwd

* Move instance declaration out of common

* Add dilation

* use the pool3d rank, because pool2d inherit pooo3d

* calculate Do Ho Wo for the dilation

* Fix header name

* Modify ckProfiler

* Remove pool2d instance

* Remove pool2d in profiler

* Remove pool2d and add dilation

* In to client example, this commit revise following:
1. Add dilation.
2. Use pool3d to implement pool2d

* Refine naming and IsSupportedArgument()

* Add dilation to maxpool bwd example

* clang format

* 1. Remove useless header
2. Fix copyright
3. Refine naming

* Add layout parameter to pool fwd

* clang format

* Fix merge error

* Fix compile error

* Remove layout parameter in derived class

* Refine changlog

* Fix compile error

* Fix compiler error

* Add layout to external api and profiler

[ROCm/composable_kernel commit: f60f0a5e03]
2023-08-15 02:25:28 +08:00
rocking
759fb65188 Add Normalization splitk instances (#829)
* Add normalization splitK to layernorm and groupnorm instances

* Fix bug of GetKPerThread()

* Refine naming

* clang format

[ROCm/composable_kernel commit: 03b8119e2e]
2023-08-12 01:31:31 +08:00
rocking
59ccb49898 Average pool backward deviceOP and example (#797)
* Add avgpool bwd reference code

* Refine naming

* Fix invalid in_element op in ref_conv

* Add example (only reference now)

* Add the full example of avgpool bwd

* Fix copyright

* Imitate MakeDescriptor from  transform_conv_bwd_data_to_gemm_v1.hpp

* rename channel to c from k

* Arrange the code

* Imitate the argument from conv bwd

* Implement invoker

* Fix order of parameter in example

* Refactor reference code for different dimension

* Support different stride

* Check if argument is valid

* Fix kernel parameter for NDHWC, fastest dimension C is not reduced

* Add more data type in example

* Fix bug in example

* calculate Do Ho Wo according to the dilation

* Remove useless header

* Add comment in reference code

* Add layout parameter

* Remove layout in derived class

* Refine reference comment

[ROCm/composable_kernel commit: 578142db3a]
2023-08-10 12:04:35 +08:00
Bartłomiej Kocot
6493db2758 Enable grouped conv with small K or C (#822)
* Enable grouped conv with small K or C

* Add missing instances

* Refactor grouped conv fwd instances

* Fix fp16 instances since it supports src_per_vec %2 = 0

* Add generic instances

[ROCm/composable_kernel commit: 472fa029ba]
2023-08-09 10:40:55 -05:00
Illia Silin
25efecc13e Allow building CK for specific data types and split off last remaining DL instances. (#830)
* properly split conv_nd_bwd_data instances

* split conv2d_fwd instance data types

* split the gemm, conv2d_fwd and batched_gemm_softamx_gemm

* split the tests by data types where possible

* filter examples by DTYPES

* split few remaining examples by DTYPES

* filter most instances by DTYPES

* add new lines at end of headers, fix grouped_gemm profiler

* fix syntax

* split the ckprofiler instances by DTYPES

* split the conv2d and quantization DL and XDL instances

* fix the splitting of conv2d DL instances

* split softmax and pool_fwd tests for fp16 and fp32 types

* fix syntax

* fix the dl_int8 quantization instances isolation

[ROCm/composable_kernel commit: 08eb176929]
2023-08-07 14:56:10 -07:00
Bartlomiej Kocot
26d32e1bb2 Change to github_issue prefix
[ROCm/composable_kernel commit: aac65a031e]
2023-08-03 16:38:28 +02:00
Bartlomiej Kocot
4818023b6c Rename the workaround to a proper issue name
[ROCm/composable_kernel commit: e6a826d35a]
2023-08-03 16:38:28 +02:00
Po Yen Chen
b6e54f589e Update tuning parameter & compilation options of DeviceGemmXdl<> instance (layout=TT) (#819)
* Enable pipeline v2 opt for layout=TT instance

* Use better thread mapping for reading A tile

* Conditionally enable pipeline v2 opt

* Allow enabling only fp16 gemm instances in profiler

* Fix formatting error

* Fix compilation error if we enable fp32 in profiler

[ROCm/composable_kernel commit: f7cc8c3b03]
2023-08-02 10:32:22 -05:00
carlushuang
92726361e0 initial stream-k implementation with example (#699)
* initial stream-k implementation with example

* fix unexpected change in err

* improve a little bit performance by reorganize pipeline.

* improve perf a little bit by swizzle block idx

* add profiler

* update example

* fix spelling

* shrink karg for streamk

* support dynamic buffer using memory coherence glc_slc bit from template

* control memory coherence while construct dynamic buffer

* update reduction for streamk(not ready yet)

* Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting

* fix build issue

* fix several bug

* now result is correct, everything works (but has scratch)

* remove scratch by manually reset coordinate

* update device code

* fix a bug in final reduce

* fix something in example

* update async memset

* fix enum as camel case

* modify coherence enum name

* clean code and use atomic streamk by default

* remove unused var

* throw exception if have empty pointer

* fix format

* fix CI warning

* fix type in init

* modify CI error

* filter out on gfx10+

* restore changed example code

---------

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: e7dca79d27]
2023-07-26 14:18:15 -05:00
Illia Silin
f83a1c84c3 Disable DL kernels by default. (#816)
[ROCm/composable_kernel commit: 9195435c77]
2023-07-26 11:06:45 -05:00
Po Yen Chen
b49076fed0 Speed-up global memory reading for GEMM instances (#813)
* Use better ThreadClusterLengths to speed up

* Update B tile reading pattern for layout=NN instance

[ROCm/composable_kernel commit: f4ea560112]
2023-07-25 18:54:47 -05:00
ltqin
79a9ecdc4b Add bias scalar vectorload = 1 for gemm bias gemm (#791)
* first change bias load

* add bias dim and scalervector parameter

* make CDE0BlockTransferSrcVectorDim not work

* changse toinstance

* add limit for CDE0BlockTransferSrcScalarPerVector

[ROCm/composable_kernel commit: 50643dd555]
2023-07-24 20:08:15 -05:00
Bartłomiej Kocot
4687f4eb2a Grouped conv bwd wei NDHWGC/NDHWGK (#804)
[ROCm/composable_kernel commit: 10732847e7]
2023-07-21 12:00:55 -05:00
Bartłomiej Kocot
b795a0f549 Grouped 3d conv backward data support (#799)
* Grouped 3d conv backward data support

* Fix comments

[ROCm/composable_kernel commit: 49180fd60b]
2023-07-18 11:01:33 -05:00
Illia Silin
74c83ffe26 Add mechanism to build CK for select data types, add Navi3x CI. (#790)
* allow building CK for specific data types

* add CI build and test stage on Naiv3x without some int8 instances

* add missing gemm fp16 instances

* add the changes to the missed cmake file

* add empty lines at end of source files

* Do not build quantization client example on navi3 in CI

* disable batched_gemm_multi_d_int8 instances with DTYPES

* disable device_conv2d_bwd_data_instance with DTYPES

* fix ckprofiler for conv_bwd_data for int8

* properly isolate the conv_bwd_data int8 instances

* remove empty line

[ROCm/composable_kernel commit: 189ea3b9aa]
2023-07-17 18:02:42 -07:00
Bartłomiej Kocot
dd6d0dd3b9 Support NHWGC conv2d_bwd_weight (#769)
* Support NHWGC conv2d_bwd_weight

* Fix client example

* Fix client example

* Fix comments

* Redesign grouped_conv_bwd_weight instances

* Clang format fix

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 1ee99dcaa6]
2023-07-12 08:25:02 -05:00
Po Yen Chen
aff6040b5b Split GEMM instance library & enable pipeline v2 optimization (#783)
* Move source file into sub-directories

* Add missing include directive

* Split DeviceGemmXdl<> fp16 instances

* Fix format

* Remove unnecessary CMakeLists.txt

* Add macros to toggle new features

* Remove debug message

* Turn off GEMM v2 pipeline optimization by default

* Fix format

* Extract duplicated string as list

* Enlarge indent in CMakeLists.txt

[ROCm/composable_kernel commit: 850144a0d3]
2023-07-06 10:59:35 -05:00
Adam Osewski
da8a7b63ec Move Device Ops implementations into impl directory. (#777)
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: f4dfc060b7]
2023-07-06 16:15:51 +02:00
Bartlomiej Kocot
27c7825316 Fix copyrights for DeviceBatchedGemmMultipleD_Dl
[ROCm/composable_kernel commit: 2b0b6d9f46]
2023-07-06 15:50:27 +02:00
Bartłomiej Kocot
8e7e512358 Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757)
* Support bf16/f32/f16 and NHWGC conv2d_bwd_data

* Add interface test

* clang format

* Comment fixes

* Add more friendly error message

[ROCm/composable_kernel commit: 63388e84ab]
2023-06-21 08:20:31 -05:00
Rostyslav Geyyer
09bc04e7a4 FP8 enablement - add a pseudorandom number generator, add conversion methods (#708)
* Add basic fp8 definitions and prn-generator

* Format

* Add fp8<->fp32 type_convert

* Format

* Split type_convert and cast_to/from_f8

* Format

* Minor fix

* Minor fix

* Move fp8 utils to a separate header

* Add elementwise ops

* Add fp8_convert_sr

* Format

* Add element op

* Eliminate magic numbers

* Split f8_convert_sr in host and device

* Format

* Add some constexpr

* Add a datatype test

* Format

* Another format

* Add fp8<->fp16 tests

* Update type_converts

* Format

* Add fp16 casting functions

* Format

* Use seed as a runtime arg

* Use element location for PRNG

* Format

* Add fp8<->fp16 to PassThrough element op

* Clean up

* Merge host and device implementations

* Add comments on rounding modes

* Remove leftover code

* Put type_converts into a separate header

* Put random number gen to a separate header

* Rearrange f8_utils' namespaces

* Refactor type_convert.hpp

* Move f8_t definition

[ROCm/composable_kernel commit: f0c620c42e]
2023-06-19 11:20:35 -05:00
rocking
9c2487d2a0 Maxpool bwd (#750)
* Add maxpool f32 kernel and example

* Revise copyright

* Add device pool bwd device op

* Support f16 and bf16

* Add compute datatype for reference code.
Prevent error in bf16

* Fix type error

* Remove layout

* Fix bf16 error

* Add f16 and bf16 example

* Add more operations

* Implement IsSupportedArgument

* Add changelog

* Add comment

* Add comment

* Remove useless header

* Move initialize of workspace to the run

* Move set din zero to the device operator

* Save din_length_raw

* Remove useless header

* Calculate gridsize according to the number of CU

* Calculate gridSize according to the number of CU.
Remove useless header

* Add put example

* Remove useless header

* Fix CI fail

[ROCm/composable_kernel commit: 341ad95665]
2023-06-19 09:44:22 -05:00
Qianfeng
d6f690d361 Padded Generic Kernel Instance (#730)
* Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting

* Move the generic kernel instance to be the first of the instance list for elementwise op of normalization

* Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax

* Add testing of GetGenericInstance() in client_example of Softmax

* Revert "Add testing of GetGenericInstance() in client_example of Softmax"

This reverts commit f629cd9a93.

* Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax"

This reverts commit a9f0d000eb.

* Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm

* Move generic kernel instance to separate tuple for elementwise op of normalization

* Remove un-used files for softmax instance

* Store generic kernel instance to separate tuple for softmax

* Add IsSupported checking for generic instance to client example of softmax

* Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization

* clang-format fix

* Remove int8 from softmax instances

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 0d9118226b]
2023-06-16 23:43:11 -05:00
zjing14
973fc655fd Fixed Weight layout of grouped_conv 3d fwd (#743)
* Changed wei layout

* changed layout for examples

* fixed client example

---------

Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: 309b1c6461]
2023-06-15 10:19:33 -05:00
Rostyslav Geyyer
f0c9daa292 Add generic kernel instances for ck::tensor_operation::device::DeviceGemmMultipleD (#741)
* Add generic instance gemm_add_add_fastgelu

* Add a client example for generic gemm_add_add_fastgelu

* Update CMakeLists

* Format

* Format

* Add generic instance gemm_add_fastgelu

* Format

* Add a gemm_add_fastgelu client example

* Format

* Add generic instance gemm_fastgelu

* Format

* Fix argument order

* Add gemm_fastgelu client example

* Add exceptions if argument is not supported

[ROCm/composable_kernel commit: 54b68eb343]
2023-06-14 16:06:56 -05:00
Bartłomiej Kocot
1405a4906b Add DeviceBatchedGemmMultipleD_Dl (#732)
* Add DeviceBatchedGemmMultipleD_Dl

* Fix batched_gemm tests

* Fix comments

* test_batched_gemm_multi_d fixes

* Fix args for isSupported batchedGemmMultipleDDl

* Disable tests for gfx90a

[ROCm/composable_kernel commit: fc9f97568f]
2023-06-12 08:37:15 -05:00
ltqin
8c5f5f1293 Fix flash attn mask bug (#733)
* add check input parameter

* add instance for vector load = 1

* move gerneral instance to first pos

* fix read bias code

* regular code for bias load

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 0ede66de54]
2023-06-12 08:35:31 -05:00
Illia Silin
d40b8d5e2c update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
Adam Osewski
b145984ea1 Multiple fixes to GroupedGemm+SplitK (#707)
* Add license header.

* Reduce number of logged output. Add constant initialization.

* Add functional tests for grouped_gemm with different kbatch value.

* Add debug log informations + remove unused code.

* Don't pass kbatch to CalculateKPadded.

* Turn on logging in grouped gemm and gemm splitk profiler

* Debug: limit number of test cases to run;

* Log more information and initialize with constant value.

* Turn on DEBUG_LOG

* Add more debug log informations.

* Limit the number of instances to compile.

* Use GridwiseGemmPipeline

* Use KBatch to calculate K0

* Multiple DebugLog messages.

* Unit tests for multiple KBatch values.

* Refactoring

* Disable logging
* extract out of if statement KBatch update.

* Uncomment instances.

* Disable DebugLog.

* Use Kbatch when calculate KPadded.

* Fix CGridDesc padding.

* Use available helper functions.

* Uncomment code commented for debuggin.

* Remove unnecessary debug log messages.

* Uncomment previously commented code for debug purposes.

* Add KBatch info to profiler output summary log.

* Add gtests for gemm splitk using ckProfiler API.

* Add more test-cases for different data layout.

* Add more test cases for gemm splitk

* Remove old test.

* Unit tests for MKNK ggemm interface.

* Fix and add more unit-tests.

* Constepxr everything!

* Increase error threshold for fp16 and splitk.

Since we're using fp16 atomic add for splitk there's a
known precision loss.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 70e4eb567f]
2023-05-30 07:09:06 -05:00