863 Commits

Author SHA1 Message Date
Jun Liu
b4df986264 [gtest] suppress unsafe buffer warn (#670)
ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912

[ROCm/composable_kernel commit: f532988713]
2023-04-11 15:41:49 -05:00
Sam Wu
e5a82c403a Add dependabot config and pin rocm-docs-core (#663)
[ROCm/composable_kernel commit: fd497f0e79]
2023-04-11 09:18:38 -06:00
zjing14
53b28d2146 fixed quant example (#672)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: c203bf6711]
2023-04-11 07:46:46 -05:00
zjing14
b18d739672 add a macro to turn on/off denorm fix (off by default) (#673)
* add a macro to turn off denorm fix by default

* expose the macro

---------

Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: c54f8bcc25]
2023-04-11 07:44:43 -05:00
rocking5566
356c1cc17b Groupnorm + swish external api (#668)
* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Refactor instance generation, split into multiple cpp files

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp

[ROCm/composable_kernel commit: ed3a2e5226]
2023-04-10 08:02:17 -05:00
Jun Liu
89d6f8a65f Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
This reverts commit 1108f64591.

[ROCm/composable_kernel commit: 3248387bbb]
2023-04-06 17:14:11 -07:00
zjing14
696991c923 add fp64 instances (#658)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: fde6d2742b]
2023-03-30 13:30:43 -05:00
Haocong WANG
37f95442f9 fix 3rd dword of buffer source descriptor (#659)
[ROCm/composable_kernel commit: 091570f594]
2023-03-29 19:03:55 -05:00
carlushuang
1108f64591 simplify karg in device/grid of split-k op (#644)
* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout

[ROCm/composable_kernel commit: bb5530af91]
2023-03-29 19:03:07 -05:00
Rostyslav Geyyer
15ac3fc064 Add a denorm test fix (#603)
* Add type_convert implementations for bf16

* Add the fix for conv_fwd

* Add the fix for conv_bwd_data

* Add the fix for conv_bwd_weight

* Format

* Format

* Another format

* Add a macro to use workaround on MI200 only

* Format

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: dbd8f94bef]
2023-03-29 15:05:32 -05:00
rocking5566
cbce8b77da Conv + quantization + tanh (#645)
* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 389e84a83b]
2023-03-29 14:50:23 -05:00
Haocong WANG
8a984b4e3f Add CMake Option "USE_OPT_NAVI3X" (#647)
* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script

[ROCm/composable_kernel commit: 4e097ad283]
2023-03-29 14:07:33 -05:00
Sam Wu
5a8db87383 Separate bibtex requirement from rocm-docs-core (#656)
* separate bibtex requirement from rocm-docs-core

* point requirements to source rocm-docs-core repo

[ROCm/composable_kernel commit: 88d474323b]
2023-03-27 17:14:36 -06:00
Sam Wu
2268a29786 standardize docs (#655)
[ROCm/composable_kernel commit: f80776d937]
2023-03-23 20:58:59 -07:00
Haocong WANG
84f096c844 [Navi3x] Fix Gridwise_multiple_d operation (#649)
* Add CMake Option "USE_OPT_NAVI3X"

* fix bug

[ROCm/composable_kernel commit: e5376be4ac]
2023-03-23 11:22:10 -05:00
Po Yen Chen
57c8d94bf7 Reduce group & batch of the tested convolutions (#648)
[ROCm/composable_kernel commit: fe96e8fbf2]
2023-03-22 10:49:11 -07:00
Illia Silin
b3c1e83276 Get rid of XDL parameters in WMMA kernel string. (#646)
* remove XDL parameters from WMMA kernel string

* get rid of two more parameters

[ROCm/composable_kernel commit: 36750a5763]
2023-03-22 08:05:48 -07:00
Dan Yao
a84d2f5d81 rtn in ternary way (#632)
* rtn in ternary way

* Check both flags to preserve NaN

* Format

* Rearrange flag1

* Apply suggestions from code review

Co-authored-by: Ronan Keryell <ronan@keryell.fr>

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Ronan Keryell <ronan@keryell.fr>

[ROCm/composable_kernel commit: 8a659a2e4c]
2023-03-20 14:30:24 -05:00
ltqin
fc10856d4b workaround 637 (#640)
* add workaround 637

* format

* change id

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 6ae12434d2]
2023-03-20 11:49:31 -05:00
Rostyslav Geyyer
5c8eb78a25 Update cmake-ck-dev.sh script (#641)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

[ROCm/composable_kernel commit: fa998675fc]
2023-03-15 18:38:11 -05:00
rocking5566
6a1403d82d gemm/Conv xdlops + dlops quantization (#625)
* Add conv perlayer quantization

* Add gemm_dlops quantization

* Support int8 for innerproduct

* Refine gemm dlops int8 kernel parameter

* Support gfx908(MI100) and gfx90a(MI200)

* clang-format

* Rename example number

* Support different layout for d tensor

* Add conv dlops perchannel quantization example

* Move to example 40

* Extract the common code for different platform (dlops and xdlops)

* Move to subfolder. Prepare to add other op of quantization

* Refine the quantization instance library

* Add conv dl instances and client example

* Remove unnecessary type

* Add gemm quantization instance

* Add external api and client example

* Refine num_bytes

* Separate different layout to different cpp

* Add more xdl instances

* Revert "Remove unnecessary type"

This reverts commit 820869182f.

* Remove CShuffleDataType in dlops
Let acc and CShuffleDataType be the same in xdlops

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 16dc18e0f9]
2023-03-15 15:29:40 -05:00
Adam Osewski
512ec3ac4d Device Op GroupedGemmMultipleD + example fp16 (#633)
* Pass shared mem pointer as pointer to void.

* Device Op GroupedGEMM Multiple D

* Example for grouped gemm multiple d.

* Add MI200 to supported archs.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: a2d5ca8e95]
2023-03-15 11:22:59 -05:00
Rostyslav Geyyer
6e6482b9cd Add layout check to IsSupportedArgument (#627)
* Add layout check to IsSupportedArgument

* Format

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: c10a6e8293]
2023-03-15 11:12:12 -05:00
Illia Silin
87113ad617 Update GetTypeString function to generate unique kernel IDs. (#638)
* make conv_fwd_bias_activation kernel id unique

* add more parameters to conv and gemm kernel names

* update GetTypeString for conv and gemm kernels

* fix two more kernel strings

[ROCm/composable_kernel commit: 14b3504d95]
2023-03-15 10:44:42 -05:00
Haocong WANG
459469f66a Fix arch limitation bug (#639)
[ROCm/composable_kernel commit: ea028ac65a]
2023-03-15 07:44:13 -07:00
Rostyslav Geyyer
b78f3ba805 Remove debug asserts (#629)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

[ROCm/composable_kernel commit: 5b57ab96a8]
2023-03-10 17:34:44 -06:00
Haocong WANG
9687ad0b61 [Navi3x] Multiple issue fix (#612)
* Change gridwise gemm mD blockwise gemm to naive

* RRR Gemm fix

* Fix RCR gemm bug

* Isolate wmma instructions

* Update amd_inline_asm.hpp

* Update amd_wmma.hpp

* Update amd_wmma.hpp

* fix syntax and update Jenkinsfile

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 087e310589]
2023-03-10 17:04:28 -06:00
carlushuang
ca7b3a4f58 fix a bug with non-dword-aligned offset when OOB, in case crash (#616)
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 76fcdc60e9]
2023-03-09 08:07:24 -06:00
Illia Silin
9ce65cae0e [gfx110x] support Navi3x architectures. (#628)
* enable building on Navi31

* fix syntax

* replace GPU_TARGETS with offload-arch

* add gfx1102 architecture

* fix typo

* update changelog

[ROCm/composable_kernel commit: 0ccecc7c31]
2023-03-09 07:56:40 -06:00
Adam Osewski
0d23b0d1c9 GroupedGEMM + Gelu client example/instances/profiler (#614)
* Grouped gemm + Gelu instances.

* Device Instance Factory for GroupedGemm+Gelu

* Client example

* Rangify fill helper functions.

* Fix name clash.

* Profiler for grouped_gemm+gelu

* No need to use full namespace name.

* Add check for MRaw divisible by vector load.

* Ugly fix for big errors.

* Add grouped_gemm+gelu to profiler CMakelists.

* Store in argument additional info.

* Information about Mraw, Nraw, Kraw values.

* Use FastGelu instead of Gelu.

* Change client ex to use FastGelu

* Remove relaxed error precision.

* Remove duplicate output elementwise-op

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 9096b1c7b2]
2023-03-07 22:06:56 -06:00
Rostyslav Geyyer
2cf1f440a3 Add descriptions to avoid build issues (#619)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

[ROCm/composable_kernel commit: 1e59eb3be5]
2023-03-06 13:11:58 -08:00
pmaybank
9080b984cb Generate output using Doxygen / Breathe (#598)
* Modify Doxygen config to pick up include directories recursively

* Add DeviceMem struct to API Reference guide

* Add classes that are used in Flash Attention kernel

* Add a reference and config for generating bibliography

Co-authored-by: Philip Maybank <Philip.Maybank@amd.com>

[ROCm/composable_kernel commit: e4bf6d422e]
2023-03-06 11:39:16 -06:00
Illia Silin
ef3be1d9cf Change the CI workflow. (#611)
* add new parallel stage on navi node

* dont run performance tests on navi, get rid of 9110 compiler

* only run navi build when not doing QA

* fix syntax

* use navi21 label

* dont stash profiler on navi nodes, scp deb package to ginger

* disable tests on navi nodes

* test posting a binary to ginger

* add sshpass and use it to copy deb package

* fix the scp example

* fix syntax

* debug the scp issues

* add jenkins user to docker

* dont try whoami

* change jenkins uid and add user with uid=1002

* try scp from the last stage on micimaster

* rename and stash the package, scp from micimaster

[ROCm/composable_kernel commit: e6cda9f8ff]
2023-03-02 11:24:31 -06:00
Illia Silin
6a06747736 Suppress reserved-identifier warning and catch all warnings. (#608)
* suppress the reserved-identifier warnings

* keep BUILD_DEV=On and use -Werror by default

[ROCm/composable_kernel commit: 59cbb20c7c]
2023-03-01 12:08:13 -06:00
Haocong WANG
d33b8f9152 [Navi3x Bug Fix] fix typo to accept MNKPadding flag correctly. (#597)
* fix a bug blocking wmma_gemm_multipleD

* Utilize matrix padder in device_wmma_op

* cosmetic change for gemmpadding format

* clang format

* Change gridwise gemm from FIFO to KMN loop fashion

[ROCm/composable_kernel commit: 68dbf40a79]
2023-03-01 12:07:42 -06:00
Chao Liu
c72e448b2a Fast GeLU using built-in function (#587)
* clean up

* fast gelu using builtin function

* clean

* clean

* clean

* clean:

* clean

* fix compilation

* clean

* clean

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 8f455615a8]
2023-02-26 23:19:11 -06:00
zjing14
0bede6cabd disable tensor contraction f64 on MI100 (#602)
[ROCm/composable_kernel commit: 209baee299]
2023-02-23 16:59:37 -08:00
Rostyslav Geyyer
f52b71c693 Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505)
* Add DeviceOp and examples

* Format DeviceOp template arguments

* Remove bf16 example

* Format

* Format

* Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N

* Refactor argument preparation

* Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl

* Rename device op file

* Update include directive in the example file

* Update descriptor preparation for grouped op

* Update the argument

* Update batch handling

* Add gridwise gemm supporting batched input

* Update blockwise indexing, working version

* Update copyright year

* Update check if argument is supported

* Refactor and make consistent with xdl examples

* Update check if argument is supported

* Add changelog entry

* Added comments on Dl op split_k>1 support

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 246ceee49e]
2023-02-22 11:59:53 -06:00
ltqin
4d10c937f0 Grouped conv1d client example (#589)
* add conv1d fwd client example

* change 07_grouped_conv2d_fwd to 07_grouped_convnd_fwd

* add conv1d bwd weight

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 830d37a7d5]
2023-02-22 11:55:21 -06:00
Illia Silin
fdd525e21c fix a bug when building for gfx1030 target. (#591)
* fix a bug while building for gfx1030 and add gfx1030 to targets

* fix syntax

[ROCm/composable_kernel commit: bef0cb20db]
2023-02-16 13:54:08 -06:00
Illia Silin
3bf1f6045f Build and archive deb packages. (#590)
* build and archive deb packages

* fix syntax

* run QA to test building packages

* apply cron to develop branch again

[ROCm/composable_kernel commit: 584d233cfe]
2023-02-16 13:11:23 -06:00
pmaybank
38d88f87d5 Sphinx doc (#581)
* New docs directory with minimal config

* Based on docs directory of rocBLAS

* Config for running Doxygen then Sphinx to generate HTML

* Add minimal content - intro to doc

* Add some boilerplate sections to doc

* content still needs to be done,
* e.g., need to generate API documentation using Doxygen
* need to write contributor guide

* Start Softmax section of Support Primitives doc

* Written as a test bed for typesetting math content

* Need to decide how much detail to go into

* add doc directories to git ignore file.

* Minor edits - new line at EOF, change year in copyright notices

* Port Markdown files to ReStructuredText

* Copy Markdown files from pre-existing doc directory to docs directory

* Convert to reStructured Text (rst) - section headings, links, tables
  have a different syntax in rst

* New rst files added to index - can generate HTML with same style as
  HTML generated from rst files in previous commits

* Intention is to make all the content in doc redundant and use rst
  throughout rather than mix of md and rst

* Extend Softmax section of Primitives Guide

* rename l to z

* add material on applying softmax row-wise to matrix

* define macro for diag operator (represents diagonal matrix)

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: cb3fac4d2a]
2023-02-15 17:17:46 -06:00
Illia Silin
c1efabf921 Clean up kernel launch output (#569)
* clean up output from kernel_launch

* set RUN_WARMUP to 0 by default

* split the warm-up into a separate issue

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 19490ac4f7]
2023-02-15 12:07:21 -06:00
zjing14
7335ebded0 Add contraction_fp64 example (#570)
* add contraction_bilinear

* add contraction_scale_xdl_fp64

* reduce tile size to avoid register spill

---------

Co-authored-by: root <root@ctr-ubbsmc16.amd.com>

[ROCm/composable_kernel commit: 24c9ee1d22]
2023-02-15 12:00:58 -06:00
rocking5566
9d20a2b6b5 Improve normalization (#580)
* Sync the order of type string with template parameter

* Add more instances

* Check the vector size and remove redundant var

* Extract var to static, prepare to separate sweep once kernel

* Separate sweeponce flow and optimize the flow

* 1. Rename AccDatatype in normalization to computeData
2. Rename AccElementwiseOperation to YElementwiseOperation in normalization

* Remove useless code

* Update naive variance kernel

* Refine string

* Fix typo

* Support naive variance for device_normalization

* Check the blocksize

* Share the VGPR of x and y

* Share the VGPR of gamma and beta

* Add more instances

* Support fp16 sqrt for experiment

* Add CHANGELOG

* Fix typo

* clang-format

[ROCm/composable_kernel commit: 6a6163a3d1]
2023-02-15 11:59:35 -06:00
Haocong WANG
789c15d703 [Navi3x] Add Device Operations (#567)
* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + Add all type unit test(int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* tempsave

* tempsave

* tempsave

* runtime bug, cannot find symbol

* workaround for incorrect HIP warpSize return value

* debugging

* tempsave

* Correctness OK, waiting for optimization

* Tidy up + format

* temp save

* temp save, reproduce the v_bfi_b32 issue

* add inline asm for wmmaop test

* tidy up

* clean some debug purpose code

* discard some codes

* clang format

* clang format

* compiler issue fixed + increase tile size

* navi3x_multipleD+example

* temp save

* workable

* batchedgemm[OK], groupconv[debug]

* groupconv: Sanity check[OK], Performance[Bad]

* navi3x_groupconv_need_optimization

* format

* Add arch limitation to all wmma examples

* fix bug: example30 input conv args

[ROCm/composable_kernel commit: 0cfda84d05]
2023-02-15 11:50:51 -06:00
Adam Osewski
23c45ec25e Conv3D FWD BWD WRW fp16 fp32 client examples (#559)
* Conv3d bwd weight client example.

* Update year in license

* Convolution bwd data 3D fp16/fp32 client example.

* Client example for convnd fwd fp16 fp32

* clang-format

* Review remarks.

* Fix compiler err.

* Update data layout to standard one.

* Add conv 3d fwd NDHWGC instances

* clang-format

* Conv3d fwd NDHWGC instances.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: e9fd122889]
2023-02-15 11:16:47 -06:00
Illia Silin
6a4cfe125b Remove the workaround for bf16 attention tests. (#586)
* remove workaround in bf16 attention test

* clean up another workaround

[ROCm/composable_kernel commit: 06f1fc864c]
2023-02-14 18:06:24 -06:00
Adam Osewski
40022e6ec6 GroupedGEMM more bigger tiles. (#577)
* Adding more bigger tiles.

* Remove failing instance.

* Remove instances that don't improve perf.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 8f42780fd6]
2023-02-13 10:06:24 -06:00
Illia Silin
8a015a4acb enable batched_gemm_softmax_bf16 tests (#582)
[ROCm/composable_kernel commit: 0ac0f51ad6]
2023-02-10 13:00:37 -06:00