Commit Graph

885 Commits

Author SHA1 Message Date
illsilin
55a6b4e3bc merge down from the public repo 2023-04-21 10:49:39 -07:00
Illia Silin
9afa44d40b Switch to the new rocm5.6 compiler. (#681)
* switch to the new rocm5.6 compiler and docker

* fix syntax
2023-04-21 07:59:26 -07:00
Sam Wu
938a5e0e41 Update dependabot config (#682)
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
2023-04-20 21:55:56 -06:00
Illia Silin
aaf4defaa3 Merge pull request #9 from ROCmSoftwarePlatform/mergedown-from-public
Merge down from public repo
2023-04-19 13:44:41 -07:00
Illia Silin
bb0b772da9 Allow using ROCm release candidate compilers. (#679)
* enable use of rocm5.5 release candidate 4

* upgrade to ROCM5.5 RC5

* try to fix the PUB_KEY error, remove the cmake-data package

* upgrade to latest cmake version

* use private dockerhub repo for rocm5.5 rc5

* add missing bracket
2023-04-18 09:22:49 -07:00
rocking5566
fd11a4a12a Add (#677) 2023-04-17 10:12:10 -05:00
Haocong WANG
fc26d42a2e Fix a typo (#676) 2023-04-15 21:57:34 -05:00
Rostyslav Geyyer
03eaee6ae6 Add more macros to turn on/off denorm fix (#678)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
2023-04-15 21:56:07 -05:00
Haocong WANG
e85178b4ca Add memory index guard in wmma device ops (#667) 2023-04-11 15:42:47 -05:00
Jun Liu
f532988713 [gtest] suppress unsafe buffer warn (#670)
ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
2023-04-11 15:41:49 -05:00
Sam Wu
fd497f0e79 Add dependabot config and pin rocm-docs-core (#663) 2023-04-11 09:18:38 -06:00
zjing14
c203bf6711 fixed quant example (#672)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
2023-04-11 07:46:46 -05:00
zjing14
c54f8bcc25 add a macro to turn on/off denorm fix (off by default) (#673)
* add a macro to turn off denorm fix by default

* expose the macro

---------

Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
2023-04-11 07:44:43 -05:00
rocking5566
ed3a2e5226 Groupnorm + swish external api (#668)
* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Refactor instance generation, split into multiple cpp files

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp
2023-04-10 08:02:17 -05:00
Jun Liu
3248387bbb Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
This reverts commit bb5530af91.
2023-04-06 17:14:11 -07:00
illsilin
e51806ccf5 merge down from public repo 2023-04-06 10:41:03 -07:00
Illia Silin
e0c2a70fa1 Merge pull request #8 from ROCmSoftwarePlatform/fix_ci
modify Jenkinsfile to the internal repo
2023-04-05 09:34:04 -07:00
illsilin
151c22394a modify Jenkinsfile to the internal repo 2023-04-04 17:29:19 -07:00
zjing14
fde6d2742b add fp64 instances (#658)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
2023-03-30 13:30:43 -05:00
Haocong WANG
091570f594 fix 3rd dword of buffer source descriptor (#659) 2023-03-29 19:03:55 -05:00
carlushuang
bb5530af91 simplify karg in device/grid of split-k op (#644)
* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout
2023-03-29 19:03:07 -05:00
Rostyslav Geyyer
dbd8f94bef Add a denorm test fix (#603)
* Add type_convert implementations for bf16

* Add the fix for conv_fwd

* Add the fix for conv_bwd_data

* Add the fix for conv_bwd_weight

* Format

* Format

* Another format

* Add a macro to use workaround on MI200 only

* Format

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-29 15:05:32 -05:00
rocking5566
389e84a83b Conv + quantization + tanh (#645)
* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-29 14:50:23 -05:00
Haocong WANG
4e097ad283 Add CMake Option "USE_OPT_NAVI3X" (#647)
* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script
2023-03-29 14:07:33 -05:00
Sam Wu
88d474323b Separate bibtex requirement from rocm-docs-core (#656)
* separate bibtex requirement from rocm-docs-core

* point requirements to source rocm-docs-core repo
2023-03-27 17:14:36 -06:00
Sam Wu
f80776d937 standardize docs (#655) 2023-03-23 20:58:59 -07:00
Haocong WANG
e5376be4ac [Navi3x] Fix Gridwise_multiple_d operation (#649)
* Add CMake Option "USE_OPT_NAVI3X"

* fix bug
2023-03-23 11:22:10 -05:00
Po Yen Chen
fe96e8fbf2 Reduce group & batch of the tested convolutions (#648) 2023-03-22 10:49:11 -07:00
Illia Silin
36750a5763 Get rid of XDL parameters in WMMA kernel string. (#646)
* remove XDL parameters from WMMA kernel string

* get rid of two more parameters
2023-03-22 08:05:48 -07:00
Dan Yao
8a659a2e4c rtn in ternary way (#632)
* rtn in ternary way

* Check both flags to preserve NaN

* Format

* Rearrange flag1

* Apply suggestions from code review

Co-authored-by: Ronan Keryell <ronan@keryell.fr>

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Ronan Keryell <ronan@keryell.fr>
2023-03-20 14:30:24 -05:00
ltqin
6ae12434d2 workaround 637 (#640)
* add workaround 637

* format

* change id

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-20 11:49:31 -05:00
Rostyslav Geyyer
fa998675fc Update cmake-ck-dev.sh script (#641)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
2023-03-15 18:38:11 -05:00
rocking5566
16dc18e0f9 gemm/Conv xdlops + dlops quantization (#625)
* Add conv perlayer quantization

* Add gemm_dlops quantization

* Support int8 for innerproduct

* Refine gemm dlops int8 kernel parameter

* Support gfx908(MI100) and gfx90a(MI200)

* clang-format

* Rename example number

* Support different layout for d tensor

* Add conv dlops perchannel quantization example

* Move to example 40

* Extract the common code for different platform (dlops and xdlops)

* Move to subfolder. Prepare to add other op of quantization

* Refine the quantization instance library

* Add conv dl instances and client example

* Remove unnecessary type

* Add gemm quantization instance

* Add external api and client example

* Refine num_bytes

* Separate different layout to different cpp

* Add more xdl instances

* Revert "Remove unnecessary type"

This reverts commit 820869182f.

* Remove CShuffleDataType in dlops
Let acc and CShuffleDataType be the same in xdlops

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-15 15:29:40 -05:00
Adam Osewski
a2d5ca8e95 Device Op GroupedGemmMultipleD + example fp16 (#633)
* Pass shared mem pointer as pointer to void.

* Device Op GroupedGEMM Multiple D

* Example for grouped gemm multiple d.

* Add MI200 to supported archs.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-15 11:22:59 -05:00
Rostyslav Geyyer
c10a6e8293 Add layout check to IsSupportedArgument (#627)
* Add layout check to IsSupportedArgument

* Format

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-15 11:12:12 -05:00
Illia Silin
14b3504d95 Update GetTypeString function to generate unique kernel IDs. (#638)
* make conv_fwd_bias_activation kernel id unique

* add more parameters to conv and gemm kernel names

* update GetTypeString for conv and gemm kernels

* fix two more kernel strings
2023-03-15 10:44:42 -05:00
Haocong WANG
ea028ac65a Fix arch limitation bug (#639) 2023-03-15 07:44:13 -07:00
zjing14
4ac2606cef Merge pull request #5 from ROCmSoftwarePlatform/mi300
Add support for gfx940 targets.
2023-03-13 15:05:49 -05:00
Illia Silin
f8a6c69c12 Merge branch 'develop' into mi300 2023-03-13 10:16:56 -07:00
illsilin
56599d6720 Merge branch 'mi300' of github.com:ROCmSoftwarePlatform/composable_kernel-internal into mi300 2023-03-13 10:14:47 -07:00
illsilin
9f5bf00589 restore gitignore file 2023-03-13 10:13:08 -07:00
Illia Silin
194c9f68ca Update cmake-ck-dev.sh 2023-03-13 10:08:26 -07:00
Rostyslav Geyyer
5b57ab96a8 Remove debug asserts (#629)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
2023-03-10 17:34:44 -06:00
Haocong WANG
087e310589 [Navi3x] Multiple issue fix (#612)
* Change gridwise gemm mD blockwise gemm to naive

* RRR Gemm fix

* Fix RCR gemm bug

* Isolate wmma instructions

* Update amd_inline_asm.hpp

* Update amd_wmma.hpp

* Update amd_wmma.hpp

* fix syntax and update Jenkinsfile

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2023-03-10 17:04:28 -06:00
carlushuang
76fcdc60e9 fix a bug with non-dword-aligned offset when OOB, in case crash (#616)
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-09 08:07:24 -06:00
Illia Silin
0ccecc7c31 [gfx110x] support Navi3x architectures. (#628)
* enable building on Navi31

* fix syntax

* replace GPU_TARGETS with offload-arch

* add gfx1102 architecture

* fix typo

* update changelog
2023-03-09 07:56:40 -06:00
Adam Osewski
9096b1c7b2 GroupedGEMM + Gelu client example/instances/profiler (#614)
* Grouped gemm + Gelu instances.

* Device Instance Factory for GroupedGemm+Gelu

* Client example

* Rangify fill helper functions.

* Fix name clash.

* Profiler for grouped_gemm+gelu

* No need to use full namespace name.

* Add check for MRaw divisible by vector load.

* Ugly fix for big errors.

* Add grouped_gemm+gelu to profiler CMakelists.

* Store in argument additional info.

* Information about Mraw, Nraw, Kraw values.

* Use FastGelu instead of Gelu.

* Change client ex to use FastGelu

* Remove relaxed error precision.

* Remove duplicate output elementwise-op

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-03-07 22:06:56 -06:00
Rostyslav Geyyer
1e59eb3be5 Add descriptions to avoid build issues (#619)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
2023-03-06 13:11:58 -08:00
pmaybank
e4bf6d422e Generate output using Doxygen / Breathe (#598)
* Modify Doxygen config to pick up include directories recursively

* Add DeviceMem struct to API Reference guide

* Add classes that are used in Flash Attention kernel

* Add a reference and config for generating bibliography

Co-authored-by: Philip Maybank <Philip.Maybank@amd.com>
2023-03-06 11:39:16 -06:00
Illia Silin
e6cda9f8ff Change the CI workflow. (#611)
* add new parallel stage on navi node

* dont run performance tests on navi, get rid of 9110 compiler

* only run navi build when not doing QA

* fix syntax

* use navi21 label

* dont stash profiler on navi nodes, scp deb package to ginger

* disable tests on navi nodes

* test posting a binary to ginger

* add sshpass and use it to copy deb package

* fix the scp example

* fix syntax

* debug the scp issues

* add jenkins user to docker

* dont try whoami

* change jenkins uid and add user with uid=1002

* try scp from the last stage on micimaster

* rename and stash the package, scp from micimaster
2023-03-02 11:24:31 -06:00