15 Commits

Author SHA1 Message Date
Bartłomiej Kocot
1405a4906b Add DeviceBatchedGemmMultipleD_Dl (#732)
* Add DeviceBatchedGemmMultipleD_Dl

* Fix batched_gemm tests

* Fix comments

* test_batched_gemm_multi_d fixes

* Fix args for isSupported batchedGemmMultipleDDl

* Disable tests for gfx90a

[ROCm/composable_kernel commit: fc9f97568f]
2023-06-12 08:37:15 -05:00
Illia Silin
d40b8d5e2c update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
Po Yen Chen
3097b77236 Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files

* Rename profiler.cpp to main.cpp

* Modularize ckProfiler operations

* Add description for profiler operations

* Use longer name to avoid name collision

* Use macro to delay expansion

* Use std::move() to avoid object copying

* Prohibit users from calling dtor

* Use macro to eliminate redundant code

* Make friend function hidden

* Add missing include directive <iostream>

* Fix wrong include directives

* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: 8784a72e23]
2022-12-01 15:15:02 -06:00
Chao Liu
4be57e5afa Gemm+Bilinear (#316)
* refactor

* update example

* update example

* gemm bilinear

* clean

* update

[ROCm/composable_kernel commit: 9e4429f9c3]
2022-07-02 09:15:38 -05:00
zjing14
668891d234 add batch_stride into batched gemm (#314)
* add batch_stride

* fixed test

Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: 1c8126a4c2]
2022-07-01 01:35:37 -05:00
Chao Liu
675e7b7956 External Interface (#304)
* add client example

* clean

* clean

* reorg

* clean up profiler

* reorg

* clean

* fix profiler

* function for getinstances

* update client example

* update client example

* update client example

* update

* update example

* update Jenkins file

* update cmake

* update Jenkins

[ROCm/composable_kernel commit: aebd211c36]
2022-06-26 19:39:02 -05:00
Chao Liu
2ef299e0ad add license in file (#303)
[ROCm/composable_kernel commit: d3051d7517]
2022-06-24 23:32:43 -05:00
Chao Liu
9df0a11a51 Absolute include path (#281)
* add gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add client app example

* update readme

* delete obsolete files

* remove old client app

* delete old file

* cleaning

* clean

* remove half

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path for all examples

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* revert client app example

* clean build

* fix build

* temporarily disable client test on Jenkins

* clean

* clean

* clean

[ROCm/composable_kernel commit: d1db6a0c3e]
2022-06-24 20:51:04 -05:00
Illia Silin
4fa2ef676a Add performance tests as a stage of CI. (#247)
* modify ckProfiler_gemm output

* fix syntax

* change ckProfiler output and return 0

* fix syntax

* output datatype

* fix syntax

* output datatype in another way

* fix syntax

* fix syntax

* test return values of ckProfiler

* add layout info and tests, make sure ckprofiler returns 0

* fix syntax

* change layout output

* fix syntax

* fix syntax again

* update script to process perf results

* rearrange jenkins stages

* fix typo

* add python packages to Docker file

* adding setuptools-rust package

* modify parsing for new test parameters

* test db credentials on jenkins

* fix syntax

* update python script to handle incomplete lines

* upgrade python to 3.8 and write the gemm_params table

* add sqlalchemy package to docker

* move perf data processing to master node

* move the master node inside a steps region

* add new stage for result processing

* move results processing to separate stage

* reduce number of tests to speedup debugging

* pass config to processPerfResults stage

* run script on master in a docker container

* replace show_node_info

* try loading docker on master node again

* use ansible node instead of master

* get rid of pymysql package

* try ssh connection using paramiko

* put back pymysql

* put the perf data processing back on the gpu node

* put back artifact definition

* archive the perf_log before parsing

* clean up jenkinsfile, fix parsing

* fix typo

* enable all perf tests

* put all stages in original order, finalize script

* fix gpu_arch version

* update parsing script

* remove obsolete file causing merge conflict

[ROCm/composable_kernel commit: 1085794df3]
2022-05-24 11:14:50 -05:00
JD
569dd9f47b Add host API (#220)
* Add host API

* manually rebase on develop

* clean

* manually rebase on develop

* exclude tests from all target

* address review comments

* update client app name

* fix missing lib name

* clang-format update

* refactor

* refactor

* refactor

* refactor

* refactor

* fix test issue

* refactor

* refactor

* refactor

* update cmake and readme

Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: cec69bc3bc]
2022-05-12 09:21:01 -05:00
Jianfeng Yan
8a4806a3dc Fix typo in batched gemm profiler (#176)
* forgot passing BatchedCount in some profiler_batched_gemm

* delete default BatchCount

[ROCm/composable_kernel commit: ac0d806650]
2022-04-07 13:17:15 -05:00
Chao Liu
d27a11cc78 Gemm+Reduce Fusion (#128)
* add gridwise gemm v4r1

* rename

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* use sfc in shuffling

* remove hardcode

* remove hardcode

* refactor

* fix build

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* format

* clean

* adding gemm+reduce

* adding profiler for gemm+reduce

* adding gemm+reduce profiler

* fix build

* clean up

* gemm+reduce

* fix build

* update DeviceGemm_Xdl_CShuffle; update enum to enum class

* clean up

* add test for gemm+reduce

* clean up

* refactor

* fix build

* fix build

[ROCm/composable_kernel commit: f95267f166]
2022-03-23 22:18:42 -05:00
Jianfeng Yan
0d02cb3dfe Batched gemm bf16 (#142)
* add bf16 for batched gemm

* batched_gemm_bf16 works

* recover accidentally changed files

[ROCm/composable_kernel commit: d91f9f119c]
2022-03-22 18:18:43 -05:00
Jianfeng Yan
4ddc016c60 refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120)
changed long_index_t to index_t when computing memory offset

uncomment other ops in profiler

added test for batched_gemm

[ROCm/composable_kernel commit: cb87b049de]
2022-03-21 16:45:14 -05:00
zjing14
e57c9a886f Batched GEMM for fp16 (#79)
* prepare host for batched_gemm

* init commit of batched kernels

* fixed

* refine transform with freeze

* m/n padding

* fixed a bug; clean

* add small tiles

* clean

* clean code

* clean code

* add nt, tn, tt layout

* add missing file

* use StaticBufferTupleOfVector instead

* add reference_batched_gemm

* fixed a macro

[ROCm/composable_kernel commit: b53e9d08ed]
2022-02-11 09:36:52 -06:00