Commit Graph

932 Commits

Author SHA1 Message Date
Alan Turner
7719005848 Formatting 2023-06-07 11:11:09 -07:00
Alan Turner
421734ae19 Disable werror for jit_library 2023-06-07 10:36:57 -07:00
Alan Turner
9e85db88c0 Disable global-constructors warning for jit_library 2023-06-07 10:27:29 -07:00
Alan Turner
19d207dfbc Correct namespace for instances 2023-06-07 10:09:36 -07:00
Alan Turner
ffc5c906dc Remove extra namespaces from instance 2023-06-07 09:37:23 -07:00
Alan Turner
1404a5c6b6 Add include dir to test_jit_library 2023-06-06 10:06:21 -07:00
Alan Turner
0a763c3ec5 Merge remote-tracking branch 'origin/develop' into migx-jit-lib 2023-06-06 09:23:56 -07:00
Alan Turner
cb9ccccd7f Add back werror 2023-06-06 09:16:05 -07:00
Alan Turner
9a78dbbb84 Merge branch 'migx-jit-lib' of https://github.com/ROCmSoftwarePlatform/composable_kernel into migx-jit-lib 2023-06-06 08:57:56 -07:00
Alan Turner
8e0beb6542 Add unit tests 2023-06-06 08:56:50 -07:00
Illia Silin
4036590401 fix clang format (#740) 2023-06-02 14:10:02 -07:00
Paul
2470dcd5e4 Return not found for missing component 2023-06-02 15:17:52 -05:00
Paul
33f88fa84e Add missing header 2023-06-02 15:15:53 -05:00
Paul Fultz II
59e2dc294d Updates to ck host library API (#731)
* Move functions to cpp file

* Move another function to cpp file

* Fix semicolon

* Move solution to common.hpp

* Fix compile errors

* Use enum for data types

* Remove -Werror

* Fix header install

* Fix relative path

* Fix header path

* Install all headers
2023-06-01 18:54:52 -05:00
who who who
e2ebc8e795 replace hipMemcpy with hipMemcpyWithStream (#734) 2023-06-01 16:23:41 -05:00
Po Yen Chen
9eae73df9b Simplify kernel argument of device operator Device(Batched)GemmXdl<> (#723)
* Remove M/N/KPad local variables

* Use M/N/KPad to name padded lengths

* Replace duplicated local variable by parameters

* Rename variables M/N/KRaw to M/N/K

* Move AK0/BK0 compute logic into GridwiseGemm

* Use macro to shorten code

* Move CalculateGridSize() logic into GridwiseGemm

* Add comment to credit the implementation source

* Reuse the existing implementation

* Remove no-longer used data members

* Remove elementwise-op objects from interfaces

* Reserve kernel arg as whole object in interfaces

* Remove redundant data member

* Make 3rd type parameter optional

* Remove unnecessary type parameters

* Remove no-longer used descriptor-creation methods

* Move kernel arg type definition into GridwiseGemm

* Add macro to switch between code sections

* Move argument field computing logic into device op side

* Make utility method 'static'

* Declare special methods

* Unify MakeArgument() usage

* Adapt the new GridwiseGemm interface

* Push-down class 'GridwiseGemm::Argument' fields

* Remove no-longer used methods

* Add unused parameters

* Force copying parameters in 'Embed' ctor

* Remove no-longer used descriptors

* Fallback change on BaseArgument

* Remove macro 'INTEGER_DIVIDE_CEIL'

* Make variable naming more consistent

* Make sure methods are only invoked on right place

* Remove trailing underscore in public attribute name

* Remove necessary methods

* Hide computing logic of derived attributes

* Make new 'Embed' ctor only available for device code

* Make sure 'Embed' type args are not references

* Move check for karg.K into CheckValidity()

* Remove more integer division logic from device code

* Undo changes on Embed

* Separate 'Problem' concept out from 'Argument'

* Add overloaded version of __builtin_amdgcn_readfirstlane()

* Remove 'static' specifiers

* Remove more 'static' specifier

* Replace unsigned char by std::byte

* Add 'const' specifier to never changing variable

* Add 'inline' specifier to function definition

* Share same name for kernel interfaces

* Fix wrong boundary calculation logic

* Leave the third template arg for compatibility

* Remove unnecessary parameters

* Fix wrong error message (for type name)

* Create descriptor on device side

* Fix wrong debug message

* Remove no-longer used data members

* Rename type trait

* Remove std:: qualifier from standard types

* Replace 'size_t' by 'unsigned'

* Use type alias to hint usage

* Replace static_for<> by ordinary 'for' loop

* Reject unsupported argument

* Rename readfirstlane() to amd_wave_read_first_lane()

* Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp

* Update function calls

* Reorder statements

* Re-format files

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-06-01 16:23:02 -05:00
Alan Turner
fcca330709 Merge remote-tracking branch 'origin/migx-jit-lib2' into migx-jit-lib 2023-06-01 13:40:07 -07:00
Alan Turner
7295e38ddf 2023-06-01 13:38:19 -07:00
Illia Silin
b94fd0b227 update copyright headers (#726) 2023-05-31 18:46:57 -05:00
Po Yen Chen
582e31e88d Add class type support for __builtin_amdgcn_readfirstlane() (#711)
* Add overloaded version of __builtin_amdgcn_readfirstlane()

* Remove 'static' specifiers

* Remove more 'static' specifier

* Replace unsigned char by std::byte

* Add 'const' specifier to never changing variable

* Add 'inline' specifier to function definition

* Fix wrong boundary calculation logic

* Rename type trait

* Remove std:: qualifier from standard types

* Replace 'size_t' by 'unsigned'

* Use type alias to hint usage

* Replace static_for<> by ordinary 'for' loop

* Rename readfirstlane() to amd_wave_read_first_lane()

* Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp

* Reorder statements
2023-05-31 10:25:25 -05:00
Paul
9bf51c4c1c Install all headers 2023-05-31 09:59:05 -05:00
Haocong WANG
6eef0755c9 fix wmma gemm int8; add grouped conv int8 example (#716) 2023-05-30 07:18:53 -05:00
Po Yen Chen
1344a0f25b Simplify kernel argument of device operator DeviceGemm_Xdl_CShuffle<> (#696)
* Remove M/N/KPad local variables

* Use M/N/KPad to name padded lengths

* Replace duplicated local variable by parameters

* Rename variables M/N/KRaw to M/N/K

* Move AK0/BK0 compute logic into GridwiseGemm

* Use macro to shorten code

* Move CalculateGridSize() logic into GridwiseGemm

* Add comment to credit the implementation source

* Reuse the existing implementation

* Remove no-longer used data members

* Remove elementwise-op objects from interfaces

* Reserve kernel arg as whole object in interfaces

* Remove redundant data member

* Make 3rd type parameter optional

* Remove unnecessary type parameters

* Remove no-longer used descriptor-creation methods

* Move kernel arg type definition into GridwiseGemm

* Add macro to switch between code sections

* Move argument field computing logic into device op side

* Make utility method 'static'

* Declare special methods

* Unify MakeArgument() usage

* Adapt the new GridwiseGemm interface

* Push-down class 'GridwiseGemm::Argument' fields

* Remove no-longer used methods

* Add unused parameters

* Force copying parameters in 'Embed' ctor

* Remove no-longer used descriptors

* Fallback change on BaseArgument

* Remove macro 'INTEGER_DIVIDE_CEIL'

* Make variable naming more consistent

* Make sure methods are only invoked on right place

* Remove trailing underscore in public attribute name

* Remove necessary methods

* Hide computing logic of derived attributes

* Make new 'Embed' ctor only available for device code

* Make sure 'Embed' type args are not references

* Move check for karg.K into CheckValidity()

* Remove more integer division logic from device code

* Undo changes on Embed

* Separate 'Problem' concept out from 'Argument'

* Share same name for kernel interfaces

* Reject unsupported argument

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-05-30 07:09:55 -05:00
Adam Osewski
70e4eb567f Multiple fixes to GroupedGemm+SplitK (#707)
* Add license header.

* Reduce number of logged output. Add constant initialization.

* Add functional tests for grouped_gemm with different kbatch value.

* Add debug log information + remove unused code.

* Don't pass kbatch to CalculateKPadded.

* Turn on logging in grouped gemm and gemm splitk profiler

* Debug: limit number of test cases to run;

* Log more information and initialize with constant value.

* Turn on DEBUG_LOG

* Add more debug log information.

* Limit the number of instances to compile.

* Use GridwiseGemmPipeline

* Use KBatch to calculate K0

* Multiple DebugLog messages.

* Unit tests for multiple KBatch values.

* Refactoring

* Disable logging
* Extract KBatch update out of if statement.

* Uncomment instances.

* Disable DebugLog.

* Use KBatch when calculating KPadded.

* Fix CGridDesc padding.

* Use available helper functions.

* Uncomment code commented for debugging.

* Remove unnecessary debug log messages.

* Uncomment previously commented code for debug purposes.

* Add KBatch info to profiler output summary log.

* Add gtests for gemm splitk using ckProfiler API.

* Add more test-cases for different data layout.

* Add more test cases for gemm splitk

* Remove old test.

* Unit tests for MKNK ggemm interface.

* Fix and add more unit-tests.

* Constexpr everything!

* Increase error threshold for fp16 and splitk.

Since we're using fp16 atomic add for splitk there's a
known precision loss.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-05-30 07:09:06 -05:00
Bartłomiej Kocot
c2d7a29dec Add instances for fp16/int8 Gemm kernels (Navi21) (#717)
* Add instances for fp16/int8 Gemm kernels (Navi21)

* Extend instances with smaller tiles

* Fix SrcVectorTensor for km_kn_mn int8
2023-05-30 07:07:17 -05:00
Paul
506798ded0 Fix header path 2023-05-25 18:38:54 -05:00
Paul
cddcb85659 Fix relative path 2023-05-25 18:16:04 -05:00
Paul
f89f3440b2 Fix header install 2023-05-25 17:08:05 -05:00
Paul
2e37592520 Remove -Werror 2023-05-25 17:01:27 -05:00
Paul
420c0312c7 Use enum for data types 2023-05-25 16:27:40 -05:00
Paul
3905f4a245 Fix compile errors 2023-05-25 16:10:30 -05:00
Paul
b155a0ac34 Move solution to common.hpp 2023-05-25 15:58:50 -05:00
Paul
e42607a5e6 Fix semicolon 2023-05-25 15:57:54 -05:00
Paul
856419e802 Move another function to cpp file 2023-05-25 15:56:40 -05:00
Paul
dd6fd8bb62 Move functions to cpp file 2023-05-25 15:54:15 -05:00
Alan Turner
61386bf903 Add edatatype and scalars_per_vector workaround 2023-05-25 12:11:09 -07:00
Alan Turner
6289e36f72 Add int8 instances 2023-05-25 11:26:05 -07:00
Alan Turner
6dd246a6f7 Merge remote-tracking branch 'origin/develop' into migx-jit-lib 2023-05-24 11:20:21 -07:00
Alan Turner
dc65f4c65e Use vectors for Ds types and layouts params 2023-05-24 11:06:24 -07:00
Illia Silin
ac9e01e2cc Clean-up the headers (#713)
* fix headers for gpu instances

* remove unused headers

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-05-24 08:11:25 -07:00
rocking
76ec0089fb Pool3d fwd (#697)
* Expand the base class of pool2d, prepare to share base class with pool3d

* Add pool3d device op

* Add pool3d f16 example

* Refactor the base class. implement generic pooling in the future

* clang format

* get original index in max pooling

* Add outputindex to base class

* Fix dimension

* Add pooling instance

* Use indexType instead

* Remove useless header

* Extract IndexDataType to template

* Extract pooling reference code

* clang format

* clang format

* Fix typo

* Add tensor stride

* Add missing header

* Add index stride and output stride

* Refine naming

* Add type to base class

* Rename file

* Use proper size

* Fix typo

* Refine naming

* Modify the argument into vector.

* Add max pool profiler

* Refine naming

* Support f32 pool

* Fix typo

* Add avg pool2d fwd in profiler

* clang format

* Rename AccDatatype to ComputeDatatype

* Fix init

* test pool

* Extract variable

* Add client example

* Check the pooling dim

* clang format

* Connect argv and arg_parser

* Add found check

* Remove useless header

* Refine naming

* Adjust the order of device_pool_fwd
2023-05-24 09:05:04 -05:00
Illia Silin
d821d1e54f Enable gemm_dl and other kernels on Navi3x. (#714)
* enable dl kernels on navi3

* do not build xdl tests and examples on Navi

* run tests before building everything on jenkins

* disable gemm_bilinear on gfx1030

* add gpu targets to installer on Navi

* put tests in the same order as before

* reduce the number of navi targets in CI

* build CI installed for gfx940 as well

* only build for MI300 during QA runs
2023-05-23 11:23:16 -05:00
Sam Wu
3cff340423 Documentation Updates (#710)
* update documentation dependencies

add version number to docs

rename doc config directories

enable more doc formats on rtd

add license section in docs
2023-05-18 11:08:38 -06:00
Alan Turner
e2878e2593 Merge remote-tracking branch 'origin/develop' into migx-jit-lib 2023-05-17 06:54:18 -07:00
Bartłomiej Kocot
642d5e9155 Add contraction profiler and tests (#701)
* Add contraction profiler and tests

* Build and style fixes

* Allow to use any elementwise operator for ref_contraction

* Introduce profile_contraction_scale and profile_contraction_bilinear

* Make ref_contraction generic and extend interface tests

* Stylistic minor fixes

* Extend test_contraction_interface
2023-05-15 09:46:52 -05:00
rocking
a1e344b1ae Normalization/split k (#615) 2023-05-11 07:15:02 -05:00
Rostyslav Geyyer
b076a02ad2 Optimize bf16 conversion (#664)
* Add TypeConvert class and start refactoring

* Refactor TypeConvert as a struct

* Get back to template functions type_convert

* Add a type_convert_bf16_rtn, set rtz as default

* Clean up

* Add UnaryConvertPrecision struct for high-precision workloads

* Format

* Update type_convert to UnaryConvert on threadwise level

* Update UnaryConvertPrecision

* Format

* Fix chmod

* Add a flag to pick conversion method

* Format

* Remove the added flag

* Merge elementwise op with type conversion

* Move type_convert to elemwise op, update the op

* Update type_convert_precision -> bf16_convert_rtn

* Clean up

* Update comments

* Update the CK_WORKAROUND_DENORM_FIX flag handling

* Update the unneeded op to work but warn user

* Remove the message

* Use a PassThrough instead of ConvertBF16RTN to calculate reference

* Format

* Add missing include
2023-05-04 10:25:47 -05:00
Illia Silin
b8635a25b2 Fix the group of quantization_int8 kernels on MI300. (#695)
* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

* fix the group of kernels from ticket 723 on MI300

---------

Co-authored-by: Jing Zhang <jizhan@amd.com>
2023-05-03 18:27:04 -05:00
Illia Silin
4a51d2da9d Fix grouped_gemm_splitk kernels on MI300. (#694)
* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

---------

Co-authored-by: Jing Zhang <jizhan@amd.com>
2023-05-03 08:25:25 -07:00
Illia Silin
86e0190ec9 update daily build from rocm 5.4.3 to 5.5 (#693) 2023-05-03 08:18:10 -07:00