Commit Graph

10 Commits

Author SHA1 Message Date
Illia Silin
6d30e20d44 Fix the DL kernel issues on Navi3x. (#998)
* apply the patch for dl kernels on gfx11

* build DL kernels on navi32 CI

[ROCm/composable_kernel commit: f7331c603b]
2023-10-19 09:34:39 -07:00
Bartłomiej Kocot
a13a1cb6d3 Add grouped conv bwd weight dl instances and new layout (#897)
* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes

[ROCm/composable_kernel commit: 475188ca2e]
2023-09-13 10:14:31 -05:00
Bartlomiej Wroblewski
460756b64d Fix datatype in inner_product when V_DOT2 is disabled (#849)
[ROCm/composable_kernel commit: 32fe996da0]
2023-08-17 10:54:11 -05:00
Bartłomiej Kocot
9996b8c375 Add s_nops after v_dot to avoid hazard (#808)
* Add s_nops after v_dot to avoid hazard

* Fix builtin for inner_produxt fp16

* Skip inline version to builtin

* Add comments regarding isa

* Fix comment regarding s_nop

[ROCm/composable_kernel commit: 7761e5232c]
2023-07-27 13:29:44 -05:00
Rostyslav Geyyer
7a70a883fa FP8 enablement - add a pseudorandom number generator, add conversion methods (#708)
* Add basic fp8 definitions and prn-generator

* Format

* Add fp8<->fp32 type_convert

* Format

* Split type_convert and cast_to/from_f8

* Format

* Minor fix

* Minor fix

* Move fp8 utils to a separate header

* Add elementwise ops

* Add fp8_convert_sr

* Format

* Add element op

* Eliminate magic numbers

* Split f8_convert_sr in host and device

* Format

* Add some constexpr

* Add a datatype test

* Format

* Another format

* Add fp8<->fp16 tests

* Update type_converts

* Format

* Add fp16 casting functions

* Format

* Use seed as a runtime arg

* Use element location for PRNG

* Format

* Add fp8<->fp16 to PassThrough element op

* Clean up

* Merge host and device implementations

* Add comments on rounding modes

* Remove leftover code

* Put type_converts into a separate header

* Put random number gen to a separate header

* Rearrange f8_utils' namespaces

* Refactor type_convert.hpp

* Move f8_t definition

[ROCm/composable_kernel commit: f0c620c42e]
2023-06-19 11:20:35 -05:00
Illia Silin
b57fbee2f1 update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
rocking5566
6a1403d82d gemm/Conv xdlops + dlops quantization (#625)
* Add conv perlayer quantization

* Add gemm_dlops quantization

* Support int8 for innerproduct

* Refine gemm dlops int8 kernel parameter

* Support gfx908(MI100) and gfx90a(MI200)

* clang-format

* Rename example number

* Support different layout for d tensor

* Add conv dlops perchannel quantization example

* Move to example 40

* Extract the common code for different platform (dlops and xdlops)

* Move ot subfolder. Prepare to add other op of quantization

* Refine the quantization instance library

* Add conv dl instances and client example

* Remove unnecessary type

* Add gemm quantization instance

* Add external api and client example

* Refine num_bytes

* Separete different layout to different cpp

* Add more xdl instances

* Revert "Remove unnecessary type"

This reverts commit 820869182f.

* Remove CShuffleDataType in dlops
Let acc and CShuffleDataType be the same in xdlops

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 16dc18e0f9]
2023-03-15 15:29:40 -05:00
Chao Liu
31706d4896 add license in file (#303)
[ROCm/composable_kernel commit: d3051d7517]
2022-06-24 23:32:43 -05:00
Jianfeng Yan
81f473ff41 Navi21 gemm (#197)
* start adding navi21 GEMM

* navi_gemm_km_kn_mn_fp32 compiles and passes one test.

* rename variables and functions in gridwise_gemm_dlops_v1r3

* add other 3 layouts; format instance

* adding more tuning parameters

add tuning parameters for other 3 layouts

* add gemm_dlops_f16

* tmp

* add dependence of DeviceGemm::IsSupportedArg() on arch

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* push gemm_dlops into profiler

* minor changes

* if using xdl or dlops is moved into profiler_gemm_impl

* minor changes

* minor changes

* remove is_xdl from profile_gemm_impl

* make IsSupportedArg dependent on arch for other device_gemm

* minor changes

* minor changes

* fix a bug in f_generate_tensor_value

* add 64x64x64 for gemm_dlops_int8

* add 64x64x64 for gemm_dlops_int8

* comment out 3 layouts in gemm_dlops_int8; add 32x32x32 for gemm_dlops_int8; init A values to 1

* fix

* start fixing tuning parameters

* monir

* minor changes

* minor changes

* minor changes

* fixing

* adding example

* adding example

* adding example

* add gemm fp32 example

* clean up

* use 128x128x16 as MNK tile in navi21 gemm example

* bug fix

* fix test

* use new block c tile

* clean

* fix build

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: shaojiewang <wsjmessi@163.com>

[ROCm/composable_kernel commit: 40b59a63cc]
2022-05-24 12:19:27 -05:00
Chao Liu
6203866064 Reorganize files, Part 1 (#119)
* delete obselete files

* move files

* build

* update cmake

* update cmake

* fix build

* reorg examples

* update cmake for example and test

[ROCm/composable_kernel commit: 5d37d7bff4]
2022-03-08 21:46:36 -06:00