Commit Graph

8 Commits

Author SHA1 Message Date
Po Yen Chen
2327f1a640 Add example of Gemm + AddAddFastGelu (data type: int4) (#369)
* Add custom target to bundle examples together

* Add int4 example conditionally (just copy from int8 example)

* Extract common code into common.hpp

* Move ref gemm type alias into data-type-specific sources

* Add #error directive to prevent compile with wrong setting

* Let AddAddFastGelu support int4 parameter type

* Let check_err() support int4 parameter type

* Add wrapper function to hide value conversion while copying memory

* Finish int4 example for GEMM + AddAddFastGelu

* Add new DeviceMem API to copy memory

* Use new DeviceMem API to implement examples

* Fix wrongly use of macro 'CK_EXPERIMENTAL_BIT_INT_EXTENSION_INT4'

* Revert "Add new DeviceMem API to copy memory"

This reverts commit e26e7af71e.

* Add conversion ctor for Tensor<>

* Add 'const' specifier to Tensor<>::CopyAsType()

* Convert Tensor<> values before/after transfer between host & device
2022-08-23 10:38:41 -05:00
Po Yen Chen
a670a5a092 Move literal ""_uz & ""_zu into namespace 'ck::literals' (#354)
* Move literal ""_uz & ""_zu into namespace 'literals'

* Move namespace 'literals' as 'ck::literals'
2022-08-12 17:48:35 -05:00
Po Yen Chen
68b61504a3 Add examples for GEMM + AddAddFastGelu (data type: int8, bf16, fp32) (#340)
* Add always_false<> util to delay symbol resolution

* Use always_false<> to prevent trying instantiate unwanted method

* Add new specializations of AddAddFastGelu::operator() method

* Add GEMM + AddAddFastGelu examples for data types: int8, bf16, fp32

* Use floating point literal to simplify code

* Remove unnecessary capture in lambda expressions

* Extract fast GeLU calculation as standalone method

* Mark methods as 'constexpr'

* Add constraint for HostTensorDescriptor templated ctors

* Simplify HostTensorDescriptor ctor calls

* Add C++23 std::size_t literal suffix

* Use _uz suffix to shorten example code

* Remove unnecessary conversion to std::array<>

* Re-order include directives

* Remove C-style casting by literal suffix

* Remove unnecessary statements in main()

* Remove unused type parameter of always_false<>

* Remove unused include directive

* Exit main() by returning meaningful value

* Use 'if constexpr' to switch example flow

* Use std::is_same_v<> to shorten example code

* Add 'inline' specifier to literal functions

* Unify output methods in example

* Move common codes into .inc file

* Add type check in type_convert<>()

* Add type_convert<float>() before computation

* Merge AddAddFastGelu method specializations

* Remove always_false<>

* Add constraint to AddAddFastGelu::operator() parameter types
2022-08-11 17:31:28 -05:00
Chao Liu
500fa99512 Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating refernce conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshhold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* upate contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test
2022-07-29 18:19:25 -05:00
Chao Liu
4fe9c393b8 N-D Tensor Contraction example, instance, and client example (#270)
* adding contraction

* add contraction example

* update examle

* update example

* format

* update readme

* clean header

* clean header

* contraction with multiple D

* rename

* fix naming issue; add instances for contraction+bilinear

* change assumed virtual layout of contraction; add client example

* update example

* update

* contraction+scale

* use type_convert

* rename
2022-07-07 14:31:11 -05:00
Chao Liu
d3051d7517 add license in file (#303) 2022-06-24 23:32:43 -05:00
Chao Liu
d1db6a0c3e Absolute include path (#281)
* ad gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add client app example

* update readme

* delete obselete files

* remove old client app

* delete old file

* cleaning

* clean

* remove half

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path for all examples

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* revert client app example

* clean build

* fix build

* temporary disable client test on Jenkins

* clean

* clean

* clean
2022-06-24 20:51:04 -05:00
Chao Liu
56adf7e9cc GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241)
* ad gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add comment

* use type_convert

* clean

* clean element wise op
2022-06-19 03:07:28 -05:00