Rostyslav Geyyer
fa753f27ba
Add splitk gemm fp16 @ fp16 with fp8 compute instances (#983)
* Add ComputeType
* Update for compatibility
* Add instances
* Update profiler api
2023-10-13 16:27:11 -05:00
Rostyslav Geyyer
62d4af7449
Refactor f8_t, add bf8_t (#792)
* Refactor f8_t to add bf8_t
* Add check_err impl for f8_t
* Update fp8 test
* Format
* Revert the fix
* Update vector_type implementation
* Add bf8 test
* Add bf8, use BitInt types
* Add bf8 conversion methods
* Update type_convert for fp8/bf8
* Add check_err fp8/bf8 support
* Add subnorm fp8 tests
* Add subnorm bf8 tests
* Fix conversion
* Add bf8 cmake bindings
* Add macros to enable build with disabled fp8/bf8
* Remove is_native method
* Update flag combination for mixed precision instances
* Add more flag checks
* Add another flag to a client example
* Add type traits, decouple f8/bf8 casting
* Clean up
* Decouple fp8 and bf8 flags
* Remove more redundant flags
* Remove leftover comments
2023-09-12 17:04:27 -05:00
Rostyslav Geyyer
eac50708d9
Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853)
* Add ComputeType arg to splitk device and gridwise ops
* Update for gridwise op compatibility
* Update bf16 and int8 splitk gemm examples with ComputeType
* Add instances
* Update ckProfiler for mixed precision cases
* Add a mixed precision splitK gemm client example
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-08-22 09:34:49 -05:00
Illia Silin
b94fd0b227
update copyright headers (#726)
2023-05-31 18:46:57 -05:00
Po Yen Chen
8784a72e23
Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files
* Rename profiler.cpp to main.cpp
* Modularize ckProfiler operations
* Add description for profiler operations
* Use longer name to avoid name collision
* Use macro to delay expansion
* Use std::move() to avoid object copying
* Prohibit users from calling dtor
* Use macro to eliminate redundant code
* Make friend function hidden
* Add missing include directive <iostream>
* Fix wrong include directives
* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2022-12-01 15:15:02 -06:00
Chao Liu
aebd211c36
External Interface (#304)
* add client example
* clean
* clean
* reorg
* clean up profiler
* reorg
* clean
* fix profiler
* function for getinstances
* update client example
* update client example
* update client example
* update
* update example
* update Jenkins file
* update cmake
* update Jenkins
2022-06-26 19:39:02 -05:00