Rostyslav Geyyer
cfaaa2114e
Add splitk gemm fp16 @ fp16 with fp8 compute instances (#983)
* Add ComputeType
* Update for compatibility
* Add instances
* Update profiler api
[ROCm/composable_kernel commit: fa753f27ba]
2023-10-13 16:27:11 -05:00
Rostyslav Geyyer
2e227b8581
Refactor f8_t, add bf8_t (#792)
* Refactor f8_t to add bf8_t
* Add check_err impl for f8_t
* Update fp8 test
* Format
* Revert the fix
* Update vector_type implementation
* Add bf8 test
* Add bf8, use BitInt types
* Add bf8 conversion methods
* Update type_convert for fp8/bf8
* Add check_err fp8/bf8 support
* Add subnorm fp8 tests
* Add subnorm bf8 tests
* Fix conversion
* Add bf8 cmake bindings
* Add macros to enable build with disabled fp8/bf8
* Remove is_native method
* Update flag combination for mixed precision instances
* Add more flag checks
* Add another flag to a client example
* Add type traits, decouple f8/bf8 casting
* Clean up
* Decouple fp8 and bf8 flags
* Remove more redundant flags
* Remove leftover comments
[ROCm/composable_kernel commit: 62d4af7449]
2023-09-12 17:04:27 -05:00
Rostyslav Geyyer
6f9eeb3190
Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853)
* Add ComputeType arg to splitk device and gridwise ops
* Update for gridwise op compatibility
* Update bf16 and int8 splitk gemm examples with ComputeType
* Add instances
* Update ckProfiler for mixed precision cases
* Add a mixed precision splitK gemm client example
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
[ROCm/composable_kernel commit: eac50708d9]
2023-08-22 09:34:49 -05:00
Illia Silin
b57fbee2f1
update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
Po Yen Chen
02db748e74
Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files
* Rename profiler.cpp to main.cpp
* Modularize ckProfiler operations
* Add description for profiler operations
* Use longer name to avoid name collision
* Use macro to delay expansion
* Use std::move() to avoid object copying
* Prohibit users from calling dtor
* Use macro to eliminate redundant code
* Make friend function hidden
* Add missing include directive <iostream>
* Fix wrong include directives
* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
[ROCm/composable_kernel commit: 8784a72e23]
2022-12-01 15:15:02 -06:00
Chao Liu
9096feca63
External Interface (#304)
* add client example
* clean
* clean
* reorg
* clean up profiler
* reorg
* clean
* fix profiler
* function for getinstances
* update client example
* update client example
* update client example
* update
* update example
* update Jenkins file
* update cmake
* update Jenkins
[ROCm/composable_kernel commit: aebd211c36]
2022-06-26 19:39:02 -05:00