Haocong WANG
f83e9701e9
[GEMM] Gemm universal device operation (#1154)
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk instances
* clang format + typo fix
* Fix a bug
* initial commit
* Add more instances to irregular splitk
* blkgemm pipeline v1~4 prototype
* Sanity Checked. Known issue:
1. Poor performance of splitk
2. Register spill on blkgemmpipeline v3
* Sanity and Performance fix:
1. fix a bug related to sanity in grouped b2c mapping
2. fix a bug related to sanity and performance in splitk offset
* Sanity and API update:
1. Remove prefetch stage
2. Fix valid check bug
3. Add first gemm_universal instance into ckProfiler
* Add NN instances for gemm universal
* 1. Add NT instances for gemm_universal
2. Fix a bug about Kpadding in gemm_universal
* Fix a bug regarding padding Odd K number
* remove kernel print
* Fix KPadding bug...
* Update safety check
* another try to fix kpadding..
* Sanity checked
* new instances..
* clang format+typo fix
* remove clang format script's change
* Add non-hotloop compile option
* 1. Add fp16xfp8 example
2. pull packed convert f8 from pr1150
* Miscellaneous optimizations and fixes
* Add pipeline description docs
* Split universal gemm instance library to cut profiler compiling time
* uncomment cmakefile
* Fix a bug caused by blockwise_gemm_pipe_v2
* reduce default splitk to 1
* Add 224x256x64 tile size
* update, including:
1. Experiment pipeline 5~7
2. Optimization for pipeline 4
3. Organized instance library
* temp save
* temp save
* Permuted lds layout, sanity and function checked
* clang format
* Move OOB check from RunRead to RunWrite, for better software pipeline.
TODO: agpr spill when NN layout
* clangformat
* A/B splitpipe scheduler for v3
* Fix two bugs
* bug fix
* fix a bug in oob check
* Example for mixed fp16_fp8 gemm
* Clean experimental code blocks
* Add mixed precision gemm into profiler
* tempsave
* optimize m/n major lds layout
* Add RRR GEMM mixed precision instances
* Optimize f8 matrix transpose
* Add test_gemm_universal
* A/B split schedule for blkpip v5
* Take ds_read2 into iglp scheduling scheme
* format
* fixed cmake
* Add llvm-option into CI cmake flag
---------
Co-authored-by: Jing Zhang <jizhan@amd.com>
2024-04-13 21:03:18 -05:00