Commit Graph

  • 569640dc70 Revert "Implement device grouped gemm fixed nk multi abd for rdna4 (#3619)" (#3705) Illia Silin 2026-02-03 09:52:14 -08:00
  • e31a68747a bugfixes users/randyspauldingamd/gtest_fixturemap Randy J. Spaulding 2026-02-03 17:45:35 +00:00
  • 7a62c5259e Added changelog entries LWPCK-3549-cleanups Sami Aario 2026-02-03 17:32:25 +00:00
  • 329eabd73b fix strides in mx gemm example Sami Remes 2026-02-03 17:25:47 +00:00
  • 17e8f2fa41 Move static constexpr to structs Enrico Degregori 2026-01-30 18:08:51 +00:00
  • 0655316e04 Fix usage of Problem static member BCastPolicy Enrico Degregori 2026-01-30 16:27:06 +00:00
  • 3dde9eb13b Fix rebase issues Enrico Degregori 2026-01-30 13:12:37 +00:00
  • b6a66c19e8 Finalize cleanup Enrico Degregori 2026-01-30 11:27:27 +00:00
  • afd3d2bd10 Clean up pipeline Enrico Degregori 2026-01-28 17:15:05 +00:00
  • 33f4f876cf Clean up tile_elementwise (casting not needed with new approach) Enrico Degregori 2026-01-28 16:32:54 +00:00
  • e907e1bdf1 Rename pipeline (not limited to fp4 any more) Enrico Degregori 2026-01-28 16:08:18 +00:00
  • 16491f04db Naming convention examples mx blockscale Enrico Degregori 2026-01-28 15:48:45 +00:00
  • e2c1dcd3d3 Refactor pipeline Enrico Degregori 2026-01-21 17:54:44 +00:00
  • 02b81bd868 Fix LDS read/write for 16/8 bit case Enrico Degregori 2026-01-20 14:31:18 +00:00
  • 45063ac39b Fix vectorsize buffer load for 16/8 bit case Enrico Degregori 2026-01-20 14:24:39 +00:00
  • 67c72d3ae8 Fix fp4 * scale -> bf16 to use pk instruction on gfx950 Enrico Degregori 2026-01-20 13:09:14 +00:00
  • aecc8b074d Use pk cvt instruction bf8 to bf16 on gfx950 Enrico Degregori 2026-01-19 16:52:51 +00:00
  • e79e609696 Add tests Enrico Degregori 2026-01-19 16:50:35 +00:00
  • 8684a671fc Generalize implementation to support different types Enrico Degregori 2026-01-16 17:39:35 +00:00
  • c6a8d06138 Merge branch 'develop' into tenpercent/tensor-descriptor-functor-optimization tenpercent/tensor-descriptor-functor-optimization Max Podkorytov 2026-02-03 09:04:14 -08:00
  • f96d74b55d Merge branch 'develop' into aviralgoel/gemm_tutorial Aviral Goel 2026-02-03 20:53:59 +04:00
  • e762171350 Merge branch 'develop' into aviralgoel/python-unbuffered-fix aviralgoel/python-unbuffered-fix Aviral Goel 2026-02-03 20:53:20 +04:00
  • fc1ff7a1f8 Merge commit '8cbd09c84a3010b4b3dbe2604875772363e2396b' into develop assistant-librarian[bot] 2026-02-03 16:29:00 +00:00
  • 4add4af76e [CK_TILE] Stream-K Tile Engine Test Config File Generation (#3662) Emily Martins 2026-02-03 09:12:15 -07:00
  • 1bc181c33f [CK_TILE] Stream-K Tile Engine Test Config File Generation (#3662) Emily Martins 2026-02-03 09:12:15 -07:00
  • 8cbd09c84a [CK_TILE] Stream-K Tile Engine Test Config File Generation (#3662) Emily Martins 2026-02-03 09:12:15 -07:00
  • fe1683aff1 Update CK Tile configs. Ville Pietilä 2026-02-03 10:58:47 -05:00
  • 9cca92f1db Use type string as a backend if instance string is empty. Ville Pietilä 2026-02-03 10:58:22 -05:00
  • dddfd00359 Merge branch 'develop' into vpietila/improved-fwd-merged-conv-group-instances vpietila/improved-fwd-merged-conv-group-instances Ville Pietilä 2026-02-03 17:21:29 +02:00
  • 5e93e62253 Use instance string conditionally in the CK profiler. Ville Pietilä 2026-02-03 10:18:12 -05:00
  • 535759239c Use instance strings in configs. Ville Pietilä 2026-02-03 10:16:29 -05:00
  • 92e8a1035e Built fix remove multi D functionality apoorva 2026-02-03 15:00:11 +00:00
  • a9657e14b8 updated instances jakpiase/conv_bwd_data_direct_loads Jakub Piasecki 2026-02-03 14:47:03 +00:00
  • 06b9553858 Fix instance generation script after relocation of fwd configs. Ville Pietilä 2026-02-03 09:10:10 -05:00
  • 92b03094f7 Add bwd data configs. Ville Pietilä 2026-02-03 08:54:17 -05:00
  • b131d59347 Add bwd wight configs. Ville Pietilä 2026-02-03 08:54:05 -05:00
  • 8e8e8b0216 Update fwd configs. Ville Pietilä 2026-02-03 08:53:32 -05:00
  • 3ec60914ad Add include statements added by remod.py LWPCK-3549 Sami Aario 2026-01-29 08:03:57 +00:00
  • 41299241b3 Add a changelog entry Sami Aario 2026-01-28 14:24:47 +00:00
  • 31c91a9535 Formatting changes Sami Aario 2026-01-28 14:23:26 +00:00
  • ad2d10a633 Switch to an implementation of DetermineWarpPrecType that explicitly defines the A and B types Sami Aario 2026-01-28 10:05:18 +00:00
  • 1167950528 Restrict the range of FillUniformDistributionIntegerValue for A and B to make tests pass Sami Aario 2026-01-26 14:59:28 +00:00
  • 298fd29fba Add and use load_tile_transpose_convert for mixed precision transpose loading Sami Aario 2026-01-26 09:26:59 +00:00
  • 7fef648bca Refactor type conversions out of MakeBLdsBlockDescriptor, WIP! Sami Aario 2025-12-18 09:14:11 +00:00
  • 1b610f4aaf Add type conversions to V4 pipeline, WIP! Sami Aario 2025-10-10 08:39:04 +00:00
  • 3a792017fb Add functionality and tests for fp16 x fp8 and fp8 x fp16 Sami Aario 2025-11-12 15:09:01 +00:00
  • f8c4868a59 Add functionality and tests for bf16 x fp8 and fp8 x bf16 Sami Aario 2025-10-09 09:04:13 +00:00
  • 3f4a85146c Add MFMA warp gemm for float, float, float, 32, 32, 16 Sami Aario 2025-11-12 12:38:04 +00:00
  • 7f22e8c66a Add and use load_with_type_convert Sami Aario 2025-11-12 09:04:15 +00:00
  • b41ed6e371 Introduce DetermineWarpPrecType for determining warp GEMM precision types Sami Aario 2025-10-09 08:07:04 +00:00
  • f2fcc4a461 Add NumAccess as a template parameter to WarpGemmAttributeMfma::get_warp_dstr_encoding Sami Aario 2025-11-28 09:20:19 +00:00
  • 933e09f6c3 Rename the parameters of load_interleaved_pk_type and load_and_convert_tile Sami Aario 2026-01-12 09:42:03 +00:00
  • 1bb2a7a158 Script for creating configs. Ville Pietilä 2026-02-03 08:51:36 -05:00
  • 1b30d45946 Add option to print out the available instances from CK profiler. Ville Pietilä 2026-02-03 08:51:19 -05:00
  • 3ef2978378 Revert "Reverted unused device impl and updated macros" apoorva 2026-02-03 13:08:09 +00:00
  • d514c8fca8 Revert "Reverted unused device impl and updated macros" apoorva 2026-02-03 13:04:31 +00:00
  • 8c8715904e Merge branch 'develop' into LWPCK-3549-cleanups SamiAario-AMD 2026-02-03 13:28:08 +02:00
  • ef6ce49698 Merge commit '3f04d27b687365332d2f1654f169444cab192927' into develop assistant-librarian[bot] 2026-02-03 11:22:52 +00:00
  • 0f8c7cad09 Remove concrete performance numbers from BUILD_TIME_OPTIMIZATION.md (#3702) Max Podkorytov 2026-02-03 02:54:18 -08:00
  • dcb0e63334 Remove concrete performance numbers from BUILD_TIME_OPTIMIZATION.md (#3702) Max Podkorytov 2026-02-03 02:54:18 -08:00
  • 3f04d27b68 Remove concrete performance numbers from BUILD_TIME_OPTIMIZATION.md (#3702) Max Podkorytov 2026-02-03 02:54:18 -08:00
  • 16fa73db63 use proper rtol/atol Sami Remes 2026-02-03 09:57:20 +00:00
  • eed83e838e chore: remove unnecessary conditional check eterpstr/preshuffle-bquant-for-abquant-preshuffleb Erwin Terpstra 2026-02-03 09:42:28 +00:00
  • d54eb1d350 Rename CK Tile fwd configs for builder. Ville Pietilä 2026-02-03 04:26:23 -05:00
  • 235a0c7805 fix: incorrect use of KPerBlock instead of NPerBlock Erwin Terpstra 2026-02-03 09:25:11 +00:00
  • 762398fb7c chore: split up example instances for abquant eterpstr/abquant-transposec-fix Erwin Terpstra 2026-02-03 09:21:59 +00:00
  • d132df2bf5 Finalize conv specialization for filter 3x3, pad 1, stride 1, dilation 1 case. Ville Pietilä 2026-02-03 04:10:09 -05:00
  • 41653735b8 Fixing merge conflict apoorva 2026-02-03 08:50:36 +00:00
  • 6b50755cd2 fix alignment calculation of lds tensor views Sami Remes 2026-02-03 08:24:03 +00:00
  • b47853d3fe enable fp4 for universal gemm - without any scaling Sami Remes 2026-02-03 03:10:35 -05:00
  • 082d22836d Merge branch 'develop' into vpietila/improved-fwd-merged-conv-group-instances Ville Pietilä 2026-02-03 10:02:06 +02:00
  • c093935e0c Split kv_block_descale_ptr into k_descale_ptr and v_descale_ptr to maintain flexibility. batch-prefill-fp8-kvcache-blockscale Jeff Huang 2026-02-03 12:36:00 +08:00
  • 5370472cde Merge branch 'develop' into mpodkory/recursive-to-pack-expansion mpodkory/recursive-to-pack-expansion Max Podkorytov 2026-02-02 19:27:13 -08:00
  • 8b07580161 Rename stride fields in FmhaFwdKVBlockScaleKargs Jeff Huang 2026-02-03 10:34:58 +08:00
  • 4933100b0f use statically_indexed_array instead of c-style array. Jeff Huang 2026-02-03 09:00:42 +08:00
  • e1af9b7afb 1. Relax kv_blockscale page_size restriction from == 1024 to >= kN0 2. Rename QScaleKargsSelector -> GetQScaleKargs for naming consistency 3. Remove unused BlockAttentionQuantScaleEnumToStr<KV_BLOCKSCALE> Jeff Huang 2026-02-02 23:27:10 +08:00
  • dc74e66e7b Add runtime check nullptr for prevent quantization parameters. Jeff Huang 2026-02-02 22:57:29 +08:00
  • 0aa1142bb5 [CK] Add FP8 KV_BLOCKSCALE support for batch prefill Jeff Huang 2026-01-29 09:36:55 +08:00
  • f27120c60e Merge commit '8b56ffb6aea4dd5e3c531912ee6b2258398606ee' into develop assistant-librarian[bot] 2026-02-03 03:12:05 +00:00
  • 2ec116ff79 Merge branch 'develop' into tenpercent/tensor-descriptor-functor-optimization Max Podkorytov 2026-02-02 18:46:06 -08:00
  • a5a7527d76 Fix one more lifetimebound error. (#3703) Illia Silin 2026-02-02 18:25:56 -08:00
  • 8d79fb88eb Fix one more lifetimebound error. (#3703) Illia Silin 2026-02-02 18:25:56 -08:00
  • 8b56ffb6ae Fix one more lifetimebound error. (#3703) Illia Silin 2026-02-02 18:25:56 -08:00
  • 08254d8388 Merge branch 'users/randyspauldingamd/gtest_fixturemap' into users/randyspauldingamd/dep_parser_monorepo Randy J. Spaulding 2026-02-02 21:21:59 -05:00
  • f673e1fe99 implement fixturemap Randy J. Spaulding 2026-02-02 21:20:28 -05:00
  • f9f25e37ec Add multi-file trace parsing and analysis pipeline jshumway/parse-build John Shumway 2026-01-26 18:16:02 -05:00
  • 39ea2de1d7 Fix path to ck tile conv fwd instance generator (#3699) Bartłomiej Kocot 2026-02-03 03:07:33 +01:00
  • 117abb6af4 Fix path to ck tile conv fwd instance generator (#3699) Bartłomiej Kocot 2026-02-03 03:07:33 +01:00
  • f2b9b3a3a6 Fix path to ck tile conv fwd instance generator (#3699) Bartłomiej Kocot 2026-02-03 03:07:33 +01:00
  • 61c7540788 Merge branch 'develop' into congma/ck_tile/fix_preshuffle_b Thomas Ning 2026-02-02 17:42:40 -08:00
  • 9c38bf0527 Merge commit '3e777217551c82a47eb9540791fb5542f2704e63' into develop assistant-librarian[bot] 2026-02-02 23:16:03 +00:00
  • b948026e16 feat: add split_k support for block scale gemm bquant mode. (#3653) Aviral Goel 2026-02-03 02:41:53 +04:00
  • 4ecc7da10e feat: add split_k support for block scale gemm bquant mode. (#3653) Aviral Goel 2026-02-03 02:41:53 +04:00
  • 3e77721755 feat: add split_k support for block scale gemm bquant mode. (#3653) Aviral Goel 2026-02-03 02:41:53 +04:00
  • 839a37780c Implement device grouped gemm fixed nk multi abd for rdna4 (#3619) Zoltán Lakatos 2026-02-02 22:58:11 +01:00
  • 1a8bd3d34b Implement device grouped gemm fixed nk multi abd for rdna4 (#3619) Zoltán Lakatos 2026-02-02 22:58:11 +01:00
  • 301eb5cf08 Implement device grouped gemm fixed nk multi abd for rdna4 (#3619) Zoltán Lakatos 2026-02-02 22:58:11 +01:00
  • b0b7f95d6e Merge commit '069500464de6a55b80e8341c79239b13ac8ef379' into develop assistant-librarian[bot] 2026-02-02 18:23:47 +00:00
  • cffc2f5f38 Use generalized vector_type_traits instead of scalar_type traits. Fixes incorrect slicing calculations from scalar_type::vector_size with f6_pk_t types. refactor_vector_type Chris Millette 2026-02-02 12:56:45 -05:00
  • cc0f42fee2 Workaround adjustment to scalar_type<pk_i4_t>::type. Skips invalid case for pk_i4_t, but should be addressed in the future. Chris Millette 2026-01-30 16:23:00 -05:00