Commit Graph

  • f5188305f3 Fixes aliasing for vector size of 1 Chris Millette 2026-01-30 15:47:19 -05:00
  • 13a5177923 Fixes scalar_type definition for llvm builtin mma type Chris Millette 2026-01-29 13:33:38 -05:00
  • b731dc17d1 Re-home NativeVectorT and ensure partial specialization with scalar_type is visible where needed Chris Millette 2026-01-23 23:01:21 +00:00
  • c3e573dc7a Fixes clang formatting and adjust storage class type check Chris Millette 2026-01-23 19:57:20 +00:00
  • 8b307fd936 Apply suggestions from code review Christopher Millette 2026-01-23 11:03:51 -07:00
  • c1a5bbab88 Refactor vector_type to reduce build time Chris Millette 2026-01-22 20:30:26 +00:00
  • 470f031e58 [Compiler] Addressing new compiler warnings (#3640) Jan Patrick Lehr 2026-02-02 18:39:48 +01:00
  • 4dece9c549 [Compiler] Addressing new compiler warnings (#3640) Jan Patrick Lehr 2026-02-02 18:39:48 +01:00
  • 069500464d [Compiler] Addressing new compiler warnings (#3640) Jan Patrick Lehr 2026-02-02 18:39:48 +01:00
  • bdeef72a3e Merge branch 'develop' into mpodkory/recursive-to-pack-expansion Max Podkorytov 2026-02-02 09:16:59 -08:00
  • fa981847e5 Merge branch 'develop' into tenpercent/tensor-descriptor-functor-optimization Max Podkorytov 2026-02-02 09:15:49 -08:00
  • 0b98283acc Merge branch 'develop' into congma/ck_tile/fix_preshuffle_b Thomas Ning 2026-02-02 08:59:50 -08:00
  • 7d39618a79 Add profiling documentation. vpietila/ck-profiling-documentation Ville Pietilä 2026-02-02 11:15:46 -05:00
  • a814ba15fd Add profiling documentation. Ville Pietilä 2026-02-02 11:15:46 -05:00
  • 0a8c5f523a [Performance] Use N0Sub=16 for trload with softmax pipeline to reduce vgpr spilling Qianfeng Zhang 2026-02-02 15:59:38 +00:00
  • ae834e1a68 editing include files as of renamed files apoorva 2026-02-02 14:04:58 +00:00
  • 845e14d730 Reverted unused device impl and updated macros apoorva 2026-02-02 12:58:49 +00:00
  • 1ea3723655 Fixed clang format apoorva 2026-02-02 11:40:37 +00:00
  • 55204c3ce0 Added instances and fixed test failures in bwd_wei apoorva 2026-02-02 11:38:56 +00:00
  • 21e9dc2ef2 Refactored and fixed formatting of bwd_data instances apoorva 2026-02-02 11:38:37 +00:00
  • cc395ff4fc wave tile support for bwd_data and bwd_wei apoorva 2026-01-27 20:14:53 +00:00
  • 2cbc4ce2a3 Temp changes of bwd_wei(not working) apoorva 2026-01-27 10:16:22 +00:00
  • 6ff3d8b36c Added bwd_data cwave tile transfer support apoorva 2026-01-21 15:22:37 +00:00
  • 4eceb2fc69 Fix a build break Sami Aario 2026-02-02 14:53:58 +00:00
  • ff428f3478 WIP Matti Eskelinen 2026-02-02 09:45:43 -05:00
  • aa247e2d63 Fix a build break Sami Aario 2026-02-02 14:30:39 +00:00
  • fc977c88a2 Add print overload for tile_distributed_index Matti Eskelinen 2026-02-02 05:39:11 -05:00
  • 70be645270 Fix a build break Sami Aario 2026-02-02 10:27:58 +00:00
  • 88a74ec317 WIP Matti Eskelinen 2026-02-02 05:05:06 -05:00
  • 348b555cc3 Merge remote-tracking branch 'origin/develop' into LWPCK-3549-cleanups Sami Aario 2026-02-02 10:00:44 +00:00
  • 659f094300 disabling the gemm tile engine tests arai/ck_tile/tile_engine_restructure Astha Rai 2026-01-23 23:38:20 +00:00
  • 3791cfd71d Adding README back into the gemm directory and integrate new preshuffle functions Astha 2026-01-11 23:15:28 -05:00
  • 523ec7c863 Restructure Tile Engine's profiling process Astha 2026-01-08 04:39:21 -05:00
  • f15996d0f3 Restructure Tile Engine's benchmarking process Astha 2025-11-26 13:55:40 -05:00
  • 6b3e501d83 Merge commit 'e6bcd192d432561642d45ea5b1c759d6f80ace2a' into develop assistant-librarian[bot] 2026-02-02 08:25:21 +00:00
  • c006b10452 Mx fp6 flatmm (#3601) ZheWang 2026-02-02 16:04:40 +08:00
  • 418ee44844 Mx fp6 flatmm (#3601) ZheWang 2026-02-02 16:04:40 +08:00
  • e6bcd192d4 Mx fp6 flatmm (#3601) heuristics-tile-gemm ZheWang 2026-02-02 16:04:40 +08:00
  • bd81d645e2 Add one more specialization. Ville Pietilä 2026-02-02 03:03:13 -05:00
  • 6ea40157f1 Add last steps: activations functions Damien Lejeune 2026-01-29 08:30:45 -05:00
  • c4d95db73d Simplify condition for setting tile values fix_gptoss_sink Linjun-AMD 2026-02-02 15:48:26 +08:00
  • 7f0d5cdcc9 update codegen for sink Linjun-AMD 2026-02-02 01:36:34 -06:00
  • a812b044ae updated fmha_args Linjun-AMD 2026-02-02 01:34:23 -06:00
  • 93ef0b4fad optimized some code for gptoss sink Linjun-AMD 2026-02-02 01:19:16 -06:00
  • be5b26d3d1 chore: empty commit to trigger CI again Erwin Terpstra 2026-02-02 07:19:12 +00:00
  • 869c58e792 Merge branch 'develop' into jograner/bwd-weight-instance jograner/bwd-weight-instance Johannes Graner 2026-02-02 07:59:44 +01:00
  • 7e5b6a9592 Merge branch 'develop' into gemm_blockscale_eightwarps-merge-a4w4 ck_tile/gemm_blockscale_eightwarps Ding, Yi 2026-02-02 05:51:49 +00:00
  • e88f96dce1 Merge branch 'develop' into cshuffle-fix Thomas Ning 2026-02-01 21:13:37 -08:00
  • d306714be6 Merge branch 'develop' into congma/ck_tile/fix_preshuffle_b Thomas Ning 2026-02-01 21:08:10 -08:00
  • a8829efa34 Revert "[CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603)" revert-3603-eterpstr/206-block-scale-gemm-fp4-support Yi DING 2026-02-02 11:34:41 +08:00
  • 913ad049eb Merge branch 'develop' into ck_tile/gemm_blockscale_eightwarps Ding, Yi 2026-02-02 03:23:39 +00:00
  • 886f76cf93 Merge branch 'develop' into ck_tile/gemm_blockscale_eightwarps Ding, Yi 2026-02-02 02:30:14 +00:00
  • 83925cd0c0 Merge branch 'develop' into jeonghyun/ckb-almiopen-522-descriptor-init JH-Leon-KIM-AMD 2026-02-01 10:23:48 +00:00
  • 758921f999 [CK TILE] fix bugs of preshuffle_b Cong Ma 2026-01-31 18:40:42 -05:00
  • 2d624e5a9f Merge commit '1ae83137eb444bba1ba8b064eb77c2e486d90d7d' into develop assistant-librarian[bot] 2026-01-31 23:13:17 +00:00
  • cb58ffa4b8 Enable Grouped Conv Tile Fwd Tests daily (#3680) Bartłomiej Kocot 2026-01-31 23:55:25 +01:00
  • c8d112deb5 Enable Grouped Conv Tile Fwd Tests daily (#3680) Bartłomiej Kocot 2026-01-31 23:55:25 +01:00
  • 1ae83137eb Enable Grouped Conv Tile Fwd Tests daily (#3680) Bartłomiej Kocot 2026-01-31 23:55:25 +01:00
  • fbb073f276 Update device_grouped_conv_bwd_data_multiple_d_xdl_cshuffle_v3.hpp features/grouped-conv-perf-uplift Bartłomiej Kocot 2026-01-31 20:46:58 +01:00
  • c729f3992e Update device_grouped_conv_bwd_data_multiple_d_xdl_cshuffle_v3.hpp Bartłomiej Kocot 2026-01-31 20:46:10 +01:00
  • 8e3ec4765d chore: empty commit to trigger CI again Erwin Terpstra 2026-01-31 10:38:03 +00:00
  • 7d182d8628 WIP: Tensor adaptors Andriy Roshchenko 2026-01-31 06:49:26 +00:00
  • 4340c1c399 [CK TILE] fix bugs of preshuffle_b Cong Ma 2026-01-29 22:18:26 -05:00
  • 1522325e99 Merge branch 'develop' into aviralgoel/test_labels aviralgoel/test_labels Aviral Goel 2026-01-31 02:49:31 +04:00
  • 9e0594f272 first instance of bwd data factory Kevin Abraham 2026-01-30 21:00:04 +00:00
  • 9f6e6ad41c [CK Tools] Auto-enable unbuffered output for Python commands AviralGoelAMD 2026-01-30 14:27:01 -06:00
  • 19d77f522e Merge commit '8c1788757a88ee03bc8dbeb69704832c99fa719c' into develop assistant-librarian[bot] 2026-01-30 20:16:06 +00:00
  • 59a132c68d [CK_TILE] Fix incompatible vector type arguments for the intrinsic calls (#3672) Po Yen Chen 2026-01-31 04:02:49 +08:00
  • 4947f0306c [CK_TILE] Fix incompatible vector type arguments for the intrinsic calls (#3672) Po Yen Chen 2026-01-31 04:02:49 +08:00
  • 8c1788757a [CK_TILE] Fix incompatible vector type arguments for the intrinsic calls (#3672) Po Yen Chen 2026-01-31 04:02:49 +08:00
  • 2086516deb fixed building errors Jakub Piasecki 2026-01-30 19:22:34 +00:00
  • ae2d2d9f2c fixed conflicts Jakub Piasecki 2026-01-30 18:29:09 +00:00
  • a7b57187cf Grouped Convolution Backward Data Direct Load Bartlomiej Kocot 2026-01-30 00:05:26 +00:00
  • b1081a3b29 Merge remote-tracking branch 'origin/develop' into jakpiase/conv_bwd_data_direct_loads Jakub Piasecki 2026-01-30 18:29:31 +00:00
  • 07455223a2 added instances and fixes Jakub Piasecki 2026-01-30 18:29:09 +00:00
  • 8bcc4bcacf Merge commit '70d71b1514cc650ef7808d8757097f2d8617d313' into develop assistant-librarian[bot] 2026-01-30 18:22:07 +00:00
  • 4d241289c9 use default scale (no scale) for 16x16x128 mfma scale Sami Remes 2026-01-30 12:55:46 -05:00
  • 407df88c02 enable 32 element for fp4 Sami Remes 2026-01-30 12:47:45 -05:00
  • b8cdea5979 enable fp8 mx gemm too Sami Remes 2026-01-30 12:43:49 -05:00
  • 771c46aa8b add initial version for scale block_gemm, not used yet Sami Remes 2026-01-30 12:42:45 -05:00
  • b124a72ff5 revert mostly back to original comp_async Sami Remes 2026-01-30 12:40:48 -05:00
  • 55f0489b03 Test fix for gemm_b_scale_xdl_v3. (#3674) ApoorvaKalyani 2026-01-30 18:34:54 +01:00
  • 629573e3e3 Test fix for gemm_b_scale_xdl_v3. (#3674) ApoorvaKalyani 2026-01-30 18:34:54 +01:00
  • 70d71b1514 Test fix for gemm_b_scale_xdl_v3. (#3674) ApoorvaKalyani 2026-01-30 18:34:54 +01:00
  • 1559a473a8 Merge commit '63df1c0af2b559a6129afb5392fc560d99980926' into develop assistant-librarian[bot] 2026-01-30 17:22:22 +00:00
  • ab24c0ffe9 remove builds on legacy OSs from CI (#3693) Illia Silin 2026-01-30 09:15:09 -08:00
  • 7fbe9af19d remove builds on legacy OSs from CI (#3693) Illia Silin 2026-01-30 09:15:09 -08:00
  • 63df1c0af2 remove builds on legacy OSs from CI (#3693) Illia Silin 2026-01-30 09:15:09 -08:00
  • ce51308aaf [CK_TILE][FMHA] Add sparse attention VSA (#3341) jiangyon.ren 2026-01-31 00:59:47 +08:00
  • f6d2ca82b7 [CK_TILE][FMHA] Add sparse attention VSA (#3341) jiangyon.ren 2026-01-31 00:59:47 +08:00
  • 4d2f8c111e [CK_TILE][FMHA] Add sparse attention VSA (#3341) jiangyon.ren 2026-01-31 00:59:47 +08:00
  • 99264c6908 chore: removed PermuteN override again, as this seemed not to be the issue for 1D block scale Erwin Terpstra 2026-01-30 16:24:11 +00:00
  • 6d1282b943 Merge commit '2377a628373f2c4dd8b92ae9f853b1fb14c55953' into develop assistant-librarian[bot] 2026-01-30 16:20:17 +00:00
  • 65c2e81817 Adding remaining conv, dynamic_op, and scaleadd_scaleadd_relu flavors for grouped conv fwd (#3529) Kiefer van Teutem 2026-01-30 17:02:14 +01:00
  • d34916a7ff Adding remaining conv, dynamic_op, and scaleadd_scaleadd_relu flavors for grouped conv fwd (#3529) Kiefer van Teutem 2026-01-30 17:02:14 +01:00
  • 2377a62837 Adding remaining conv, dynamic_op, and scaleadd_scaleadd_relu flavors for grouped conv fwd (#3529) Kiefer van Teutem 2026-01-30 17:02:14 +01:00
  • c815e734c7 Add good instance Graner, Johannes 2026-01-30 10:45:31 -05:00
  • a56307b07e Add good instance Graner, Johannes 2026-01-30 10:45:31 -05:00
  • 227cb33a93 Testing mh/testing MHYang 2026-01-30 23:35:39 +08:00
  • 486eac508f Merge branch 'develop' into vpietila/add-fwd-conv-v3-instances-for-unit-group-size vpietila/add-fwd-conv-v3-instances-for-unit-group-size Ville Pietilä 2026-01-30 17:32:26 +02:00