Commit Graph

  • de995fea71 Various fixes Tianxing Wu 2025-11-18 13:04:58 +00:00
  • 0f894662a1 Correct the naming from kargs.NumToken to kargs.NumTokens Mohsen Saffari 2025-11-18 11:13:41 +00:00
  • e3f84e7851 [CK_TILE] Vector stores c col layout part3 Aleksander Dudek 2025-11-18 04:59:23 -06:00
  • adb21b46b0 add using builtin_amdgcn_global_atomic_fadd_v2bf16 for bf16 atomic add Mohsen Saffari 2025-11-18 10:50:17 +00:00
  • f05d4a2fed mixed-prec flatmm pipeline improve yadaish 2025-11-18 10:12:11 +00:00
  • 972f10873f Part1. Added helper files for enabling persistent async Kumar 2025-11-18 15:31:23 +05:30
  • c557f19704 Merge commit '3ede8e2a6e9a1c921f27e2d66442829a092cc646' into develop assistant-librarian[bot] 2025-11-18 09:14:09 +00:00
  • 6ef0b9da8c fixing Juuso Korhonen 2025-11-18 08:57:30 +00:00
  • acb3b43bc0 [CK_TILE] Non-K Major from old CK to CK-Tile - fix reverted PR (#3199) Sami Remes 2025-11-18 08:17:02 +00:00
  • 888474139d [CK_TILE] Non-K Major from old CK to CK-Tile - fix reverted PR (#3199) Sami Remes 2025-11-18 08:17:02 +00:00
  • 3ede8e2a6e [CK_TILE] Non-K Major from old CK to CK-Tile - fix reverted PR (#3199) Sami Remes 2025-11-18 08:17:02 +00:00
  • b9e9aa04c2 Merge commit 'b6720531de9cbbe5f6022f173ead11c61860f57f' into develop assistant-librarian[bot] 2025-11-18 06:16:01 +00:00
  • 7336398fb6 [CK_TILE] MX Flatmm Split kernel instances (#3207) Yi DING 2025-11-18 13:46:30 +08:00
  • e2060bd1fb [CK_TILE] MX Flatmm Split kernel instances (#3207) Yi DING 2025-11-18 13:46:30 +08:00
  • b6720531de [CK_TILE] MX Flatmm Split kernel instances (#3207) Yi DING 2025-11-18 13:46:30 +08:00
  • d347998735 Use async for flatmm mxfp4 flatmm-mxfp4-async Ding, Yi 2025-11-10 08:30:55 +00:00
  • 13581e52c3 Fix flatmm example compile Ding, Yi 2025-11-13 03:04:30 +00:00
  • dd388b7294 [CK_TILE] MX Flatmm Split kernel instances Ding, Yi 2025-11-04 08:51:56 +00:00
  • c5fcc2a9ec Partial Progress : Restructure structure ThruptiRajLakshmanaGowda 2025-11-18 00:46:49 +00:00
  • ca68d8728c Merge commit '92498464f6ede6c4b1f990a57193c47b52530030' into develop assistant-librarian[bot] 2025-11-18 00:35:57 +00:00
  • cad9d98976 [CK_Builder] removed direction and elementwise_operation from required parameters … (#3192) kabrahamAMD 2025-11-18 00:23:48 +01:00
  • ada21c86cc [CK_Builder] removed direction and elementwise_operation from required parameters … (#3192) kabrahamAMD 2025-11-18 00:23:48 +01:00
  • 92498464f6 [CK_Builder] removed direction and elementwise_operation from required parameters … (#3192) kabrahamAMD 2025-11-18 00:23:48 +01:00
  • 1002b7ebee Partial Progress : Boiler plate code ThruptiRajLakshmanaGowda 2025-11-17 22:56:43 +00:00
  • 9facd029b8 [CK_TILE] Vector stores c col layout part2 Aleksander Dudek 2025-11-17 15:29:00 -06:00
  • fe399011e1 [CK_TILE] TEST vector stores c col layout part1 Aleksander Dudek 2025-10-23 03:09:39 -05:00
  • 46f27e2ab0 [CK_TILE] working version and tests Aleksander Dudek 2025-10-22 10:22:47 -05:00
  • 7785fdb3a6 [CK_TILE] working version Aleksander Dudek 2025-10-22 09:45:30 -05:00
  • cd7f41fddf Restructuring boiler plate code ThruptiRajLakshmanaGowda 2025-11-17 22:02:01 +00:00
  • c1e3b96609 [CK_TILE] Enable vector stores for C Column Layout part1 Aleksander Dudek 2025-10-21 06:20:59 -05:00
  • e45991b379 [CK_TILE] Enable vector stores for C Column Layout part1 Aleksander Dudek 2025-10-21 06:19:38 -05:00
  • c6712a96ff Merge commit '22a934a2294b778521a85e179c14155b6f72a2e4' into develop assistant-librarian[bot] 2025-11-17 17:13:23 +00:00
  • 41ef9a10f5 chore(copyright): update copyright header for include directory (#3219) Aviral Goel 2025-11-17 11:57:45 -05:00
  • 540c8377cf chore(copyright): update copyright header for include directory (#3219) Aviral Goel 2025-11-17 11:57:45 -05:00
  • 22a934a229 chore(copyright): update copyright header for include directory (#3219) Aviral Goel 2025-11-17 11:57:45 -05:00
  • 8f44fc9593 mem calculation fixed Tianxing Wu 2025-11-17 14:49:09 +00:00
  • ff28bd21ba flops and mem calculation Tianxing Wu 2025-11-17 13:55:54 +00:00
  • d3c5faf47e Assert block_size num_queries_per_kv Tianxing Wu 2025-11-17 12:40:31 +00:00
  • ebbf9f3169 ruff - python linting to fix pre-commit action Philip Maybank 2025-11-17 12:06:33 +00:00
  • 27eb3b347d add another Xdl policy and improve indexing - 1 Philip Maybank 2025-11-17 11:48:06 +00:00
  • 3c270494dd add another Xdl policy and improve indexing Philip Maybank 2025-11-17 11:45:40 +00:00
  • 9b68bbd425 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention Tianxing Wu 2025-11-17 10:06:05 +00:00
  • 5e2fd848b9 remove unneeded args Tianxing Wu 2025-11-17 10:04:30 +00:00
  • 5d2a9e5f16 deving the test... Juuso Korhonen 2025-11-17 09:46:31 +00:00
  • 5e43fd2dfc refactor to clearer BLOCK Q logic Juuso Korhonen 2025-11-17 08:27:19 +00:00
  • 57a0ec8cc1 add handling for -1 k heads arg Juuso Korhonen 2025-11-17 07:36:42 +00:00
  • 4a13749f7f fix to example Juuso Korhonen 2025-11-17 07:33:10 +00:00
  • b75077475b Remove useless codes in the two trload pipelines Qianfeng Zhang 2025-11-15 13:48:00 +00:00
  • 907dc988dc Merge branch 'develop' into rocking/fmha-fp8-pertensor rocking 2025-11-15 17:29:11 +08:00
  • 9afbb81e57 Add support for full types (not just aliases) in type-print Amir Ghamarian 2025-11-15 08:54:27 +00:00
  • 54282fc7b2 Merge commit 'b38bb492a1a55b5abb0c345962143c0f9c482cfb' into develop assistant-librarian[bot] 2025-11-15 01:40:21 +00:00
  • f8ec330b69 Disable DL kernels on all architectures except gfx103x. (#3218) Illia Silin 2025-11-14 17:39:50 -08:00
  • dbe4c1c957 Disable DL kernels on all architectures except gfx103x. (#3218) Illia Silin 2025-11-14 17:39:50 -08:00
  • b38bb492a1 Disable DL kernels on all architectures except gfx103x. (#3218) Illia Silin 2025-11-14 17:39:50 -08:00
  • f5856af85e Merge branch 'develop' into lwpck-3984 khuagarw 2025-11-14 21:36:36 +00:00
  • b4e313286b Merge commit '0aadb4b2c4114a26147c30abc894f2693795b888' into develop assistant-librarian[bot] 2025-11-14 20:13:54 +00:00
  • 0577b5dd78 chore(copyright): update copyright header for profiler directory (#3205) Aviral Goel 2025-11-14 14:19:25 -05:00
  • 4cbde83cb6 chore(copyright): update copyright header for profiler directory (#3205) Aviral Goel 2025-11-14 14:19:25 -05:00
  • 0aadb4b2c4 chore(copyright): update copyright header for profiler directory (#3205) Aviral Goel 2025-11-14 14:19:25 -05:00
  • ac02ddd324 Merge commit '3aa883b9ffd3dc4c18414b818774d3da94b8b9e1' into develop assistant-librarian[bot] 2025-11-14 17:12:11 +00:00
  • 238b5c4f08 Separate Traits from Problem while being used for defining the pipeline Qianfeng Zhang 2025-11-14 16:08:14 +00:00
  • 90503f7e3d chore(copyright): update copyright header for python directory (#3200) Aviral Goel 2025-11-14 11:21:36 -05:00
  • e7393f3fd7 chore(copyright): update copyright header for python directory (#3200) Aviral Goel 2025-11-14 11:21:36 -05:00
  • 3aa883b9ff chore(copyright): update copyright header for python directory (#3200) Aviral Goel 2025-11-14 11:21:36 -05:00
  • 72dbbc7d77 Add new gemm multiply multiply instances on gfx950 (#3213) jefyang1 2025-11-14 10:20:41 -06:00
  • 452f0ffbe4 Add new gemm multiply multiply instances on gfx950 (#3213) jefyang1 2025-11-14 10:20:41 -06:00
  • d30babbd00 Add new gemm multiply multiply instances on gfx950 (#3213) jefyang1 2025-11-14 10:20:41 -06:00
  • c44215f718 call async pipeline and add 192_128 instance ck_tile/test_qv192_v128 ck_tile/fmha_in_fp8_async_192_128 ltqin 2025-11-14 14:50:05 +00:00
  • f30c38becc fix file paths and add an index page with pseudo-code for each pipeline Philip Maybank 2025-11-14 10:44:25 +00:00
  • cd88a70d26 Merge branch 'develop' into rocking/fmha-fp8-pertensor rocking 2025-11-14 13:23:02 +08:00
  • 32b8a73252 Merge commit 'caadb896f1e01032a9d9a7db8484f9d1f3861f1e' into develop assistant-librarian[bot] 2025-11-14 05:13:13 +00:00
  • b49e30206f 7.2 version bump (#3210) John Afaganis 2025-11-13 22:04:03 -07:00
  • b8f01a2f11 7.2 version bump (#3210) John Afaganis 2025-11-13 22:04:03 -07:00
  • caadb896f1 7.2 version bump (#3210) John Afaganis 2025-11-13 22:04:03 -07:00
  • 897c2bd422 Merge commit '4d629cd2b0bb0b4b210881be0db398bcd382f444' into develop assistant-librarian[bot] 2025-11-14 02:43:22 +00:00
  • 807c297a17 fix build error (#3195) BingYuan.Zhou 2025-11-14 09:46:13 +08:00
  • 3800080d25 fix build error (#3195) BingYuan.Zhou 2025-11-14 09:46:13 +08:00
  • 4d629cd2b0 fix build error (#3195) BingYuan.Zhou 2025-11-14 09:46:13 +08:00
  • fda95832b0 [CK_TILE] Improve device printing (#3198) Yi DING 2025-11-14 09:46:06 +08:00
  • 1c30bad4c7 [CK_TILE] Improve device printing (#3198) Yi DING 2025-11-14 09:46:06 +08:00
  • 4a8b17d1a4 [CK_TILE] Improve device printing (#3198) Yi DING 2025-11-14 09:46:06 +08:00
  • 07700cc4fa fixing CI failures for grouped quant gemm khuagarw 2025-11-14 01:09:09 +00:00
  • a96aded2b1 Merge commit '2a73eb3bc0828db654c73058f20a2b794c16cb01' into develop assistant-librarian[bot] 2025-11-14 00:36:42 +00:00
  • bdbe3e4eb9 Simulate TF32 with BF16x3 (#3142) yinglu 2025-11-14 08:21:09 +08:00
  • 126a2a4cf4 Simulate TF32 with BF16x3 (#3142) yinglu 2025-11-14 08:21:09 +08:00
  • 2a73eb3bc0 Simulate TF32 with BF16x3 (#3142) yinglu 2025-11-14 08:21:09 +08:00
  • 36f2f873de updating cmake file khuagarw 2025-11-13 22:18:20 +00:00
  • 48e75593f2 updating readme and corresponding gemmconfigs khuagarw 2025-11-13 22:16:58 +00:00
  • 903800f9af rebase with develop khuagarw 2025-11-13 21:39:42 +00:00
  • ac204da8f1 Support qscale for dynamic quant, remove static quant rocking 2025-11-13 08:22:52 +08:00
  • acd5abe4f1 Merge commit 'f2cfc6b94ee3154697030c4dfa214040bb4af4c9' into develop assistant-librarian[bot] 2025-11-13 19:11:21 +00:00
  • d49eb1d431 Remove "basic" and universal GEMM tests, and incorporate their test cases into the GEMM pipeline tests (#3094) SamiAario-AMD 2025-11-13 21:01:27 +02:00
  • 376a62fcc1 Remove "basic" and universal GEMM tests, and incorporate their test cases into the GEMM pipeline tests (#3094) SamiAario-AMD 2025-11-13 21:01:27 +02:00
  • f2cfc6b94e Remove "basic" and universal GEMM tests, and incorporate their test cases into the GEMM pipeline tests (#3094) SamiAario-AMD 2025-11-13 21:01:27 +02:00
  • ea1ea41889 initial work on producing a reference that explains each pipeline policy class Philip Maybank 2025-11-13 17:16:14 +00:00
  • 0997e2eb6d Merge commit '7d57bc169f8206f06bc516a7f930f388def32347' into develop assistant-librarian[bot] 2025-11-13 17:13:19 +00:00
  • 547165ce4c [CK_BUILDER] Forward convolution builder improvements (#3179) Ville Pietilä 2025-11-13 18:47:25 +02:00
  • 91f7d4ac75 [CK_BUILDER] Forward convolution builder improvements (#3179) Ville Pietilä 2025-11-13 18:47:25 +02:00
  • 7d57bc169f [CK_BUILDER] Forward convolution builder improvements (#3179) Ville Pietilä 2025-11-13 18:47:25 +02:00
  • 46929142bf Merge commit 'ca2ee0eb8ae4069175df9e4731c7b0aed56d6c8d' into develop assistant-librarian[bot] 2025-11-13 16:14:02 +00:00