mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-07-01 04:07:56 +00:00
276863ca874bedeee72fa8f46094de085c258aa6
9 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
96c39b331e |
[rocm-libraries] ROCm/rocm-libraries#7829 (commit 13af7da)
[ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility (#7829) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## Summary CK source files must be compilable via **hipRTC (HIP runtime compilation)**, whose preprocessor does not accept non-ASCII bytes anywhere in a translation unit — **including in comments**. Bytes that are harmless under `hipcc` (em-dashes, smart quotes, multiplication signs, Greek letters, box-drawing glyphs, etc.) cause hipRTC to fail at preprocessing time. These regularly leak in via LLM-assisted authoring or copy/paste from formatted documents and silently break hipRTC paths that are not exercised by the default `hipcc`-based build matrix. This PR (a) cleans every existing violation (53 files) and (b) adds a pre-checkin gate so new violations are rejected before merge. ## File extensions covered Both the cleanup scan and the new Jenkins enforcement stage use the same predicate: ``` *.h *.hpp *.cpp *.h.in *.hpp.in *.cpp.in *.inc *.cl ``` (excluding `*/build/*` and `*/include/rapidjson/*`). This is a strict superset of the existing `Clang Format` stage's predicate — `*.inc` is added so test-fixture include files are also gated. The local pre-commit hook's `c++/inc` type filter covers the same set. ## Why no enforcement today CK is opted out of the rocm-libraries root `.pre-commit-config.yaml`, so the existing `pre-commit` workflow doesn't touch CK. The local CK `.pre-commit-config.yaml` only runs for developers who installed hooks. The **authoritative gate is therefore the new Jenkins stage** in this PR; the local hook is convenience. ## Commit layout (bisect-friendly) 1. `79798aa6261` — **`[ck] Convert reflect/ rendering to ASCII for hipRTC compatibility`** Behavior change, isolated. `TreeFormatter` swaps `├─ / └─ / │ ` for `|- / +- / | ` (3-col width preserved so alignment is unchanged). `conv_description.hpp` swaps `×` for `x` as the dimension separator. `test_conv_description.cpp` expected strings updated in lockstep so the snapshot test stays green. This is the only commit in the series with observable runtime impact. 2. `738fdb0d81c` — **`[ck] Strip non-ASCII bytes from C++ sources for hipRTC compatibility`** Mechanical text cleanup across 53 files. Replacements happen in comments or in `std::cout` strings that are not asserted on by any test. None of the 174 `.inc` files in the tree required edits, but they were in the scan's predicate so the enforcement stage's predicate is a superset of what was scanned. Full replacement table in the commit message. 3. `1d7cd8ba235` — **`[ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility`** - New `projects/composablekernel/script/check_ascii_only.sh` (modeled on `check_copyright_year.sh`). - New entry in `projects/composablekernel/.pre-commit-config.yaml` under the local-hooks block (`types_or: [c++, inc]`). - New `ASCII Only Check` parallel stage in `projects/composablekernel/Jenkinsfile`'s `Static checks` block, mirroring the existing `Clang Format` stage but with `*.inc` added to the find predicate. Always-on, no `RUN_CPPCHECK` gate. The tree is buildable at every commit boundary. Commit 1 leaves 50 known violations; commit 2 leaves 0; commit 3 wires the gate. ## Demo Script output on a synthesized violation: ``` $ printf '// em-dash test \xe2\x80\x94 here\n' > /tmp/bad.cpp $ projects/composablekernel/script/check_ascii_only.sh /tmp/bad.cpp ERROR: /tmp/bad.cpp contains non-ASCII bytes: 1:// em-dash test — here Fix: replace with ASCII (em-dash -> --, smart quotes -> ", arrows -> ->, etc.) $ echo $? 1 ``` Full repo scan after the cleanup commits (note the `-name '*.inc'` clause): ``` $ cd projects/composablekernel && find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cpp' \ -o -name '*.h.in' -o -name '*.hpp.in' -o -name '*.cpp.in' -o -name '*.inc' -o -name '*.cl' \) \ -not -path '*/build/*' -not -path '*/include/rapidjson/*' -print0 \ | xargs -0 -P 8 -n 64 script/check_ascii_only.sh $ echo $? 0 ``` ## Test plan - [ ] Jenkins PR build: confirm new `Static checks -> ASCII Only Check` stage runs green over the full predicate (incl. `*.inc`) and existing `Clang Format` stage is unaffected. - [ ] `test_conv_description` passes against the ASCII tree-formatter output (touched in commit 1). - [ ] Local: `pre-commit run ascii-only-checker --all-files` runs cleanly after installing CK pre-commit hooks via `script/install_precommit.sh`. - [ ] Manually inject a non-ASCII byte in any `.cpp/.hpp/.inc` file, push: confirm Jenkins fails the new stage with a clear error. - [ ] Spot-check a representative subset of touched files under hipRTC compilation to confirm no remaining hipRTC-blocking content (optional, since the static byte check is a sufficient condition for hipRTC preprocessor acceptance on this dimension). 🤖 Generated with [Claude Code](https://claude.com/claude-code) |
||
|
|
e02c566795 |
[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)
[CK] upgrade CI to rocm7.13 as default compiler (#7612) ## Motivation Upgrade the default docker and compiler version in CI to rocm7.13. In order to pass all the checks I had to also clean up a lot of non-ascii characters in the source code comments and modify a couple of tests that were affected by a new compiler logic. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com> |
||
|
|
004784ef98 |
chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template * Fix build --------- Co-authored-by: Sami Remes <samremes@amd.com> |
||
|
|
d85f065b15 |
chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory * chore(copyright): update copyright header for example directory |
||
|
|
ad57f6ef0b |
[CK_BUILDER] Put global CK functions in an the CK namespace (#3232)
* Wrap ck host utitlies in CK namespace. The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace. Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace. There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our to code bases (CK and CKTile) so that they can both coexist. Moreover, estabilishing this compatiblity is essential if we are going to allow the builder to instantiate kernels from either template library. * Add using declarations to test code. After moving some of the untils into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compute units for functions that were previously in the global namespace. * Add using declarations to client examples. |
||
|
|
92c67a824f |
[DOCS] Documentation Addition (Readme updates) (#2495)
* GH-2368 Adding a basic glossary GH-2368 Minor edits GH-2368 Adding missing READMEs and standardization. resolving readme updates GH-2368 Minor improvements to documentation. Improving some readmes. Further improvement for readmes. Cleaned up the documentation in 'client_example' (#2468) Update for PR Update ACRONYMS.md to remove trivial terms Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats Apply suggestion from @spolifroni-amd Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Apply suggestion from @spolifroni-amd Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine. revise 37_transpose readme revise 36_copy readme Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity. Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity. Remove references to the Tile Engine in README files across multiple examples * GH-2368 Adding a basic glossary GH-2368 Minor edits GH-2368 Adding missing READMEs and standardization. resolving readme updates GH-2368 Minor improvements to documentation. Improving some readmes. Further improvement for readmes. Cleaned up the documentation in 'client_example' (#2468) Update for PR Update ACRONYMS.md to remove trivial terms Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats Apply suggestion from @spolifroni-amd Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Apply suggestion from @spolifroni-amd Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine. revise 37_transpose readme revise 36_copy readme Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity. Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity. Remove references to the Tile Engine in README files across multiple examples Refine README files by removing outdated references to the Tile Engine * Updates based on PR feedback 1 * Updates based on PR feedback 2 * Updates based on PR feedback 3 * Updates based on PR feedback 4 * Updates based on PR feedback 5 * Updates based on PR feedback 6 * Updates based on PR feedback 7 * Updates based on PR feedback 8 * Content Modification of CK Tile Example * Modify the ck_tile gemm config --------- Co-authored-by: AviralGoelAMD <aviral.goel@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com> |
||
|
|
db2524be2d |
Verify HostTensorDescriptor when it is created (#2829)
* add proper GEMM layout verification * Handle "auto" strides. CalculateStrides only called when tensor's strides are empty or all of them are <=0 (auto strides). CalculateStrides now supports GEMM::ColumnsMajor order. The assumption is still that it applies only to the inner two dims. ValidateStrides throws if any of the tensor's strides is <=0. profile_gemm_multiply_add updated to support "auto" strides for tensors. Manual tests for profile_gemm_multiply_add (matrix B in Row and Col modes) auto-strides bin/ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 0 0 0 0 0 bin/ckProfiler gemm_multiply_add 0 1 1 1 0 1 128 128 128 0 0 0 0 0 bin/ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 -1 -1 -1 -1 -1 Note, -1 should be deprecated (use 0 instead) explicit strides (same as auto) bin/ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 128 128 128 128 128 bin/ckProfiler gemm_multiply_add 0 1 1 1 0 1 128 128 128 128 128 128 128 128 explicit strides (not the same as auto) bin/ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 130 132 134 136 138 bin/ckProfiler gemm_multiply_add 0 1 1 1 0 1 128 128 128 130 132 134 136 138 mix of explicit and auto strides bin/ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 128 128 128 128 0 invalid stride bin/ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 0 0 0 0 64 terminate called after throwing an instance of 'std::runtime_error' what(): Invalid strides for RowMajor: mLens: 128 128 , mStrides: 64 1 Aborted (core dumped) * - add more names to ck::tensor_layout for easier namespace hierarchy checking - updated convolutional layouts to use explicit ones or BaseConvolutionalLayout where it is not clear which layout to use (TBD) - see include/ck/library/utility/convolution_host_tensor_descriptor_helper.hpp * added handling of partially initialized strides for GEMM. fixed more tests. * clang-format and more fixes * replace long dash by a simple hyphen - causes build failure in CK codegen. * increase sizeof input, otherwise output size becomes zero or negative with large filter size * select stride based on layout * specify layout explicitly to avoid errors in HostTensorDescriptor creation * add validation for higher GEMM tensor dimensions.; Add docstring to `HostTensorDescriptor` * Not clear why permute test in test/permute_scale/test_permute_scale.cpp uses a lot of invalid strides. Setting layout to BypassLayoutVerification to avoid a lot of errors * fix test (incl removing invalid config) * fix moe examples: - (in .cpp) add layout argument to non-2D tensors - (in .hpp) fix asserts/failures that show up in Debug mode, specifically addressing 2D tensor by a single index (and 3D tensor by 2d index) * fix moe_gemm2 example. * fix profile and wmma examples * clean-up early mods for ckprofile. verified with: ``` ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 0 0 0 0 0 ckProfiler gemm_multiply_add 0 1 1 1 0 1 128 128 128 0 0 0 0 0 ckProfiler gemm_multiply_add 0 0 1 1 0 1 128 128 128 130 132 134 136 138 ckProfiler gemm_multiply_add 0 1 1 1 0 1 128 128 128 130 132 134 136 138 # ckProfiler gemm_fastgelu 1 0 1 2 0 1 128 128 128 0 0 0 ckProfiler gemm_fastgelu 1 1 1 2 0 1 128 128 128 0 0 0 ckProfiler gemm_fastgelu 1 2 1 2 0 1 128 128 128 0 0 0 ckProfiler gemm_fastgelu 1 3 1 2 0 1 128 128 128 0 0 0 ckProfiler gemm_fastgelu 1 0 1 2 0 1 128 128 128 128 128 128 # ckProfiler gemm_add_relu 0 0 1 1 0 1 128 128 128 0 0 0 0 # ckProfiler gemm_add_relu 0 1 1 1 0 1 128 128 128 0 0 0 0 # not implemented # ckProfiler gemm_add_relu 0 2 1 1 0 1 128 128 128 0 0 0 0 # not implemented # ckProfiler gemm_add_relu 0 3 1 1 0 1 128 128 128 0 0 0 0 # not implemented ckProfiler gemm_add_relu 0 0 1 1 0 1 128 128 128 128 128 128 128 # ckProfiler gemm_add_relu_add_layernorm 1 0 1 1 0 0 128 128 128 0 0 0 0 0 ckProfiler gemm_add_relu_add_layernorm 1 1 1 1 0 0 128 128 128 0 0 0 0 0 ckProfiler gemm_add_relu_add_layernorm 1 2 1 1 0 0 128 128 128 0 0 0 0 0 ckProfiler gemm_add_relu_add_layernorm 1 3 1 1 0 0 128 128 128 0 0 0 0 0 ckProfiler gemm_add_relu_add_layernorm 1 0 1 1 0 0 128 128 128 130 132 134 136 138 # example_gemm_add_multiply_dl_fp16 example_gemm_add_multiply_xdl_fp16 # ckProfiler gemm_blockscale_wp 7 1 1 1 1 0 1 128 128 128 0 0 0 ckProfiler gemm_blockscale_wp 7 1 1 1 1 0 1 128 128 128 128 128 128 ``` * temporary skip first 8 test configs - they throw error * temporary skip first 8 test configs in wmma too - they throw error --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> |
||
|
|
ae57e5938e |
Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function * fix the example 64 cmake file * add xdl flag to the gemm_bias_softmax_gemm_permute example * add filtering of tests based on architecture type * enable test_grouped_gemm for gfx9 only * enable test_transpose only for gfx9 * only linnk test_transpose if it gets built * split the gemm instances by architectures * split gemm_bilinear,grouped_conv_bwd_weight instances by targets * split instances by architecture * split grouped_conv instances by architecture * fix clang format * fix the if-else logic in group_conv headers * small fix for grouped convolution instances * fix the grouped conv bwd weight dl instances * fix client examples * only enable client examples 3 and 4 on gfx9 * set the gfx9 macro * make sure the architecture macros are set by cmake * use separate set of xdl/wmma flags for host code * sinmplify the main cmake file * add conv_fwd_bf8 instance declaration |
||
|
|
1837040a9c |
Navi3 rel (#1176)
* wmma_op + unit test * add arch limitation to wmma test * change arch limitation * Refactor + Add all type unit test(int4 compile failed) * Add f32_16x16x16_bf16 unit test * tempsave * tempsave * tempsave * runtime bug, cannot find symbol * workaround for incorrect HIP warpSize return value * debugging * tempsave * Correctness OK, waiting for optimization * Tidy up + format * temp save * temp save, reproduce the v_bfi_b32 issue * add inline asm for wmmaop test * tidy up * clean some debug purpose code * discard some codes * clang format * clang format * compiler issue fixed + increase tile size * navi3x_multipleD+example * temp save * workable * batchedgemm[OK], groupconv[debug] * groupconv: Sanity check[OK], Performance[Bad] * navi3x_groupconv_need_optimization * create necessary files * save progress * Add Inter-Row thread transfer * save progress * save debugging progress * sanity check pass * fix a host tensor bug and clean up flash-attn code * format * cancel unnecessary change * cancel unnecessary change * cancel unnecessary change * temp save, add asm backend flag to amd_wmma * Mat-A LDS Bypass sanity pass * temp save * gemm sanity fix * Porting new blockwise gemm to flash attention * Example branch provide to compiler team * tempsave * Fix a bug * batched gemm ported * conv A-skip lds ported * Skip B-Lds real gemm * Skip B Lds Gemm + MulD * batched gemm, conv, skip b lds * format * Attn, skip b lds * Change GridwiseOp nam * fix a typo caused bug * Skip A_Lds sanity pass, Skip B_Lds scratch occured * Bug found, intra-row permute off caused * bug found * a fix * disable buffer load due to incorrect 3rd dword * update fmha config, no scratch generated * update 3rd dword * fmha config update * FMHA, add support to gfx1101/gfx1102 * Merge origin dev (#2) * [Navi3x] Fix Gridwise_multiple_d operation (#649) * Add CMake Option "USE_OPT_NAVI3X" * fix bug * standardize docs (#655) * Separate bibtex requirement from rocm-docs-core (#656) * separate bibtex requirement from rocm-docs-core * point requirements to source rocm-docs-core repo * Add CMake Option "USE_OPT_NAVI3X" (#647) * Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script * Conv + quantization + tanh (#645) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> * Add a denorm test fix (#603) * Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> * simplify karg in device/grid of split-k op (#644) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout * fix 3rd dword of buffer source descriptor (#659) * add fp64 instances (#658) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665) This reverts commit |