mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-29 11:16:59 +00:00
[ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility (#7829)

## Summary

CK source files must be compilable via **hipRTC (HIP runtime compilation)**, whose preprocessor does not accept non-ASCII bytes anywhere in a translation unit, **including in comments**. Bytes that are harmless under `hipcc` (em-dashes, smart quotes, multiplication signs, Greek letters, box-drawing glyphs, etc.) cause hipRTC to fail at preprocessing time. These regularly leak in via LLM-assisted authoring or copy/paste from formatted documents and silently break hipRTC paths that are not exercised by the default `hipcc`-based build matrix.

This PR (a) cleans every existing violation (53 files) and (b) adds a pre-checkin gate so new violations are rejected before merge.

## File extensions covered

Both the cleanup scan and the new Jenkins enforcement stage use the same predicate:

```
*.h *.hpp *.cpp *.h.in *.hpp.in *.cpp.in *.inc *.cl
```

(excluding `*/build/*` and `*/include/rapidjson/*`). This is a strict superset of the existing `Clang Format` stage's predicate: `*.inc` is added so test-fixture include files are also gated. The local pre-commit hook's `c++/inc` type filter covers the same set.

## Why no enforcement today

CK is opted out of the rocm-libraries root `.pre-commit-config.yaml`, so the existing `pre-commit` workflow doesn't touch CK. The local CK `.pre-commit-config.yaml` only runs for developers who installed hooks. The **authoritative gate is therefore the new Jenkins stage** in this PR; the local hook is convenience.

## Commit layout (bisect-friendly)

1. `79798aa6261`: **`[ck] Convert reflect/ rendering to ASCII for hipRTC compatibility`**
   Behavior change, isolated. `TreeFormatter` swaps `├─ / └─ / │ ` for `|- / +- / | ` (3-col width preserved so alignment is unchanged). `conv_description.hpp` swaps `×` for `x` as the dimension separator. `test_conv_description.cpp` expected strings updated in lockstep so the snapshot test stays green. This is the only commit in the series with observable runtime impact.
2. `738fdb0d81c`: **`[ck] Strip non-ASCII bytes from C++ sources for hipRTC compatibility`**
   Mechanical text cleanup across 53 files. Replacements happen in comments or in `std::cout` strings that are not asserted on by any test. None of the 174 `.inc` files in the tree required edits, but they were in the scan's predicate so the enforcement stage's predicate is a superset of what was scanned. Full replacement table in the commit message.
3. `1d7cd8ba235`: **`[ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility`**
   - New `projects/composablekernel/script/check_ascii_only.sh` (modeled on `check_copyright_year.sh`).
   - New entry in `projects/composablekernel/.pre-commit-config.yaml` under the local-hooks block (`types_or: [c++, inc]`).
   - New `ASCII Only Check` parallel stage in `projects/composablekernel/Jenkinsfile`'s `Static checks` block, mirroring the existing `Clang Format` stage but with `*.inc` added to the find predicate. Always-on, no `RUN_CPPCHECK` gate.

The tree is buildable at every commit boundary. Commit 1 leaves 50 known violations; commit 2 leaves 0; commit 3 wires the gate.

## Demo

Script output on a synthesized violation:

```
$ printf '// em-dash test \xe2\x80\x94 here\n' > /tmp/bad.cpp
$ projects/composablekernel/script/check_ascii_only.sh /tmp/bad.cpp
ERROR: /tmp/bad.cpp contains non-ASCII bytes:
  1:// em-dash test — here
  Fix: replace with ASCII (em-dash -> --, smart quotes -> ", arrows -> ->, etc.)
$ echo $?
1
```

Full repo scan after the cleanup commits (note the `-name '*.inc'` clause):

```
$ cd projects/composablekernel && find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cpp' \
    -o -name '*.h.in' -o -name '*.hpp.in' -o -name '*.cpp.in' -o -name '*.inc' -o -name '*.cl' \) \
    -not -path '*/build/*' -not -path '*/include/rapidjson/*' -print0 \
  | xargs -0 -P 8 -n 64 script/check_ascii_only.sh
$ echo $?
0
```

## Test plan

- [ ] Jenkins PR build: confirm new `Static checks -> ASCII Only Check` stage runs green over the full predicate (incl. `*.inc`) and existing `Clang Format` stage is unaffected.
- [ ] `test_conv_description` passes against the ASCII tree-formatter output (touched in commit 1).
- [ ] Local: `pre-commit run ascii-only-checker --all-files` runs cleanly after installing CK pre-commit hooks via `script/install_precommit.sh`.
- [ ] Manually inject a non-ASCII byte in any `.cpp/.hpp/.inc` file, push: confirm Jenkins fails the new stage with a clear error.
- [ ] Spot-check a representative subset of touched files under hipRTC compilation to confirm no remaining hipRTC-blocking content (optional, since the static byte check is a sufficient condition for hipRTC preprocessor acceptance on this dimension).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
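
The byte-level rule behind the gate can be illustrated in C++. This is a minimal sketch, not the contents of `check_ascii_only.sh` (the script itself is not shown on this page). `NonAsciiHit`, `find_non_ascii`, and `demo_ascii_check` are hypothetical names, and the sketch assumes "non-ASCII" means any byte with value 0x80 or higher.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical record: one entry per line that contains a non-ASCII byte.
struct NonAsciiHit
{
    std::size_t line;   // 1-based line number
    std::size_t column; // 1-based byte offset of the first offending byte
};

// Reports the first byte >= 0x80 on each line of `text`.
// An empty result means the buffer is pure 7-bit ASCII.
std::vector<NonAsciiHit> find_non_ascii(std::string_view text)
{
    std::vector<NonAsciiHit> hits;
    std::size_t line   = 1;
    std::size_t column = 1;
    bool reported      = false; // at most one hit per line
    for(const char ch : text)
    {
        if(ch == '\n')
        {
            ++line;
            column   = 1;
            reported = false;
            continue;
        }
        if(!reported && static_cast<unsigned char>(ch) >= 0x80)
        {
            hits.push_back({line, column});
            reported = true;
        }
        ++column;
    }
    return hits;
}

// Usage sketch: the em-dash from the demo above is the UTF-8 sequence E2 80 94.
inline void demo_ascii_check()
{
    assert(find_non_ascii("// plain ASCII comment\nint x;\n").empty());
    const auto hits = find_non_ascii("// em-dash test \xe2\x80\x94 here\n");
    assert(hits.size() == 1 && hits[0].line == 1 && hits[0].column == 17);
}
```

A multi-byte UTF-8 character yields a single hit at its first byte, which is enough to point an author at the offending line.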
213 lines
8.1 KiB
C++
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#pragma once

#include <string>

#include "ck_tile/core.hpp"
#include "ck_tile/host/kernel_launch.hpp"

struct AddDs
{
    template <typename E, typename C, typename... Ds>
    CK_TILE_HOST_DEVICE auto operator()(E& e, const C& c, const Ds&... ds) const -> void
    {
        const float x0_f =
            ck_tile::type_convert<float>(c) + (ck_tile::type_convert<float>(ds) + ...);

        e = ck_tile::type_convert<E>(x0_f);
    }
};

#define GEMM_PIPELINE ck_tile::GemmPipelineAgBgCrCompV3
#define UNIVERSAL_GEMM_PIPELINE ck_tile::BaseGemmPipelineAgBgCrCompV3
#define GEMM_PIPELINE_SCHEDULER ck_tile::GemmPipelineScheduler::Intrawave

template <typename DataType>
struct BatchedContractionTypeConfig
{
    using ADataType   = DataType;
    using BDataType   = DataType;
    using AccDataType = float;
    using EDataType   = DataType;
    using DDataType   = DataType;
};

using ContractionTypes = BatchedContractionTypeConfig<ck_tile::half_t>;

using ADataType   = ContractionTypes::ADataType;
using BDataType   = ContractionTypes::BDataType;
using AccDataType = ContractionTypes::AccDataType;
using EDataType   = ContractionTypes::EDataType;
using DDataType   = ContractionTypes::DDataType;

void print_help(const char* program_name)
{
    std::cout << "\n";
    std::cout << "Batched Tensor Contraction with element-wise fusion\n";
    std::cout << "E[G,M,N] = element_wise_op(contraction(A[G,M,K], B[G,N,K]), D0, D1, ...)\n";
    std::cout << "(Supports multiple D tensors with configurable element-wise operations)\n\n";

    std::cout << "Usage: " << program_name << " [OPTIONS]\n\n";

    std::cout << "Dimension Arguments (comma-separated, no spaces):\n";
    std::cout << " -g_dims=<dims> Batch dimensions (default: \"1,2\")\n";
    std::cout << " -m_dims=<dims> M (row) dimensions (default: \"4,256\")\n";
    std::cout << " -n_dims=<dims> N (column) dimensions (default: \"16,128\")\n";
    std::cout << " -k_dims=<dims> K (contract) dims (default: \"64\")\n";
    std::cout << " -num_d=<int> Number of D tensors (default: 2, range: 0-4)\n\n";

    std::cout << "Custom Stride Arguments (for testing non-contiguous tensors):\n";
    std::cout << " -strides_a=<s> A tensor strides (comma-separated, empty = auto)\n";
    std::cout << " -strides_b=<s> B tensor strides (comma-separated, empty = auto)\n";
    std::cout << " -strides_e=<s> E tensor strides (comma-separated, empty = auto)\n";
    std::cout << " -strides_ds=<s> D tensors strides (semicolon-separated, empty = same as E)\n";
    std::cout << " Example: -strides_a=\"32768,128,1\" -strides_ds=\"512,2,1;1024,4,1\"\n\n";

    std::cout << "Layout Arguments:\n";
    std::cout
        << " -a_layout=<R|C> A tensor layout (R=Row-major, C=Column-major, default: \"R\")\n";
    std::cout << " -b_layout=<R|C> B tensor layout (default: \"C\")\n";
    std::cout << " -e_layout=<R|C> E tensor layout (default: \"R\")\n\n";

    std::cout << "Examples:\n";
    std::cout << " Single batch (12 batches of 256x128):\n";
    std::cout << " " << program_name
              << " -g_dims=\"12\" -m_dims=\"256\" -n_dims=\"128\" -k_dims=\"64\"\n\n";

    std::cout << " 2D batch grid (2x3=6 batches):\n";
    std::cout << " " << program_name
              << " -g_dims=\"2,3\" -m_dims=\"128\" -n_dims=\"128\" -k_dims=\"64\"\n\n";

    std::cout << " Multi-dimensional (flattened to M=128, N=128, K=128):\n";
    std::cout << " " << program_name
              << " -g_dims=\"4\" -m_dims=\"8,16\" -n_dims=\"32,4\" -k_dims=\"16,8\"\n\n";

    std::cout << "Other Options:\n";
    std::cout << " -v=<0|1> Validation (0=off, 1=on, default: 1)\n";
    std::cout << " -split_k=<int> Split-K value (default: 1)\n";
    std::cout << " -warmup=<int> Warmup iterations (default: 5)\n";
    std::cout << " -repeat=<int> Benchmark iterations (default: 10)\n";
    std::cout << " -log=<0|1> Logging level (default: 1)\n";
    std::cout << " -help Show this help\n\n";
}

auto create_args(int argc, char* argv[])
{
    // Check for --help flag
    for(int i = 1; i < argc; ++i)
    {
        std::string arg = argv[i];
        if(arg == "--help" || arg == "-h" || arg == "-help")
        {
            print_help(argv[0]);
            std::exit(0);
        }
    }

    ck_tile::ArgParser arg_parser;
    arg_parser.insert("m_dims", "4,256", "M dimensions separated by comma (e.g., '16,32' for 2D M)")
        .insert("n_dims", "16,128", "N dimensions separated by comma (e.g., '32,32' for 2D N)")
        .insert("k_dims", "64", "K dimensions separated by comma (e.g., '64,32' for 2D K)")
        .insert(
            "g_dims", "1,2", "G dimensions separated by comma (e.g., '4,2' for 2D, '2,3,4' for 3D)")
        .insert("num_d", "2", "Number of D (auxiliary input) tensors")
        .insert("strides_a", "", "A tensor strides (comma-separated, empty = auto/contiguous)")
        .insert("strides_b", "", "B tensor strides (comma-separated, empty = auto/contiguous)")
        .insert("strides_e", "", "E tensor strides (comma-separated, empty = auto/contiguous)")
        .insert("strides_ds",
                "",
                "D tensors strides (semicolon-separated for multiple, empty = same as E)")
        .insert("a_layout", "R", "A tensor data layout - Row by default")
        .insert("b_layout", "C", "B tensor data layout - Col by default")
        .insert("e_layout", "R", "E tensor data layout - Row by default")
        .insert("v", "1", "0. No validation, 1. Validation on CPU")
        .insert("prec", "fp16", "data type. fp32/fp16/bf16")
        .insert("warmup", "5", "number of iterations before benchmark the kernel")
        .insert("repeat", "10", "number of iterations to benchmark the kernel")
        .insert("timer", "gpu", "gpu:gpu timer, cpu:cpu timer")
        .insert("split_k", "1", "splitK value")
        .insert("log", "1", "log level for debugging");

    bool result = arg_parser.parse(argc, argv);
    return std::make_tuple(result, arg_parser);
}

// Helper function to parse G, M, N, K dimensions from string
std::vector<ck_tile::index_t> parse_dimensions(const std::string& dims_str)
{
    std::vector<ck_tile::index_t> dims;
    std::stringstream ss(dims_str);
    std::string token;

    while(std::getline(ss, token, ','))
    {
        dims.push_back(std::stoi(token));
    }

    if(dims.empty())
    {
        throw std::invalid_argument("Dimensions cannot be empty");
    }

    return dims;
}

// Helper function to calculate total elements from multi-dimensional vector
ck_tile::index_t calculate_total_elements(const std::vector<ck_tile::index_t>& dims)
{
    ck_tile::index_t total = 1;
    for(auto dim : dims)
    {
        total *= dim;
    }
    return total;
}

/**
 * @brief Flattens a list of tensor dimension components into a single dimension vector.
 *
 * This function takes a list of dimension vectors (e.g., representing different components
 * such as G, M, N, or K dimensions) and concatenates them into a single vector.
 *
 * Example:
 *   Input:  {{G0, G1}, {M0, M1}, {K0}}
 *   Output: {G0, G1, M0, M1, K0}
 *
 * @param dim_components A vector of vectors, where each inner vector represents a set of tensor
 * dimensions.
 * @return A single vector containing all dimensions concatenated in order.
 */
std::vector<ck_tile::index_t>
concatenate_dim_components(const std::vector<std::vector<ck_tile::index_t>>& dim_components)
{
    std::vector<ck_tile::index_t> result;

    // Concatenate all dimension components into a single vector
    for(const auto& component : dim_components)
    {
        result.insert(result.end(), component.begin(), component.end());
    }

    return result;
}

// Helper function for printing dimensions
void print_dims(const std::string& name,
                const std::vector<ck_tile::index_t>& dims,
                ck_tile::index_t total)
{
    std::cout << name << ": [";
    for(size_t i = 0; i < dims.size(); ++i)
    {
        std::cout << dims[i];
        if(i < dims.size() - 1)
            std::cout << ",";
    }
    std::cout << "] ";
    if(total != 0)
        std::cout << "(total=" << total << ")";
    std::cout << std::endl;
}
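
How the dimension helpers in this header compose can be sketched without ck_tile. The block below is a minimal standalone illustration, not part of the header: `index_t` is assumed to be a 32-bit integer standing in for `ck_tile::index_t`, and `parse_dims`, `total_elements`, `concat`, `FlattenedProblem`, and `flatten_problem` are hypothetical stand-ins that mirror `parse_dimensions`, `calculate_total_elements`, and `concatenate_dim_components`. It reproduces the "Multi-dimensional" case from the help text, where `-m_dims="8,16"`, `-n_dims="32,4"`, and `-k_dims="16,8"` each flatten to 128.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

using index_t = std::int32_t; // assumption: stand-in for ck_tile::index_t

// Splits "8,16" into {8, 16}; std::stoi throws on a non-numeric token.
std::vector<index_t> parse_dims(const std::string& dims_str)
{
    std::vector<index_t> dims;
    std::stringstream ss(dims_str);
    std::string token;
    while(std::getline(ss, token, ','))
        dims.push_back(static_cast<index_t>(std::stoi(token)));
    return dims;
}

// Product of all entries, i.e. the size of the flattened dimension.
index_t total_elements(const std::vector<index_t>& dims)
{
    index_t total = 1;
    for(const index_t d : dims)
        total *= d;
    return total;
}

// Concatenates dimension groups in order, e.g. {{G...}, {M...}, {K...}}.
std::vector<index_t> concat(const std::vector<std::vector<index_t>>& parts)
{
    std::vector<index_t> out;
    for(const auto& p : parts)
        out.insert(out.end(), p.begin(), p.end());
    return out;
}

// Hypothetical summary of one contraction problem.
struct FlattenedProblem
{
    index_t G, M, N, K;          // flattened sizes
    std::vector<index_t> a_dims; // full shape of A: [G..., M..., K...]
};

FlattenedProblem flatten_problem(const std::string& g,
                                 const std::string& m,
                                 const std::string& n,
                                 const std::string& k)
{
    const auto gd = parse_dims(g);
    const auto md = parse_dims(m);
    const auto nd = parse_dims(n);
    const auto kd = parse_dims(k);
    return {total_elements(gd),
            total_elements(md),
            total_elements(nd),
            total_elements(kd),
            concat({gd, md, kd})};
}

// Usage sketch: the "Multi-dimensional" example from the help text.
inline void demo_flatten()
{
    const FlattenedProblem p = flatten_problem("4", "8,16", "32,4", "16,8");
    assert(p.G == 4 && p.M == 128 && p.N == 128 && p.K == 128);
    assert((p.a_dims == std::vector<index_t>{4, 8, 16, 16, 8}));
}
```

Unlike `parse_dimensions`, the stand-in `parse_dims` returns an empty vector for an empty string instead of throwing, which keeps the sketch short; a real caller would likely want the header's stricter behavior.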