composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 11:16:59 +00:00

Author	SHA1	Message	Date
Max Podkorytov	83b6155354	Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627 ) * Decouple configure/build/test tools from Docker Create a two-layer tool architecture: - Core tools (ck-configure, ck-build, ck-test): Environment-agnostic, work on any system with ROCm - no Docker dependency - Container tools (ck-docker): Manage Docker containers and delegate to core tools via docker exec Changes: - Add ck-configure: New CMake configuration tool with preset support, native GPU detection, and flexible options - Refactor ck-build: Remove Docker dependency, add --configure and --list options, call ninja directly - Refactor ck-test: Remove Docker dependency, add CTest integration with --smoke/--regression/--all options - Enhance common.sh: Add native GPU detection, build directory utils, and output helpers - Update ck-docker: Add configure/build/test/exec commands that delegate to core tools inside container This enables: - Native development on ROCm hosts without Docker - Simpler CI/CD integration - Consistent behavior inside and outside containers Co-Authored-By: Claude <noreply@anthropic.com> * Add ck-rocprof: GPU profiling tool for rocprof-compute Adds a command-line profiling tool to simplify GPU performance analysis workflow using AMD rocprof-compute. Features: - Easy setup with automatic Python venv configuration - Simple CLI: setup, run, analyze, compare, list - Automatic GPU architecture detection - Focus on LDS metrics (Block 12) for bank conflict analysis - Comprehensive documentation with examples and troubleshooting Usage: ck-rocprof setup # One-time environment setup ck-rocprof run <name> <executable> # Profile executable ck-rocprof analyze <name> [block] # Analyze metrics ck-rocprof compare <name1> <name2> # Compare two runs ck-rocprof list # List available runs * Make ck-rocprof documentation concise and improve Docker integration - Streamlined documentation from 416 to 157 lines (62% reduction) - Focused on essential commands, metrics, and workflows - Enhanced script to run all operations inside Docker containers - Fixed workload directory path and improved container management - Added automatic rocprofiler-compute installation and dependency handling * Add --no-roof flag to ck-rocprof profile command Skip roofline analysis by default to speed up profiling. Roofline analysis can add significant time to profiling runs but is not needed for most LDS bank conflict analysis workflows. * Make ck-rocprof work independently of Docker Add native execution mode that runs rocprof-compute directly on the host system when available, falling back to Docker mode when not. Key changes: - Auto-detect native mode when rocprof-compute is in PATH or common locations - Add execution mode wrappers (exec_cmd, file_exists, dir_exists, etc.) - Native mode stores venv at .ck-rocprof-venv in project root - Native mode stores workloads at build/workloads/ - Support user-installed rocprofiler-compute (e.g., ~/.local/rocprofiler-compute) - Add CK_FORCE_DOCKER env var to force Docker mode - Update help message to show current execution mode - Maintain full backward compatibility with existing Docker workflow Tested successfully with rocprofiler-compute 3.4.0 installed from source on MI300X GPU in native mode. Co-Authored-By: Claude <noreply@anthropic.com> * Add clean/status commands and improve ck-rocprof robustness - Add 'clean' command to remove profiling runs (supports --all) - Add 'status' command to show configuration and environment info - Add workload name validation to prevent path traversal attacks - Fix uv installation to use pip instead of curl for reliability - Add cross-platform stat support for macOS compatibility - Consolidate ROCPROF_CANDIDATES to avoid code duplication - Expand help documentation with all profiling block descriptions - Fix Docker wrapper script escaping issues Co-Authored-By: Claude <noreply@anthropic.com> * Fix analyze command to use correct workload path rocprof-compute stores results directly in the workload directory (pmc_perf.csv) rather than in a GPU architecture subdirectory. Updated find_workload_path to detect this correctly. Co-Authored-By: Claude <noreply@anthropic.com> * Address PR review security and robustness issues Security fixes: - Escape executable path in cmd_run to prevent shell injection - Add workload name validation to cmd_analyze and cmd_compare Robustness improvements: - Add error checking for uv package manager installation - Use consistent project root detection (find_project_root \|\| get_project_root) - Use /opt/rocm instead of hardcoded /opt/rocm-7.0.1 in Docker mode - Derive ROCM_REQUIREMENTS path from ROCPROF_BIN for flexibility - Use gfx950 as fallback GPU consistent with common.sh Documentation updates: - Fix env var name GPU_TARGET -> CK_GPU_TARGET - Update storage layout to reflect current structure (workloads/<name>/) - Document clean and status commands - Clarify native vs Docker default paths Co-Authored-By: Claude <noreply@anthropic.com> * Simplify ck-rocprof to native-only mode Remove Docker mode from ck-rocprof. Docker users should run the tool via `ck-docker exec ck-rocprof ...` instead. This simplification: - Removes ~210 lines of Docker-specific code - Eliminates mode detection complexity - Makes the script easier to maintain - Provides clearer error messages when rocprof-compute is not found The setup command now lists all searched locations when rocprof-compute is not found, helping users understand how to install it. Co-Authored-By: Claude <noreply@anthropic.com> * Add rocprofiler-compute source installation fallback When rocprof-compute is not found in system locations, automatically install rocprofiler-compute 3.4.0 from source as a fallback. This eliminates the hard dependency on system ROCm packages. Implementation details: - Clone rocprofiler-compute from GitHub to ~/.local/ - Install dependencies via requirements.txt (not editable install) - Create wrapper that sets PYTHONPATH to source directory - Execute source script directly rather than importing as module This approach matches the project's development workflow and works around the incomplete pyproject.toml that prevents editable installs. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-01-29 17:20:22 -08:00
John Shumway	a213ce676b	Add python analysis scripts for Clang's time trace (#3644 ) This PR introduces a Python toolkit for analyzing Clang's `-ftime-trace` build performance data. This is the foundation for our systematic effort to reduce CK and CK-Tile build times (#3575). The toolkit provides fast parsing of trace JSON files into pandas DataFrames using orjson, with specialized functions for analyzing template instantiation costs and compilation phase breakdowns. It includes a core library (`trace_analysis/`), example scripts for quick analysis, a comprehensive README with usage documentation, and an interactive Jupyter notebook demonstration. Key features include memory-efficient DataFrame schemas with optimized dtypes, recursive hierarchical phase analysis, automatic metadata extraction (source file, compilation timing), and template instantiation filtering. The design supports both standalone scripts and interactive Jupyter notebook workflows. This single-file analysis capability lays the groundwork for future multi-file analysis across thousands of compilation units, enabling data-driven optimization and build time regression detection.	2026-01-26 13:44:36 -08:00
Thomas Ning	3900e1e7ce	Solve the CTAD regression & add up the Shell file for the docker management in testing (#3634 ) * Finished the work * Fix the pipeline	2026-01-26 10:29:28 -08:00
Yi DING	f41f37da96	Add CMakePresets.json (#3284 ) Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2026-01-21 08:04:24 -08:00
Max Podkorytov	086a1f8861	Add LLM-agnostic Docker and build analysis tools (#3576 ) This commit introduces utility tools for building, testing, and analyzing Composable Kernel. The tools are designed to be LLM-agnostic and can be used with any AI assistant or directly from the command line. Tools Added: ============ 1. ck-docker - Docker container management - Start/stop ROCm-enabled containers - Build targets with CMake + Ninja - Run tests with gtest filters - Auto-detect GPU targets (gfx950, gfx942, etc.) - Per-user, per-branch container naming to avoid conflicts 2. ck-build-analysis - Build time profiling - Uses Clang's -ftime-trace for compilation analysis - Aggregates statistics across multiple trace files - Identifies template instantiation bottlenecks - Generates detailed Markdown reports with: * Compilation phase breakdown * Top expensive instantiations * Template family analysis * Data-driven optimization recommendations - Configurable granularity (1µs to 500µs) - PEP 723 compliant Python script with auto-dependency management via uv Key Features: ============= - LLM-agnostic design (works with any AI assistant) - Zero-configuration setup with automatic dependency installation - Comprehensive documentation in script/tools/README.md - Security hardening (input validation, no command injection) - Multi-file trace aggregation for accurate build analysis - Jinja2-based report generation for customizable output Implementation: =============== - script/tools/ck-docker - Main Docker orchestration script - script/tools/ck-build-analysis - Build analysis orchestration - script/tools/common.sh - Shared utilities (container mgmt, GPU detection) - script/tools/analyze_build_trace.py - PEP 723 compliant Python analyzer - script/tools/templates/ - Jinja2 templates for report generation - script/tools/README.md - Comprehensive documentation Directory Structure: ==================== script/tools/ ├── README.md # Main overview ├── README_ck-docker.md # ck-docker documentation ├── README_ck-build-analysis.md # ck-build-analysis documentation ├── ck-docker # Docker orchestration script ├── ck-build-analysis # Build analysis orchestration ├── common.sh # Shared utilities ├── analyze_build_trace.py # Python analyzer (PEP 723) └── templates/ └── build_analysis_report.md.jinja # Report template The tools follow Unix philosophy: do one thing well, compose easily, and work from both CLI and programmatic contexts.	2026-01-15 08:30:23 -08:00
Ville Pietilä	1fc5a3f3ac	Build CK on Windows (#3458 ) * CMakeLists.txt hack for Windows. * Add Windows build instructions. * Fix type issue with variadic min function. * Use std::common_type to fix the variadic min/max functions. * Enable CPU guard compilation on Windows. * Suppress warnings related to std::getenv on Windows platform. * Git ignore the output directory on Windows platform. * Powershell script for running tests and generating reports. * Improve test logging. * Disable non-conv tests. * Fix Debug build on Windows. * More debug build changes. * Update Windows build instructions. * Enable all tests. * Test fixes. * Suppress not found linker options warning. * Update unsigned long literals and format specifiers to work correctly in Windows * Fix conv 3D bwd weight bilinear tests on Windows. * Revert changes on .gitignore. * Clean-up CMake project file for Windows builds. * clang-format * Fix definition of CMAKE_PREFIX_PATH on both Linux and Windows platforms. * Fix building examples on Windows. * Update Readme. * Remove the suppression of the deprecated warnings. * Remove Windows specific min/max implementations from CK Tile math core. * Remove unnecessary no-op on Windows. --------- Co-authored-by: User <user@example.com> Co-authored-by: Ville Pietilä <none> Co-authored-by: John Afaganis <john.afaganis@amd.com> Co-authored-by: Ville Pietilä <>	2026-01-14 07:31:45 -08:00
andrew clark	e77a7ca2bc	Supporting Custom Build Trace File Names (#3443 ) * Removing hard-coded trace filename * Including stage name in notification * Simplifying capture setup and tagging file names with arch * Removed test property from notification message * Fixing regex to get arch name * Fixing error in notification and modified regex	2025-12-18 12:15:33 -08:00
Matti Eskelinen	fe3d52d9b0	Fix minor issues in cmake-ck-dev script (#3438 ) * Remove extra slash from cmake-ck-dev.sh * Add quoting around string variables	2025-12-17 08:57:21 -08:00
andrew clark	e67cd7edeb	Adding sscache stats monitoring (#3428 ) * Adding additional sccache and redis logging to each build * Removing custom workspace * Removing script reference * Logging complete sccache stats * Ensuring monitor is stopped if build fails * Including additional sccache logging * Removing build duration log * Fixing groovy syntax error * Fixing syntax * Modifying logging statements * Fixing syntax * Modifying logging * Modifying logging * Including additional logging * Fixing logging message * Logging build path * Testing * Testing workspace path logs * Adding additonal logging to monitor * Modifying comments * Adding copyright info * Cleaning unnecessary logs * Removing build time logs * Merge branch 'develop' into aick-457	2025-12-17 09:15:27 -07:00
Illia Silin	3dfa794fab	Add build trace diagnostics to CI. (#3432 ) * generate and visualize build traces for all archs * generate build traces in all cases * fix jenkins logic * fix typo * use more threads for parsing dependency map * add script to parse ninja traces and issue warnings * fix python script syntax and header * fix python syntax one more time * fix python syntax	2025-12-16 08:22:52 -08:00
Robin Voetter	6219b12730	[CK_BUILDER] convolution testing (#3267 ) * Add README.md for testing * Add tensor_memory_manager. * ck-builder: tensor memory manager rebase fixes This fixes some issues caused by the API being changed recently. Also, this streamlines the ckt namespace to always be ck_tile::builder::test, as this is already being used by other tests Really, this commit should be squashed into the previous, but I'm keeping it separate for brevity. * ck-builder: test arguments initial prototype * ck-builder: test system initial prototype * ck-builder: fix non-standardized copyright comments * ck-builder: new prototype * ck-builder: group testing inputs/outputs into a separate structure This is basically the return of the tensor memory manager after all, except that the design is more closely tied to the actual operation. Using a struct allows us to add additional input/output tensors without breaking code (by defaulting those new parameters). Note that the tensors are split into a separate inputs/outputs because we usually want to allocate the output _twice_: once for the real computation and once for the reference computation. * ck-builder: simplify prototype naming; start docs * ck-builder: update testing readme * ck-builder: testing documentation * ck-builder: HipStatusMatcher This matcher can be used to check HIP status codes and provide nice and readable error messages. * ck-builder: tensor_buffer.hpp tests * ck-builder: conv_fwd.hpp tests * ck-builder: add example end-to-end test in conv fwd 2d fp16 * ck-builder: simplify extent usage * ck-builder: update testing doc * ck-builder: skip end to end test on non-gfx9 * fix check_copyright_year interpreter /bin/bash is not guaranteed to exist on Linux. Signed, a NixOS user * ck-builder: fix copyrights * ck-builder: reduce conv fwd testing size This test allocated 24GB of memory, too much for 16GB cards. --------- Co-authored-by: John Shumway <jshumway@amd.com>	2025-12-13 15:33:41 +01:00
Aviral Goel	6d25525adc	feat(precommit-hooks): add check for correct copyright header (#3302 ) * chore(copyright): update copyright header for left files * feat(copyright): add copyright check to precommit hooks * chore(copyright): update copyright header for include/ck_tile directory * chore(copyright): update copyright header for example directory * chore(copyright): update copyright header for .github directory * refactor: copyright_check script with better if else handling * chore(copyright): update compyright header for remaining files * feat: add script to automate copyright addition	2025-12-10 22:50:43 -08:00
andrew clark	40d7217ac7	Automated Perfetto UI Notifications (#3255 ) * Testing visualization generation * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Adding dummy test data * Update Jenkinsfile * Update Jenkinsfile * Adding notifications * Testing * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Image compression * Update Jenkinsfile * Moving capture logic to main Jenkins file * Testing generation * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Fixing curl request * Update Jenkinsfile * Clean up * Fix * Fixing notification * Testing message creation * Adjusting message payload * Testing notification generation * Updating main jenkinsfile * Fixing cleanup call * Removing test pipeline code * Comment clean up * Testing pipeline * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Moving archive Moving trace archive to safe location before source checkout * Removing test pipeline * Testing pipeline with unique file names * Update Jenkinsfile * Removing test files Updated main pipeline	2025-11-26 16:27:27 -07:00
Aviral Goel	ab68c9d384	chore(copyright): update copyright header for script directory (#3184 ) * chore(copyright): update copyright header for tile_engine directory * chore(copyright): update copyright header for script directory --------- Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>	2025-11-11 11:26:01 -08:00
andrew clark	3b076b0b74	Collecting redis stats (#3149 )	2025-11-04 18:55:11 -08:00
Vidyasagar Ananthan	31c019f589	Chunk Ctests so we dont run into large number of tests error (#3050 ) * Chunk Ctests so we dont run into large number of tests error * Addressing feedback from copilot	2025-11-04 10:31:32 -08:00
John Afaganis	3f996ee738	Add copyright notices to missing files (#3133 )	2025-10-31 07:35:11 -07:00
Robin Voetter	6f58d6e457	[CK_BUILDER] Factory tests (#3071 ) This pull requests adds some initial "factory tests" - these check that the instances which are used in MIOpen are actually present in CK. The main reason for this is documentation and sanity checking. Its likely that these tests get outdated fast, so we'll have to maintain them, but fortunately this is quite straight forward and shouldn't take a lot of time once they are in place.	2025-10-28 10:27:42 -07:00
Johannes Graner	8a4cd32d86	Pre-commit in CI (#3029 ) * Pre-commit in CI * Specify python version, and install dos2unix for remod * Refactor remod hook to correctly install dependencies * Run pre-commit	2025-10-17 09:28:38 -07:00
Johannes Graner	d40b50b9d5	Update pre-commit to fixed versions, run remod for ck_tile (#2895 ) * Fix ruff linter errors * Fix remod dos2unix command * Clang format * Ignore utility in remod * Run remod * Specify clang-format version in pre-commit * Specify ruff version * Include PoolKernelArgs in reference_pool * Add calculate_total_elements to reference batched contraction * Fix calculate_total_elements declaration * Refactor remod pre-commit hook * Fix Aquant tests --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-10-16 15:29:17 -07:00
Bartłomiej Kocot	5477811670	Grouped Conv Bwd Data out index calculation optimizations (#2917 ) * Grouped Conv Bwd Data index calculation optimizations * fixes * refactor instances * gfx12 fixes * temporary disable splitK for gfx12	2025-09-29 15:59:11 +02:00
Anton Gorenko	2aec38f9ec	[CK_TILE] FMHA Fix synchronization issues in BWD pipelines (#2876 ) * Run ctest with --output-on-failure * Fix synchronization issues in bwd pipelines The bwd kernel reuses the same area of LDS for ds (SGrad), bias and dbias (BiasGrad). This means that there must be block_sync_lds between loading one tensor and storing another to the same area. Heavy instructions like MFMA/WMMA and global loads are executed between reuses of the same memory so in MOST cases loading is finished by all warps before storing is started. However, sometimes warps progress at different speeds. Running the tests multiple times and, preferably, with multiple processes on the same GPU helps to trigger this issue: bin/test_ck_tile_fmha_bwd_bf16 --gtest_repeat=-1 --gtest_shuffle --gtest_throw_on_failure	2025-09-19 11:34:45 +05:00
pmaybank	592d73ad73	[CK_TILE] Add support for gfx12 in tile_engine for GEMM benchmarking (#2802 ) * initial work on adding support of gfx12 in tile_engine for GEMM benchmarking * add stage("Run TILE_ENGINE_GEMM Tests on gfx1201") to Jenkins config * make tile_[m/n/k] validation arch dependent	2025-09-17 17:59:01 +01:00
Aviral Goel	f3239395dc	fix(copyright header): add header to missing files (#2807 )	2025-09-11 12:27:08 -07:00
Aviral Goel	e279e9420e	feat(grouped_gemm): add preshuffle v2 support to grouped gemm example (#2721 ) * docs(README): update readme with new build instructions * feat(grouped_gemm): add support back for non persistent kernel * refactor(grouped_gemm): simplify tensor creation * refactor(grouped_gemm): Persistance is now GemmConfig value for easier management * chore(grouped_gemm): add print statements to ease debugging * WIP(grouped_gemm): add grouped_gemm_preshuffle example and update CMake configuration * fix(tile_gemm_traits): change default value of Preshuffle_ from 0 to false for clarity * WIP(grouped_gemm): add dummy variables to compile the preshuffle pipelines * chore(grouped_gemm): add print statements and variables to debug numerical error with preshuffle * style: clang format work so far * BUG!(grouped_gemm_kernel.hpp): figured out a potential bug in for numerical errors in preshuffle pipeline * fix(grouped_gemm_kernel): add function in the kernel code to dynamically calculate tail_number resolving numerical errors * refactor(gemm_presuffle): make preshuffle pipeline v2 compatible with operator () calls from grouped gemm * chore(grouped_gemm): add/remove debug comments and debug print statements * feat(grouped_gemm): integrate preshuffle pipeline v2 into grouped gemm for all supported shapes * chore(gemm_profile): add new argument combinations * fix: branch cleanup, formatting, refactoring * fix: branch cleanup, formatting, refactoring * chore(changelog): update changelog to reflect new featuer * address review comments & nit	2025-09-07 14:18:35 -07:00
rahjain-amd	4d041837ad	Add json dump support to output details from CK/CKTile Examples. (#2551 ) * Adding RapidJson Library * Adding Json Dumps in all CK_Tile Examples Not verified yet * Adding json to cktile Batched Transpose * adding json dumps to layernorm2d_fwd * Adding json dump to flatmm_basic * Adding RapidJson Library * Adding Json Dumps in all CK_Tile Examples Not verified yet * Adding json to cktile Batched Transpose * adding json dumps to layernorm2d_fwd * Adding json dump to flatmm_basic * Adding json in 03_gemm * Add json dump to 16_batched_gemm * Add json dump to gemm_multi_d_fp16 * Add json dump to grouped_gemm * fix fmha_bwd/fwd * Fix clang-format errors exclude include/rapidjson in jenkins as its a third-party library * Saparating function and defination. * Update Documentation of 03_gemm * Refactoring as per code review * Disable fp8 instances on unsupported targets (#2592) * Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt * Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt * Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt * Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt --------- Co-authored-by: AviralGoelAMD <aviralgoel@amd.com> * fix clang format * remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt * Fixing Readme and unifying jsondumps * adding moe_smoothquant * adding fused_moe * Fixing Readme for batched_gemm * Fixing Readme for grouped_gemm * adding flatmm * adding gemm_multi_d_fp16 * adding elementwise * adding File name when json is dumped * Fixing Reduce after merge * adding batched_transpose * Adding Warptile in Gemm * Fixing Clang Format --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com> Co-authored-by: AviralGoelAMD <aviralgoel@amd.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-09-02 23:31:29 -07:00
Thomas Ning	8d43155bce	fix the errors (#2771 )	2025-09-02 11:04:21 -07:00
Thomas Ning	705804d9bf	Restructure the Tile Engine to have faster build time and clear config report (#2747 ) * Making edits to identify individual compilation issues. * Minor fix for blob txt files not being created. * Fixing compilation issues. * Fixing ordering bug. * Adding python profiling functionality. * Setting individual build as default. * Setting gpu target filtering for tile engine to gfx90a, gfx942 and gfx950. * update the default running parameters and settings * Fixing bug with benchmarking, shifting file generation to build instead of config. * Updating fixes. * Fixing json output and parsing. * Disable ccache for tile engine gemm ops because we dont need it. * Removing duplicate type definition. * Improving json printing. * Add the flexibility of different layout and more warp tile support * Fix extra flag in name of individual kernels. * Fixing bug with booleans. * Solve the first patch of the post merge conflict * Compilation fixes, and cosmetic improvements. * Yet again compilation fixes after latest changes from develop. * Fixing python benchmarking script. --------- Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com> Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>	2025-08-30 06:54:18 -07:00
Illia Silin	6180685688	Resolve issues with performance logs in CI. (#2733 ) * update the performance test logic * fix unstash perf logs logic * untangle unstashing fmha logs for different archs * run process stage after running fmha tests * fix the processing of perf logs * fix arguments for run_performance scripts	2025-08-25 09:51:29 -07:00
dnovakovic-dxc	49c6b05c72	Script for generating list of files not referenced in tests (#2696 ) * script for generating list of not referenced files in tests, list is in json format * script comment added * added empty line at the end of the script * format changes	2025-08-20 08:22:51 -07:00
Max Podkorytov	696ef05784	[Dev infra] cmake_ck_dev.sh inline docs and refactor argument list (#2689 ) * invoke script directly * script fixup * keep the docs update separate * add newline * escape arg * use portable way of setting IFS	2025-08-19 00:22:23 -07:00
Max Podkorytov	8f6dc23a89	remove script (#2692 )	2025-08-19 00:20:54 -07:00
Max Podkorytov	1824d65758	modernize scripts for running cmake and clang-format (#2503 ) Co-authored-by: Aviral Goel <aviral.goel@amd.com>	2025-08-06 10:15:44 -07:00
Aviral Goel	1441a0a7ee	Integration of a new pipeline for weight preshuffle into gemm examples (#2516 ) * something khushbu can help with * v1 v2 works with flatmm develop * v0 v1 v2 numerical error gone * Fixing numerical error, and interchange preshuffle configs to match with flatmm * Refactor GEMM pipeline configurations and integrate preshuffle support - Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3). - Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations. - Removed obsolete code and comments * clang format * fix vectorloadsize bug * add the Preshuffle3 * update kwarp calculation in gemm utils * update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch * fix: add CK_GFX950_SUPPORT macro for gfx950 detection * default disable rotating buffer * docs(CHANGELOG): update changelog for rocm 7.0 * Revert "docs(CHANGELOG): update changelog for rocm 7.0" This reverts commit `2bc16fff84`. * Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files. * revert example/ck_tile/flatmm to its original state * remove comment added by second author * switch to xor ALDSDescriptor * modify the MakeALdsDescriptor() * temporary profiling script * getting rid of line marker compiler error * UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy * add a minor fix for the config * typo fix * Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2 * revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp * revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp * reenable the GemmSpatiallyLocalTilePartitioner * make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline * remove hardcoded true for preshuffle bool template argument * rename script * remove gemm_profilie.sh script * merge conflict resolve * clang formatted * typo fix * Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp * Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy * Fix missing newline at end of file in run_gemm_example.inc * Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1 * addressing review comments * removing debug code * addressing review comments * Revert "addressing review comments" This reverts commit `29c45192ba`. * updating tile_engine code * addressing review comments --------- Co-authored-by: amd-khushbu <khuagarw@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-08-01 00:04:54 -07:00
Illia Silin	e8709c24f4	upgrade clang-format version in install_precommit.sh (#2589 )	2025-07-30 08:02:25 -07:00
Illia Silin	49723e94bb	fix the clang-format (#2578 )	2025-07-28 20:49:55 -07:00
Illia Silin	504b101da3	upgrade from clang-format-12 to clang-format-18 (#2568 ) * upgrade to clang-format-18 * update to clang-format-18 in pre-commit-config	2025-07-28 11:34:07 -07:00
John Shumway	67b2821623	Switch to C++20 standard for all CMake targets. (#2536 ) All our platforms support C++20 now, so update to C++20 standard for language features such as concepts, designated initializers, range-based for initializers, and consteval. This PR only switches the compiler flags to C++20, no other changes.	2025-07-22 10:52:10 -07:00
Aviral Goel	84a7600bdc	fix(cmake-dev): cmake dev script works with non bash shells (#2530 )	2025-07-19 23:15:50 -07:00
Thrupti Raj Lakshmana Gowda	0f3083ab5c	[CKTILE] Layout Support for CK Tile engine (#2482 ) * Updating runtime log message for CK TILE ENGINE * CKTile layout from config * CKTile custom config for CI * Documentation for Layout Changes * CKTile Layout changes to Jenkins * Fixing Clang Format * Changes to Jenkins file to fix error * fix(cmake-ck-dev): no longer sets invalid values as gpu arch * style(py files): ruff formatting * fix(cmake-ck-release): no longer sets invalid values as gpu arch * chore(cmake-tile_engine): add reminder to uncomment user config json * Changes to jenkin file to address more cases * Changes to Jenkins to fix Error * Changes to Jenkins file for fixing an error * Update Jenkinsfile (#2517) * Update Jenkinsfile --------- Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com> Co-authored-by: AviralGoelAMD <aviral.goel@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-07-17 12:19:41 -07:00
Aviral Goel	a26ba690fd	fix(precommit_install): fix bug for bare metal machines (#2448 ) Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>	2025-07-10 11:00:47 -06:00
Vidyasagar Ananthan	e391b025a0	New ninja tracing script (#2472 ) * Adding ninja log json convertion utility * Updating to match old ninjatracing * Updating Jenkins to use new ninjatracing * Ensuring v7 works * Removing old ninjatracing from dockerfile	2025-07-08 22:36:50 -07:00
Aviral Goel	e9036a8fc2	Enhancements in precommit_install.sh for Python and CK Tile code (#2400 ) * fix(precommit_install): script now installs packages in virtual env * fix(precommit_install): installs packages in virtual env * feat(precommit): added ruff for python linting and formatting * feat(precommit): added ruff for python linting and formatting * feat(precommit): run ruff when py files are commited * feat(precommit): remod.py is run when ck_tile modified * add empty line at the end * style(precommit.yaml): remove empty line --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>	2025-07-01 01:11:10 -07:00
Illia Silin	c3c8c6a10f	Introduce dependency-based CI test selection. (#2377 ) * Selective test filter initial commit. * Expanded folder paths for parsing ninja dependencies. * Fixing default branch name in the test evaluation script. * Fixing paths for robustness and adding ctest command to the launch script. * change jenkins file and few tests to upgrade CI * Setting ninja build path. * Fixing typo in Jenkinsfile, and wrong paths. * Fixing typo in launch script. * add few more tests to check CI logic * Fixing header for shell script. * turn off performance test by default, add option to run all unit tests * revert dummy changes in source code to trigger tests * make sure develop branch runs all unit tests --------- Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>	2025-06-20 12:48:00 -07:00
Aviral Goel	3af66e99ab	add script to pre commit hooks for checking file permissions (#2322 )	2025-06-17 07:07:08 -07:00
Thomas Ning	3c4cdfac4f	Fix the CK Tile related operators (#2356 ) * fix the flatmm * Fix the pipeline * address the comment	2025-06-16 17:38:52 -07:00
Illia Silin	5523df4b2d	Revert "fix the flatmm (#2349 )" (#2352 ) This reverts commit `d996bc78be`.	2025-06-16 07:54:55 -07:00
Bartłomiej Kocot	f6c2ff9dce	Grouped convolution forward with clamp (#2334 ) * Grouped convolution forward with clamp * Optimize clamp * unary fixes * test gk bias * Revert "test gk bias" This reverts commit `8e42e29d7b`. * Revert "Revert "test gk bias"" This reverts commit `e73c0550ce`. * workaround comment	2025-06-16 15:36:53 +02:00
Thomas Ning	d996bc78be	fix the flatmm (#2349 )	2025-06-16 02:17:53 -07:00
Illia Silin	56f654a826	Limit the threads to builf ck_tile engine, use ninja. (#2342 ) * limit the threads to builf ck_tile engine, use ninja * disable ck_tile engine until it can be built safely	2025-06-13 14:13:07 -07:00

1 2 3 4

166 Commits