composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-11 17:00:18 +00:00

Author	SHA1	Message	Date
pmaybank	592d73ad73	[CK_TILE] Add support for gfx12 in tile_engine for GEMM benchmarking (#2802 ) * initial work on adding support of gfx12 in tile_engine for GEMM benchmarking * add stage("Run TILE_ENGINE_GEMM Tests on gfx1201") to Jenkins config * make tile_[m/n/k] validation arch dependent	2025-09-17 17:59:01 +01:00
Illia Silin	b9d69d32a8	Enable FMHA and AITER tests on gfx950. (#2812 ) * enable aiter and fmha test stages on gfx950 * use newer compiler for gfx950 * make sure gfx950 runs correct docker * fix typo * upgrade base docker for aiter * change base docker for aiter tests * do not add group render to ck_aiter image * add group irc in ck_aiter docker * do not fix the irc group id to 39 * do not set jenkins uid and gid * skip group irc for aiter tests * fix syntax error in dockerfile * change the base docker for aiter tests * add irc group back to ck_aiter docker	2025-09-12 12:20:32 -07:00
Thrupti Raj Lakshmana Gowda	f6ba94fb5c	[CK TILE ENGINE] Adding GEMM Preshuffle to CK Tile Engine (#2712 ) * Partial Progress : Completed ListBlob * Additional changes in Listbob * Partial Progress : Generate Blobs Completed * Partial Progress : Added Host side code for Preshuffle * Working code for Preshuffle before Cleanup * Partial Progress : Cleanup * Partial Progress : Datatype Validation * Partial Progress : Warptiles for preshuffle changed from hardcoding to take from config * Partial Progress : Cleanup * Partial Progress : Code Cleanup * Partial Progress : Passing all valid tiles failing for unsupported tiles * Partial Progress : Working code, testing pending for edge cases * Partial Progress for testing * Completed Code * kBlockPerCu as tunable parameter from config * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update 
tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Partial Progress : Working listkernels * Partial Progress : Cleanup Working listkernels * Partial Progress : Single instance * Partial Progress : Working single instance code * Partial Progress : Working generate individual instance code * Partial Progress : Working rewamped code for given config file needed validation and edge case testing * Partial Progress : Working Code, testing pending * Removing LOGS file * Working code * Minor changes to GEMM Preshuffle : Restructured * Minor Changes in Preshuffle * Changes to Jenkins File * Changes to Jenkins file to consider new architecture * Changes to Jenkins file for fixing CI --------- Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>	2025-09-12 11:50:19 -07:00
Illia Silin	bca99a499d	build and test on gfx942 by default (#2830 )	2025-09-11 14:02:21 -07:00
Anton Gorenko	ec006bb8e0	[CK_TILE] Add gtests for FMHA (#2744 ) * Improve random number generation * use different seed for each input (Q, K, V...); * use deterministic generation of: * seqstart_q/k (for group mode); * block_table (for paged-kvcahe); * cache_batch_idx (for kvcache); * Extract arg_parser-related code from run functions to use them as tests * Split examples into main programs and fmha runners, build instances separately * Add dummy tests that use instances and runners * Fix a missed corner case of f32->f8 conversion When value if < min f8 denormal but > min f8 denormal / 2, it must be rounded to min f8 denormal (i.e. 0b1), not to 0. * Fix incorrect fp8 scales for P and O in validation code DataTypeConfig was incorrectly compared with fp8_t. * Add host generation of dropout random values and use it for validation Previously host validation (reference_batched_dropout) used random numbers generated by BlockDropout of the kernel, meaning that incorrect generation on device (bad distribution, repeated numbers, too many zeros, etc.) would not trigger any validation errors. * Implement tests from smoke_test_bwd.sh * Return result as enum to distinguish failure and missing instance * Add tests for bwd features: bias, alibi, dropout * Implement tests from smoke_test_fwd.sh * Pass seqlen_q/k as vectors to fwd and bwd runners * Add tests for fwd features: bias, alibi, dropout * Add tests for pagedkv and splitkv * Fix conditions when to use splitkv and pagedkv kernels splitkv was executed only when use_kvcache which == (need_append_kvcache \|\| use_cache_batch_idx \|\| 0 < page_block_size). In the SplitKV tests: the regular fwd kernel was executed if use_cache_batch_idx was not requested even when num_splitkv > 1. In the AppendKV tests: the pagedkv kernel was executed but it often failed to find an instance. 
* Add tests for appendkv * Use is_v_rowmajor = true because there are no instances with column layout anymore * Split public and private compile options for instances Tests and examples need to know only about CK_TILE_FMHA_FWD__API. Improve parsing validation in bias and mask * Pass bias as string for consistency with mask * Catch parsing and other exceptions * Add bwd test for deterministic flag * Initialize fp8 tensors (-init=ufq) similarly to uf * Fix splitkv/pagedkv invocation: use padded sk when seqlen_k_ptr is not null seqlen_k cannot be used to determine padding when seqlen_k_ptr is provided. The actual seqlen_k is taken from seqlen_k_ptr[b]. Even seqlen_k values (% bn0 == 0) use padded seqlen_k while seqlen_k_ptr may contain arbitrary values. In the example or tests this produces incorrect results with appendkv (for example, -d=32 -s=1 -s_k=64 -s_knew=7 -vlayout=c -b=8). * Fix use_pagedkv value when kvcache = true but page_block_size = 0 In this case block_table_ptr is nullptr which is accessed in the kernel. * Clean up bwd tests * Unify fwd tests for f16/bf16 and fp8 * Use better explicit instantiation declaration for fmha_bwd<2> * Use the same seed for all tests, allow to override it with env variable * Undo clang-format of one irrelevant file For some reason my local clang-format-18 and the one in CI work differently. * Do not build instances and tests on unsupported archs * Build instance libraries as OBJECT library * CI: Enable sccache for HIP There are source files with LANGUAGE HIP, they need -DCMAKE_HIP_COMPILER_LAUNCHER=sccache * Add tests to REGRESSION_TESTS * Fix OOB accesses in deterministic bwd due to incorrectly assumed kN0 The runner assumes kN0 = (hdim_q <= 128) ? 128 : 64 but there are smaller tiles (for tr_load or fp32). This can create too small dq_acc_buf. * Pass CK_TILE_FMHA_FWD__API as INTERFACE compile options The instances don't actually depend on them, only examples and tests do. 
Passing these definitions as INTERFACE allows to change FMHA_FWD_ENABLE_APIS without recompiling instances that are already in ccache. Fix formatting and names	2025-09-10 08:06:14 +05:00
Vidyasagar Ananthan	5224d2ead3	Fixing path for tile engine tests. (#2794 )	2025-09-05 17:34:48 -07:00
Illia Silin	4e4a784d53	set number of cpu threads in CI to min(nproc,64) (#2793 )	2025-09-05 17:26:13 -07:00
Vidyasagar Ananthan	60ea94f4fe	Fixing tile engine tests after recent refactoring. (#2791 ) * Fixing tile engine tests after recent refactoring. * Fixing line break error.	2025-09-05 14:57:59 -07:00
Illia Silin	ef6c28e989	Fix latest AITER failure and add more AITER tests in CK CI. (#2782 ) * add aiter tests and move json_dump header * remove example/include path from cmake * extend time for aiter and pytorch stages	2025-09-04 13:44:00 -07:00
rahjain-amd	4d041837ad	Add json dump support to output details from CK/CKTile Examples. (#2551 ) * Adding RapidJson Library * Adding Json Dumps in all CK_Tile Examples Not verified yet * Adding json to cktile Batched Transpose * adding json dumps to layernorm2d_fwd * Adding json dump to flatmm_basic * Adding RapidJson Library * Adding Json Dumps in all CK_Tile Examples Not verified yet * Adding json to cktile Batched Transpose * adding json dumps to layernorm2d_fwd * Adding json dump to flatmm_basic * Adding json in 03_gemm * Add json dump to 16_batched_gemm * Add json dump to gemm_multi_d_fp16 * Add json dump to grouped_gemm * fix fmha_bwd/fwd * Fix clang-format errors exclude include/rapidjson in jenkins as its a third-party library * Saparating function and defination. * Update Documentation of 03_gemm * Refactoring as per code review * Disable fp8 instances on unsupported targets (#2592) * Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt * Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt * Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt * Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt --------- Co-authored-by: AviralGoelAMD <aviralgoel@amd.com> * fix clang format * remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt * Fixing Readme and unifying jsondumps * adding moe_smoothquant * adding fused_moe * Fixing Readme for batched_gemm * Fixing Readme for grouped_gemm * adding flatmm * adding gemm_multi_d_fp16 * adding elementwise * adding File name when json is dumped * Fixing Reduce after merge * adding batched_transpose * Adding Warptile in Gemm * Fixing Clang Format --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com> Co-authored-by: AviralGoelAMD 
<aviralgoel@amd.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-09-02 23:31:29 -07:00
Illia Silin	0ac908fb57	Add a daily CI cron job to build pytorch. (#2755 ) * add a stage to build pytorch * add docker file for pytorch stage * call build scripts from the default path * add a daily cron build for pytorch stage	2025-08-27 16:57:43 -07:00
JH-Leon-KIM-AMD	19d5327c45	Test comprehensive dataset (#2685 ) * Add CSV-driven convolution test pipeline - Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality - Add complete dataset generation toolchain in test_data/ - Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter - Ready for comprehensive convolution testing with scalable datasets * Update convolution test dataset generation pipeline * add 2d, 3d dataset csv files * Remove CSV test dataset files from repository * Update generate_test_dataset.sh * Fix channel division for MIOpen to CK conversion * Remove unnecessary test files * Fix clang-format-18 formatting issues * TEST: Enable comprehensive dataset tests by default * Fix test_data path in Jenkins - build runs from build directory * Add Python dependencies and debug output for CSV generation * Remove Python package installation - not needed * Add better debugging for generate_test_dataset.sh execution * Fix Jenkinsfile syntax error - escape dollar signs * Add PyTorch to Docker image for convolution test dataset generation - Install PyTorch CPU version for lightweight model execution - Fixes Jenkins CI failures where CSV files were empty due to missing PyTorch - Model generation scripts require PyTorch to extract convolution parameters * Add debugging to understand Jenkins directory structure and CSV file status - Print current working directory - List CSV files in test_data directory - Show line counts of CSV files - Will help diagnose why tests fail in Jenkins * Fix clang-format-18 formatting issues - Applied clang-format-18 to test file - Fixed brace placement and whitespace issues * Add detailed debugging for CSV dataset investigation - Check generated_datasets directory contents - List all CSV files with line counts - Show first 5 lines of main CSV file - Applied clang-format-18 formatting - This will help identify why CSV files are empty in Jenkins * keep testing add pytorch installation in shell script * Use 
virtual environment for PyTorch installation - Jenkins user doesn't have permission to write to /.local - Create virtual environment in current directory (./pytorch_venv) - Install PyTorch in virtual environment to avoid permission issues - Use PYTHON_CMD variable to run all Python scripts with correct interpreter - Virtual environment will be reused if it already exists * Remove debug code and reduce verbose logging in Jenkins - Remove bash -x and debug commands from Jenkinsfile execute_args - Remove all debug system() calls and getcwd from C++ test file - Remove unistd.h include that was only needed for getcwd - Remove debug print in CSV parser - Add set +x to generate_test_dataset.sh to disable command echo - Redirect Python script stdout to /dev/null for cleaner output This makes Jenkins logs much cleaner while still showing progress messages. * install gpu torch * Clean up and optimize comprehensive dataset test pipeline - Reorder Jenkinsfile execution: build -> generate data -> run test - Remove commented-out debug code from generate_test_dataset.sh - Ensure all files end with proper newline character (POSIX compliance) - Keep useful status messages while removing development debug prints - Set MAX_ITERATIONS=0 for unlimited test generation in production * Add configuration modes to reduce test execution time - Add --mode option (half/full) to generate_model_configs.py - half mode (default): ~278 configs (224 2D + 54 3D) -> ~1,058 total tests - full mode: ~807 configs (672 2D + 135 3D) -> ~3,093 total tests - Update generate_test_dataset.sh to use CONFIG_MODE environment variable - Keeps all model types but reduces parameter combinations intelligently - Fixes Jenkins timeout issue (was running 3,669 tests taking 17+ hours) - Default half mode should complete in ~4-5 hours instead of 17+ hours * Add small mode for quick testing of comprehensive dataset * jenkins pipeline test done * jenkins test done * Trigger CI build * remove test comment and update data 
generation option as half --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>	2025-08-26 22:18:05 +02:00
John Shumway	99d27aca17	Add a CMake property for c++ standard (17 or 20) (#2736 ) Configure C++ standard with a CMake variable. Defaults to C++20, but can be set to C++17 to test backwards compatibility. * Add validation for allowed C++ standards. * build CK in rhel8 docker with std=c++17 --------- Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-08-25 18:56:58 -07:00
Aviral Goel	bb6132116f	build!: Update composable kernel version to 1.2.0 for rocm 7.0 release (#2734 ) * build!: Update composable kernel version to 1.2.0 for rocm 7.0 release	2025-08-25 13:48:51 -04:00
Illia Silin	6180685688	Resolve issues with performance logs in CI. (#2733 ) * update the performance test logic * fix unstash perf logs logic * untangle unstashing fmha logs for different archs * run process stage after running fmha tests * fix the processing of perf logs * fix arguments for run_performance scripts	2025-08-25 09:51:29 -07:00
Illia Silin	8b55afcd93	Build ckProfiler package for all architectures. (#2701 ) * stash ckprofiler package built for all targets * build the lib for all instances in newer docker * make sure packages get posted	2025-08-18 11:16:25 -07:00
Tianyuan Wu	68134b60e4	[CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 (#2466 ) * WMMA GEMM F16 Implementation Signed-off-by: root <tianyuwu@amd.com> * Self-review Signed-off-by: root <tianyuwu@amd.com> * ASIC check minor tweak Signed-off-by: root <tianyuwu@amd.com> * add missing include file * Set GPU_TARGETS to gfx11/12 generic Signed-off-by: root <tianyuwu@amd.com> * INT8 GFX12 Signed-off-by: root <tianyuwu@amd.com> * add int8x16 branch * Fix CI script Signed-off-by: root <tianyuwu@amd.com> * Fix typo Signed-off-by: root <tianyuwu@amd.com> * Add CK_Tile WMMA example Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> * Fix CI Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> * fix clang format * Set M/N_Warp Back to Constant Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> * Use GemmConfigComputeV3 by default Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12 Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Remove CK_Tile wmma gemm examples from the CI list Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Add atomic add fallback method for gfx11 Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Fix typo Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Omit copyright year Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Support non-square cases Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Fix CI Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Add get_device_ip() Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Revert "Add atomic add fallback method for gfx11" This reverts commit `07a79e797d`. Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> * Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12" This reverts commit `ceee918007`. * Revise method name and typos Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> * clang-format Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Try fix CI Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Revert "Try fix CI" This reverts commit `7a7241085e`. 
* clang-format Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Fix typo caused by merge Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> * Fix typo caused by merging Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> --------- Signed-off-by: root <tianyuwu@amd.com> Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> Co-authored-by: joye <joye@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-08-15 16:22:27 -07:00
Thrupti Raj Lakshmana Gowda	1c2078066b	Variable name correction in Jenkins file (#2686 )	2025-08-14 13:35:55 -07:00
JH-Leon-KIM-AMD	b963478759	CSV-driven convolution test pipeline (#2581 ) * Add CSV-driven convolution test pipeline - Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality - Add complete dataset generation toolchain in test_data/ - Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter - Ready for comprehensive convolution testing with scalable datasets * Update convolution test dataset generation pipeline * add 2d, 3d dataset csv files * Remove CSV test dataset files from repository * Update generate_test_dataset.sh * Fix channel division for MIOpen to CK conversion * Remove unnecessary test files * Fix clang-format-18 formatting issues --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>	2025-08-13 16:24:34 +02:00
Thrupti Raj Lakshmana Gowda	3f57ec3d2d	GEMM Multi D for CK Tile Engine (#2660 ) * Readme for GEMM Multi D * GEMM Multi D partial Progress * GEMM Multi D partial Progress! * CK Tile Engine GEMM Multi D : All Python files generated * Partial Progress * Partial Progress * Partial Progress * Partial Progress : Incorrect Result * Partial Progress : Debugging * Partial Progress : Correct Results * Partial Progress - Incorrect Results * Partial Progress - Commenting Passthrough bypass logic * Changing Passthrough to MultiplyMultiply * Correct Results! * Fix and debug the pass through feature * Sample commit * Correct Results : MultiplyMultiply * Code Cleanup * Removing Failed Instances * Working code before Unary element support * Custom Elementwise Function support and working implementation for Mul and Add * Updating README * Working for Passthrough * Review Comments : Minor Fixes * Review Comments : Minor Fixes * Readme Updated * Partial Changes after Rebase * Working Code : Changes after Rebase * Updating Jenkins file * Removing default value changed while testing * Configuration changes in config files * Tile Handler changes in GEMM Multi D Tile Engine * Tile Handler changes in GEMM Multi D Example * Change log for Gemm Multi D in CK Tile Engine * Configuration changes in config files --------- Co-authored-by: ThomasNing <thomasning@amd.com>	2025-08-12 16:05:05 -07:00
Illia Silin	bbf41b27f2	fix builds with mainline/staging compilers (#2674 )	2025-08-12 10:23:08 -07:00
Illia Silin	6bfef63414	enable aiter test_mha in daily CI (#2659 )	2025-08-11 09:50:33 -07:00
Illia Silin	8613aa1e40	remove ck_tile transpose and gemm stages from CI (#2646 )	2025-08-08 10:48:44 -07:00
Illia Silin	7ac850ac72	Add daily AITER tests on gfx942. (#2639 ) * add option to select aiter branch, add tests on gfx942	2025-08-08 09:30:46 -07:00
Illia Silin	833ae1d051	Revert "Reduce build time tile engine (#2579 )" (#2623 ) This reverts commit `e5b79b26fa`.	2025-08-05 09:27:55 -07:00
Thomas Ning	e5b79b26fa	Reduce build time tile engine (#2579 ) * Modify CMakeLists to allow for splitting. * Modify CMakeLists for data and layout logic. * Run tests and get build artifact. * Test new Cmakelists for speedup. * Further improvements for speedup. * turn off the FMHA * turn off the automatic tile engine gemm * minor fix * disable the transpose test first * Address the comment * Jenkinsfile * change the make thread to 64 * change the compile thread to 32 * Try to use with less OS memory space * Have the Unity build batch size to 2 * reduce the chunk size --------- Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>	2025-08-01 14:42:33 -07:00
Illia Silin	e6104daecc	Add a daily CI stage to test AITER with latest CK. (#2598 ) * add a CI stage for AITER testing	2025-08-01 07:55:51 -07:00
Bartłomiej Kocot	5b244105d9	Enable multiple D for grouped conv fwd large tensors (#2572 )	2025-07-28 22:39:07 +02:00
Illia Silin	504b101da3	upgrade from clang-format-12 to clang-format-18 (#2568 ) * upgrade to clang-format-18 * update to clang-format-18 in pre-commit-config	2025-07-28 11:34:07 -07:00
Illia Silin	9786087010	use ninja to build packages (#2575 )	2025-07-28 11:04:12 -07:00
Illia Silin	ead17e6265	disable building CI for gfx942 by default (#2529 )	2025-07-18 12:25:24 -07:00
Thrupti Raj Lakshmana Gowda	0f3083ab5c	[CKTILE] Layout Support for CK Tile engine (#2482 ) * Updating runtime log message for CK TILE ENGINE * CKTile layout from config * CKTile custom config for CI * Documentation for Layout Changes * CKTile Layout changes to Jenkins * Fixing Clang Format * Changes to Jenkins file to fix error * fix(cmake-ck-dev): no longer sets invalid values as gpu arch * style(py files): ruff formatting * fix(cmake-ck-release): no longer sets invalid values as gpu arch * chore(cmake-tile_engine): add reminder to uncomment user config json * Changes to jenkin file to address more cases * Changes to Jenkins to fix Error * Changes to Jenkins file for fixing an error * Update Jenkinsfile (#2517) * Update Jenkinsfile --------- Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com> Co-authored-by: AviralGoelAMD <aviral.goel@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-07-17 12:19:41 -07:00
Illia Silin	f5d1e3fa48	Use a clang20 compiler for gfx950 builds. (#2504 ) * update docker tag for gfx950 ci build * update compiler path for gfx950 ci build * suppress compiler path override for gfx950 * clean up	2025-07-16 07:37:53 -07:00
Vidyasagar Ananthan	e391b025a0	New ninja tracing script (#2472 ) * Adding ninja log json conversion utility * Updating to match old ninjatracing * Updating Jenkins to use new ninjatracing * Ensuring v7 works * Removing old ninjatracing from dockerfile	2025-07-08 22:36:50 -07:00
Vidyasagar Ananthan	33d704a6f9	Separating ninja build tracing and setting flag to false (#2470 ) * Separating ninja build tracing and setting flag to false * Add ftime-tracing flag * Fix conditional issue * Try adding a script block * Embed Clang analysis in ftime trace block	2025-07-08 10:52:00 -07:00
Vidyasagar Ananthan	d2536b91bc	Remove ftime tracing to avoid printing json files (#2452 ) * Remove ftime tracing to avoid printing json files * Factoring out build commands	2025-07-03 07:54:12 -07:00
Vidyasagar Ananthan	2fa9270a25	Fix an earlier static check error due to assignment of variable in Jenkinsfile (#2420 ) * Testing assignment of param fix * Removing redundant changes * Adding back unit test runs * Ensuring Jenkins changes work on develop - to be reverted * Revert "Ensuring Jenkins changes work on develop - to be reverted" This reverts commit `cf1cab4a43`.	2025-06-28 07:07:14 -07:00
Thomas Ning	28a63d7dcb	Revert "Enable builds on gfx942 by default and run all tests on develop branc…" (#2418 ) This reverts commit `6d6f4c76c1`.	2025-06-27 16:40:10 -07:00
Khushbu Agarwal	a14753b86f	Enabling diff datatypes for tile_engine and build with more granularity (#2392 ) * merging recent changes to universal gemm to tile_engine * Reducing Linking time by generating less intermediate files * make small libs to build faster * Reducing the instances * reducing instances * Restoring default config * Restoring default config * warp_n reverted in default config * Adding diff json files for fp8 and fp16, cmake changes for fp8 * Restructure the CMake File * Added more granularity for build and some debugging code * removed some of debugging statements * added fp8 instances * tahe datatype from command line to enable both type of json files * updated README file * code cleanup * code cleanup * updated jenkinsfile * enable tile_engine daily builds * updating cmake file * updated CMakeLists.txt * Updating CMake code fixing gfx12 build * Updating CMake code fixing gfx12 build * Fix CMake file null checks * fixed traces of rebase * Update tile_engine/ops/gemm/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * fixing rebase issue --------- Co-authored-by: khushbu <khuagarw@gmail.com> Co-authored-by: ThomasNing <thomas.ning@amd.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com> Co-authored-by: AviralGoelAMD <aviral.goel@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>	2025-06-25 15:18:24 -07:00
Illia Silin	6d6f4c76c1	Enable builds on gfx942 by default and run all tests on develop branch. (#2408 ) * add switches for architectures and force develop to run all tests * move the test condition inside the function * enable build on gfx942 by default	2025-06-25 08:01:50 -07:00
Illia Silin	c3c8c6a10f	Introduce dependency-based CI test selection. (#2377 ) * Selective test filter initial commit. * Expanded folder paths for parsing ninja dependencies. * Fixing default branch name in the test evaluation script. * Fixing paths for robustness and adding ctest command to the launch script. * change jenkins file and few tests to upgrade CI * Setting ninja build path. * Fixing typo in Jenkinsfile, and wrong paths. * Fixing typo in launch script. * add few more tests to check CI logic * Fixing header for shell script. * turn off performance test by default, add option to run all unit tests * revert dummy changes in source code to trigger tests * make sure develop branch runs all unit tests --------- Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>	2025-06-20 12:48:00 -07:00
Illia Silin	56f654a826	Limit the threads to build ck_tile engine, use ninja. (#2342 ) * limit the threads to build ck_tile engine, use ninja * disable ck_tile engine until it can be built safely	2025-06-13 14:13:07 -07:00
Illia Silin	b76fdbe47f	Upgrade to ROCm6.4.1 and use generic targets for gfx1x. (#2274 ) * upgrade to rocm6.4.1 and use gfx1x-generic targets * add rocm version parsing * fix the gfx10-3-generic syntax in cmake	2025-06-03 07:17:35 -07:00
Illia Silin	654956bb02	Add a daily CI build on GFX950. (#2261 ) * add CI build for gfx950 * make sure gfx950 CI always uses special docker and compiler * enable codegen tests by default	2025-05-30 12:50:08 -07:00
Casey-Shi	29574f05f7	change from ninja to make (#2253 )	2025-05-28 09:25:05 -07:00
Casey-Shi	128f5a1eab	[Tile Engine] Add benchmark for tile engine gemm. (#2193 ) * initial commit -m benchmark * only support profile * fix * fix doc * add default config * add ci * fix cmake * tmp save for gen blobs * fix bug * merge * range config * test success * fix * fix * move struct * remove config property * fix config * remove comment * add cmake option & modify * add changelog * fix * format * add pydantic module to the docker image * fix * add benchmark for cold and warmp up * python format * add asm cache control * fix README * remove pydantic module * modify changelog * fix config * recover benchmark_gemm and fix * format python * refactor profiler * fix csv bug * fix codegen bug * add kernel instance object * add benchmark gemm executable * fix jenkins & delete extra header * disable warning output & enable default config * Disable sparsity for invalid warp tile combinations * fix gemm host template func * refactor gemm profiler * filter out some inmstances * default config test & fix codegen bug * add sparse flag to gen more instances --------- Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: khuagarw <khuagarw@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-05-26 22:32:36 -07:00
Illia Silin	40668c9a99	Build and store CK library deb package for all targets daily. (#2196 ) * generate and store library package for all targets * use ninja to build packages for all targets * make sure to use ftime-trace when using ninja * make sure build trace only runs on gfx9 * archive lib package and stash only library package	2025-05-16 07:40:53 -07:00
Thomas Ning	9d1e44e56a	Vectorized Transpose for Batched Transpose CK Tile Operator (#2131 ) * Shared Memory for single data point * CKTile Transpose vectorize CP1 * CKTile Transpose vectorize CP2 * CKTile Transpose vectorize CP2.1 * fixed the compile error of the transpose tile 2d * Have the correct result for the current test sample * Changes to printing tensor * fp8 support added * Debugging for transpose * solving the corner issue * Changed padding flag * Intermideate Debugging * Intermidiate Debugging * Intermediate Debugging * Finished debugging of the transpose op * Code Cleanup * Adding edge case smoke tests * Adding Transpose test to CI/CD * Adding Transpose test to CI/CD * Adding Transpose test to CI/CD * Addressing Review Comment * Addressing Comments * Addressing Comments * Measuring Perf Tests * Code Cleanup * Changlog * Added the running iterations * clang format * Fix the changelog * Fix the compilation error * change the printing factor --------- Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>	2025-05-12 00:41:45 -07:00
Illia Silin	3448e12609	Generate ckProfiler package for gfx942 only. (#2180 ) * build CI for gfx942 exclusively * run the last stage in a docker with user jenkins * update the image for the last stage * ignore perf_log if not found * archive and store all packages * use ccache for building packages	2025-05-08 13:29:14 -07:00
Illia Silin	619fba3134	re-enable ck4inductor tests by default (#2155 )	2025-05-01 12:37:27 -07:00


200 Commits