Commit Graph

193 Commits

Author SHA1 Message Date
Vidyasagar Ananthan
2a8d24efd4 Fixing tile engine tests after recent refactoring. (#2791)
* Fixing tile engine tests after recent refactoring.

* Fixing line break error.

[ROCm/composable_kernel commit: 60ea94f4fe]
2025-09-05 14:57:59 -07:00
Illia Silin
c217c0fa93 Fix latest AITER failure and add more AITER tests in CK CI. (#2782)
* add aiter tests and move json_dump header

* remove example/include path from cmake

* extend time for aiter and pytorch stages

[ROCm/composable_kernel commit: ef6c28e989]
2025-09-04 13:44:00 -07:00
rahjain-amd
7674eb6416 Add json dump support to output details from CK/CKTile Examples. (#2551)
* Adding RapidJson Library

* Adding Json Dumps in all CK_Tile Examples

Not verified yet

* Adding json to cktile Batched Transpose

* adding json dumps to layernorm2d_fwd

* Adding json dump to flatmm_basic

* Adding json in 03_gemm

* Add json dump to 16_batched_gemm

* Add json dump to gemm_multi_d_fp16

* Add json dump to grouped_gemm

* fix fmha_bwd/fwd

* Fix clang-format errors

exclude include/rapidjson in jenkins as it's a third-party library

* Separating function and definition.

* Update Documentation of 03_gemm

* Refactoring as per code review

* Disable fp8 instances on unsupported targets (#2592)

* Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt

* Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt

* Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt

* Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt

---------

Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>

* fix clang format

* remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt

* Fixing Readme and unifying jsondumps

* adding moe_smoothquant

* adding fused_moe

* Fixing Readme for batched_gemm

* Fixing Readme for grouped_gemm

* adding flatmm

* adding gemm_multi_d_fp16

* adding elementwise

* adding File name when json is dumped

* Fixing Reduce after merge

* adding batched_transpose

* Adding Warptile in Gemm

* Fixing Clang Format

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 4d041837ad]
2025-09-02 23:31:29 -07:00
Illia Silin
566b5351a3 Add a daily CI cron job to build pytorch. (#2755)
* add a stage to build pytorch

* add docker file for pytorch stage

* call build scripts from the default path

* add a daily cron build for pytorch stage

[ROCm/composable_kernel commit: 0ac908fb57]
2025-08-27 16:57:43 -07:00
JH-Leon-KIM-AMD
7d6f0107bd Test comprehensive dataset (#2685)
* Add CSV-driven convolution test pipeline

- Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality
- Add complete dataset generation toolchain in test_data/
- Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter
- Ready for comprehensive convolution testing with scalable datasets

* Update convolution test dataset generation pipeline

* add 2d, 3d dataset csv files

* Remove CSV test dataset files from repository

* Update generate_test_dataset.sh

* Fix channel division for MIOpen to CK conversion

* Remove unnecessary test files

* Fix clang-format-18 formatting issues

* TEST: Enable comprehensive dataset tests by default

* Fix test_data path in Jenkins - build runs from build directory

* Add Python dependencies and debug output for CSV generation

* Remove Python package installation - not needed

* Add better debugging for generate_test_dataset.sh execution

* Fix Jenkinsfile syntax error - escape dollar signs

* Add PyTorch to Docker image for convolution test dataset generation

- Install PyTorch CPU version for lightweight model execution
- Fixes Jenkins CI failures where CSV files were empty due to missing PyTorch
- Model generation scripts require PyTorch to extract convolution parameters

* Add debugging to understand Jenkins directory structure and CSV file status

- Print current working directory
- List CSV files in test_data directory
- Show line counts of CSV files
- Will help diagnose why tests fail in Jenkins

* Fix clang-format-18 formatting issues

- Applied clang-format-18 to test file
- Fixed brace placement and whitespace issues

* Add detailed debugging for CSV dataset investigation

- Check generated_datasets directory contents
- List all CSV files with line counts
- Show first 5 lines of main CSV file
- Applied clang-format-18 formatting
- This will help identify why CSV files are empty in Jenkins

* keep testing add pytorch installation in shell script

* Use virtual environment for PyTorch installation

- Jenkins user doesn't have permission to write to /.local
- Create virtual environment in current directory (./pytorch_venv)
- Install PyTorch in virtual environment to avoid permission issues
- Use PYTHON_CMD variable to run all Python scripts with correct interpreter
- Virtual environment will be reused if it already exists

* Remove debug code and reduce verbose logging in Jenkins

- Remove bash -x and debug commands from Jenkinsfile execute_args
- Remove all debug system() calls and getcwd from C++ test file
- Remove unistd.h include that was only needed for getcwd
- Remove debug print in CSV parser
- Add set +x to generate_test_dataset.sh to disable command echo
- Redirect Python script stdout to /dev/null for cleaner output

This makes Jenkins logs much cleaner while still showing progress messages.

* install gpu torch

* Clean up and optimize comprehensive dataset test pipeline

- Reorder Jenkinsfile execution: build -> generate data -> run test
- Remove commented-out debug code from generate_test_dataset.sh
- Ensure all files end with proper newline character (POSIX compliance)
- Keep useful status messages while removing development debug prints
- Set MAX_ITERATIONS=0 for unlimited test generation in production

* Add configuration modes to reduce test execution time

- Add --mode option (half/full) to generate_model_configs.py
  - half mode (default): ~278 configs (224 2D + 54 3D) -> ~1,058 total tests
  - full mode: ~807 configs (672 2D + 135 3D) -> ~3,093 total tests
- Update generate_test_dataset.sh to use CONFIG_MODE environment variable
- Keeps all model types but reduces parameter combinations intelligently
- Fixes Jenkins timeout issue (was running 3,669 tests taking 17+ hours)
- Default half mode should complete in ~4-5 hours instead of 17+ hours

* Add small mode for quick testing of comprehensive dataset

* jenkins pipeline test done

* jenkins test done

* Trigger CI build

* remove test comment and update data generation option as half

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 19d5327c45]
2025-08-26 22:18:05 +02:00
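The configuration modes listed in the "Test comprehensive dataset" commit above can be sanity-checked with a short sketch. The per-mode 2D and 3D counts are copied from the commit message; the names `MODE_CONFIGS`, `total_configs` and `resolve_mode` are hypothetical and are not taken from generate_model_configs.py or generate_test_dataset.sh. The later "small" mode is left out because the log gives no counts for it.

```python
# Illustrative only: counts come from the commit message, names are assumptions.
MODE_CONFIGS = {
    "half": {"2d": 224, "3d": 54},   # reported as ~278 configs
    "full": {"2d": 672, "3d": 135},  # reported as ~807 configs
}


def total_configs(mode):
    """Return the 2D + 3D config count for a mode, rejecting unknown modes."""
    if mode not in MODE_CONFIGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_CONFIGS)}")
    counts = MODE_CONFIGS[mode]
    return counts["2d"] + counts["3d"]


def resolve_mode(env, default="half"):
    """Pick the mode from a CONFIG_MODE-style mapping, falling back to the default."""
    return env.get("CONFIG_MODE", default)


if __name__ == "__main__":
    for mode in MODE_CONFIGS:
        print(mode, total_configs(mode))
```

The sums agree with the totals quoted in the message (224 + 54 = 278 and 672 + 135 = 807). The test totals (~1,058 and ~3,093) depend on how configs expand into test cases, which the log does not describe, so they are not modelled here.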
John Shumway
a25eb35712 Add a CMake property for c++ standard (17 or 20) (#2736)
Configure C++ standard with a CMake variable.

Defaults to C++20, but can be set to C++17 to test backwards compatibility.

* Add validation for allowed C++ standards.

* build CK in rhel8 docker with std=c++17

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 99d27aca17]
2025-08-25 18:56:58 -07:00
Aviral Goel
ceb2877c25 build!: Update composable kernel version to 1.2.0 for rocm 7.0 release (#2734)
* build!: Update composable kernel version to 1.2.0 for rocm 7.0 release

[ROCm/composable_kernel commit: bb6132116f]
2025-08-25 13:48:51 -04:00
Illia Silin
4f611a6bc4 Resolve issues with performance logs in CI. (#2733)
* update the performance test logic

* fix unstash perf logs logic

* untangle unstashing fmha logs for different archs

* run process stage after running fmha tests

* fix the processing of perf logs

* fix arguments for run_performance scripts

[ROCm/composable_kernel commit: 6180685688]
2025-08-25 09:51:29 -07:00
Illia Silin
687c09f3ab Build ckProfiler package for all architectures. (#2701)
* stash ckprofiler package built for all targets

* build the lib for all instances in newer docker

* make sure packages get posted

[ROCm/composable_kernel commit: 8b55afcd93]
2025-08-18 11:16:25 -07:00
Tianyuan Wu
ec7ee5b7b7 [CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 (#2466)
* WMMA GEMM F16 Implementation

Signed-off-by: root <tianyuwu@amd.com>

* Self-review

Signed-off-by: root <tianyuwu@amd.com>

* ASIC check minor tweak

Signed-off-by: root <tianyuwu@amd.com>

* add missing include file

* Set GPU_TARGETS to gfx11/12 generic

Signed-off-by: root <tianyuwu@amd.com>

* INT8 GFX12

Signed-off-by: root <tianyuwu@amd.com>

* add int8x16 branch

* Fix CI script

Signed-off-by: root <tianyuwu@amd.com>

* Fix typo

Signed-off-by: root <tianyuwu@amd.com>

* Add CK_Tile WMMA example

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* Fix CI

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* fix clang format

* Set M/N_Warp Back to Constant

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* Use GemmConfigComputeV3 by default

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Remove CK_Tile wmma gemm examples from the CI list

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Add atomic add fallback method for gfx11

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix typo

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Omit copyright year

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Support non-square cases

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix CI

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Add get_device_ip()

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Revert "Add atomic add fallback method for gfx11"

This reverts commit 4f664969c01b37976c8518c19833d9f1574cd746.

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12"

This reverts commit 949129a3858a825b2a2c4d3ec01663df18a165a5.

* Revise method name and typos

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* clang-format

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Try fix CI

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Revert "Try fix CI"

This reverts commit 084c683227e64ab6a8137db00c8165fb05bdc902.

* clang-format

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix typo caused by merge

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* Fix typo caused by merging

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

---------

Signed-off-by: root <tianyuwu@amd.com>
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
Co-authored-by: joye <joye@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 68134b60e4]
2025-08-15 16:22:27 -07:00
Thrupti Raj Lakshmana Gowda
b21e1f74ee Variable name correction in Jenkins file (#2686)
[ROCm/composable_kernel commit: 1c2078066b]
2025-08-14 13:35:55 -07:00
JH-Leon-KIM-AMD
4bc6c568bd CSV-driven convolution test pipeline (#2581)
* Add CSV-driven convolution test pipeline

- Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality
- Add complete dataset generation toolchain in test_data/
- Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter
- Ready for comprehensive convolution testing with scalable datasets

* Update convolution test dataset generation pipeline

* add 2d, 3d dataset csv files

* Remove CSV test dataset files from repository

* Update generate_test_dataset.sh

* Fix channel division for MIOpen to CK conversion

* Remove unnecessary test files

* Fix clang-format-18 formatting issues

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: b963478759]
2025-08-13 16:24:34 +02:00
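As a rough illustration of the CSV-driven idea in the commit above, the sketch below parses convolution cases from CSV text and drops rows that cannot form a valid grouped convolution. The column names (`n_dim`, `groups`, `in_channels`, `out_channels`) and the `ConvCase` type are assumptions made for this example; the actual CSV layout used by test_grouped_convnd_fwd_dataset_xdl.cpp is not shown in this log. The divisibility check is the general grouped-convolution constraint, not necessarily the "channel division" fix mentioned in the message.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConvCase:
    """One convolution test case; the field names are hypothetical."""
    n_dim: int
    groups: int
    in_channels: int
    out_channels: int


def load_cases(text: str) -> List[ConvCase]:
    """Parse CSV text into cases, skipping rows with indivisible channel counts."""
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        case = ConvCase(
            n_dim=int(row["n_dim"]),
            groups=int(row["groups"]),
            in_channels=int(row["in_channels"]),
            out_channels=int(row["out_channels"]),
        )
        # Grouped convolution needs both channel counts divisible by the group count.
        if case.in_channels % case.groups or case.out_channels % case.groups:
            continue
        cases.append(case)
    return cases


SAMPLE = """n_dim,groups,in_channels,out_channels
2,4,64,128
3,3,64,96
"""

if __name__ == "__main__":
    for case in load_cases(SAMPLE):
        print(case)
```

In the sample, the second row is skipped because 64 input channels cannot be split evenly across 3 groups.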
Thrupti Raj Lakshmana Gowda
1d1d6717d2 GEMM Multi D for CK Tile Engine (#2660)
* Readme for GEMM Multi D

* GEMM Multi D partial Progress

* GEMM Multi D partial Progress!

* CK Tile Engine GEMM Multi D : All Python files generated

* Partial Progress

* Partial Progress

* Partial Progress

* Partial Progress : Incorrect Result

* Partial Progress : Debugging

* Partial Progress : Correct Results

* Partial Progress - Incorrect Results

* Partial Progress - Commenting Passthrough bypass logic

* Changing Passthrough to MultiplyMultiply

* Correct Results!

* Fix and debug the pass through feature

* Sample commit

* Correct Results : MultiplyMultiply

* Code Cleanup

* Removing Failed Instances

* Working code before Unary element support

* Custom Elementwise Function support and working implementation for Mul and Add

* Updating README

* Working for Passthrough

* Review Comments : Minor Fixes

* Review Comments : Minor Fixes

* Readme Updated

* Partial Changes after Rebase

* Working Code : Changes after Rebase

* Updating Jenkins file

* Removing default value changed while testing

* Configuration changes in config files

* Tile Handler changes in GEMM Multi D Tile Engine

* Tile Handler changes in GEMM Multi D Example

* Change log for Gemm Multi D in CK Tile Engine

* Configuration changes in config files

---------

Co-authored-by: ThomasNing <thomasning@amd.com>

[ROCm/composable_kernel commit: 3f57ec3d2d]
2025-08-12 16:05:05 -07:00
Illia Silin
5b5ae5f81d fix builds with mainline/staging compilers (#2674)
[ROCm/composable_kernel commit: bbf41b27f2]
2025-08-12 10:23:08 -07:00
Illia Silin
3c0626e2c1 enable aiter test_mha in daily CI (#2659)
[ROCm/composable_kernel commit: 6bfef63414]
2025-08-11 09:50:33 -07:00
Illia Silin
f5b69dbdc2 remove ck_tile transpose and gemm stages from CI (#2646)
[ROCm/composable_kernel commit: 8613aa1e40]
2025-08-08 10:48:44 -07:00
Illia Silin
45758022ee Add daily AITER tests on gfx942. (#2639)
* add option to select aiter branch, add tests on gfx942

[ROCm/composable_kernel commit: 7ac850ac72]
2025-08-08 09:30:46 -07:00
Illia Silin
69cfe33716 Revert "Reduce build time tile engine (#2579)" (#2623)
This reverts commit 19caeff665a8d9c499e28ff4e1703d1c87602162.

[ROCm/composable_kernel commit: 833ae1d051]
2025-08-05 09:27:55 -07:00
Thomas Ning
aa2f9b4c73 Reduce build time tile engine (#2579)
* Modify CMakeLists to allow for splitting.

* Modify CMakeLists for data and layout logic.

* Run tests and get build artifact.

* Test new Cmakelists for speedup.

* Further improvements for speedup.

* turn off the FMHA

* turn off the automatic tile engine gemm

* minor fix

* disable the transpose test first

* Address the comment

* Jenkinsfile

* change the make thread to 64

* change the compile thread to 32

* Try to use with less OS memory space

* Have the Unity build batch size to 2

* reduce the chunk size

---------

Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>

[ROCm/composable_kernel commit: e5b79b26fa]
2025-08-01 14:42:33 -07:00
Illia Silin
6c5d1d39b8 Add a daily CI stage to test AITER with latest CK. (#2598)
* add a CI stage for AITER testing

[ROCm/composable_kernel commit: e6104daecc]
2025-08-01 07:55:51 -07:00
Bartłomiej Kocot
f25da17c36 Enable multiple D for grouped conv fwd large tensors (#2572)
[ROCm/composable_kernel commit: 5b244105d9]
2025-07-28 22:39:07 +02:00
Illia Silin
3345f5f417 upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18

* update to clang-format-18 in pre-commit-config

[ROCm/composable_kernel commit: 504b101da3]
2025-07-28 11:34:07 -07:00
Illia Silin
61ff984dcd use ninja to build packages (#2575)
[ROCm/composable_kernel commit: 9786087010]
2025-07-28 11:04:12 -07:00
Illia Silin
ff763142f1 disable building CI for gfx942 by default (#2529)
[ROCm/composable_kernel commit: ead17e6265]
2025-07-18 12:25:24 -07:00
Thrupti Raj Lakshmana Gowda
f3f2716ebb [CKTILE] Layout Support for CK Tile engine (#2482)
* Updating runtime log message for CK TILE ENGINE

* CKTile layout from config

* CKTile custom config for CI

* Documentation for Layout Changes

* CKTile Layout changes to Jenkins

* Fixing Clang Format

* Changes to Jenkins file to fix error

* fix(cmake-ck-dev): no longer sets invalid values as gpu arch

* style(py files): ruff formatting

* fix(cmake-ck-release): no longer sets invalid values as gpu arch

* chore(cmake-tile_engine): add reminder to uncomment user config json

* Changes to Jenkins file to address more cases

* Changes to Jenkins to fix Error

* Changes to Jenkins file for fixing an error

* Update Jenkinsfile (#2517)

* Update Jenkinsfile

---------

Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 0f3083ab5c]
2025-07-17 12:19:41 -07:00
Illia Silin
a9c43b9098 Use a clang20 compiler for gfx950 builds. (#2504)
* update docker tag for gfx950 ci build

* update compiler path for gfx950 ci build

* suppress compiler path override for gfx950

* clean up

[ROCm/composable_kernel commit: f5d1e3fa48]
2025-07-16 07:37:53 -07:00
Vidyasagar Ananthan
eb26ffa875 New ninja tracing script (#2472)
* Adding ninja log json conversion utility

* Updating to match old ninjatracing

* Updating Jenkins to use new ninjatracing

* Ensuring v7 works

* Removing old ninjatracing from dockerfile

[ROCm/composable_kernel commit: e391b025a0]
2025-07-08 22:36:50 -07:00
Vidyasagar Ananthan
89f226aace Separating ninja build tracing and setting flag to false (#2470)
* Separating ninja build tracing and setting flag to false

* Add ftime-tracing flag

* Fix conditional issue

* Try adding a script block

* Embed Clang analysis in ftime trace block

[ROCm/composable_kernel commit: 33d704a6f9]
2025-07-08 10:52:00 -07:00
Vidyasagar Ananthan
bd341803f2 Remove ftime tracing to avoid printing json files (#2452)
* Remove ftime tracing to avoid printing json files

* Factoring out build commands

[ROCm/composable_kernel commit: d2536b91bc]
2025-07-03 07:54:12 -07:00
Vidyasagar Ananthan
7125833c40 Fix an earlier static check error due to assignment of variable in Jenkinsfile (#2420)
* Testing assignment of param fix

* Removing redundant changes

* Adding back unit test runs

* Ensuring Jenkins changes work on develop - to be reverted

* Revert "Ensuring Jenkins changes work on develop - to be reverted"

This reverts commit cf1cab4a43.

[ROCm/composable_kernel commit: 2fa9270a25]
2025-06-28 07:07:14 -07:00
Thomas Ning
189056103f Revert "Enable builds on gfx942 by default and run all tests on develop branc…" (#2418)
This reverts commit e4f117a18e6856d19730c3c8be6cffcb9a3dc12d.

[ROCm/composable_kernel commit: 28a63d7dcb]
2025-06-27 16:40:10 -07:00
Khushbu Agarwal
d33891768a Enabling diff datatypes for tile_engine and build with more granularity (#2392)
* merging recent changes to universal gemm to tile_engine

* Reducing Linking time by generating less intermediate files

* make small libs to build faster

* Reducing the instances

* reducing instances

* Restoring default config

* Restoring default config

* warp_n reverted in default config

* Adding diff json files for fp8 and fp16, cmake changes for fp8

* Restructure the CMake File

* Added more granularity for build and some debugging code

* removed some of debugging statements

* added fp8 instances

* take datatype from command line to enable both type of json files

* updated README file

* code cleanup

* code cleanup

* updated jenkinsfile

* enable tile_engine daily builds

* updating cmake file

* updated CMakeLists.txt

* Updating CMake code fixing gfx12 build

* Updating CMake code fixing gfx12 build

* Fix CMake file null checks

* fixed traces of rebase

* Update tile_engine/ops/gemm/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* fixing rebase issue

---------

Co-authored-by: khushbu <khuagarw@gmail.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: a14753b86f]
2025-06-25 15:18:24 -07:00
Illia Silin
44656b6230 Enable builds on gfx942 by default and run all tests on develop branch. (#2408)
* add switches for architectures and force develop to run all tests

* move the test condition inside the function

* enable build on gfx942 by default

[ROCm/composable_kernel commit: 6d6f4c76c1]
2025-06-25 08:01:50 -07:00
Illia Silin
804b16c2d9 Introduce dependency-based CI test selection. (#2377)
* Selective test filter initial commit.

* Expanded folder paths for parsing ninja dependencies.

* Fixing default branch name in the test evaluation script.

* Fixing paths for robustness and adding ctest command to the launch script.

* change jenkins file and few tests to upgrade CI

* Setting ninja build path.

* Fixing typo in Jenkinsfile, and wrong paths.

* Fixing typo in launch script.

* add few more tests to check CI logic

* Fixing header for shell script.

* turn off performance test by default, add option to run all unit tests

* revert dummy changes in source code to trigger tests

* make sure develop branch runs all unit tests

---------

Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>

[ROCm/composable_kernel commit: c3c8c6a10f]
2025-06-20 12:48:00 -07:00
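The dependency-based selection introduced in the commit above can be pictured with a minimal sketch. It assumes a mapping from each test to the source files it depends on has already been built (the commit mentions parsing ninja dependencies, but that step is not reproduced here). The function name `select_tests`, the `run_all` switch and the file paths are hypothetical and do not come from the repository's scripts.

```python
def select_tests(test_deps, changed_files, run_all=False):
    """Return the sorted names of tests whose dependencies overlap the changed files.

    test_deps:     dict mapping test name -> iterable of source paths it depends on
    changed_files: iterable of paths touched by the change under test
    run_all:       when True, bypass filtering (the commit notes that the develop
                   branch should run all unit tests)
    """
    if run_all:
        return sorted(test_deps)
    changed = set(changed_files)
    return sorted(name for name, deps in test_deps.items() if changed & set(deps))


# Made-up dependency map and change list, for illustration only.
EXAMPLE_DEPS = {
    "test_gemm": ["include/gemm.hpp", "include/common.hpp"],
    "test_conv": ["include/conv.hpp", "include/common.hpp"],
    "test_reduce": ["include/reduce.hpp"],
}

if __name__ == "__main__":
    print(select_tests(EXAMPLE_DEPS, ["include/conv.hpp"]))
    print(select_tests(EXAMPLE_DEPS, ["include/common.hpp"]))
    print(select_tests(EXAMPLE_DEPS, [], run_all=True))
```

A change to a shared header selects every test that includes it, while a change that touches no tracked dependency selects nothing unless `run_all` is set.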
Illia Silin
23d7007455 Limit the threads to build ck_tile engine, use ninja. (#2342)
* limit the threads to build ck_tile engine, use ninja

* disable ck_tile engine until it can be built safely

[ROCm/composable_kernel commit: 56f654a826]
2025-06-13 14:13:07 -07:00
Illia Silin
d5d10f8e88 Upgrade to ROCm6.4.1 and use generic targets for gfx1x. (#2274)
* upgrade to rocm6.4.1 and use gfx1x-generic targets

* add rocm version parsing

* fix the gfx10-3-generic syntax in cmake

[ROCm/composable_kernel commit: b76fdbe47f]
2025-06-03 07:17:35 -07:00
Illia Silin
3726830d59 Add a daily CI build on GFX950. (#2261)
* add CI build for gfx950

* make sure gfx950 CI always uses special docker and compiler

* enable codegen tests by default

[ROCm/composable_kernel commit: 654956bb02]
2025-05-30 12:50:08 -07:00
Casey-Shi
b7c31ca612 change from ninja to make (#2253)
[ROCm/composable_kernel commit: 29574f05f7]
2025-05-28 09:25:05 -07:00
Casey-Shi
3bcbdd608e [Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark

* only support profile

* fix

* fix doc

* add default config

* add ci

* fix cmake

* tmp save for gen blobs

* fix bug

* merge

* range config

* test success

* fix

* fix

* move struct

* remove config property

* fix config

* remove comment

* add cmake option & modify

* add changelog

* fix

* format

* add pydantic module to the docker image

* fix

* add benchmark for cold and warm up

* python format

* add asm cache control

* fix README

* remove pydantic module

* modify changelog

* fix config

* recover benchmark_gemm and fix

* format python

* refactor profiler

* fix csv bug

* fix codegen bug

* add kernel instance object

* add benchmark gemm executable

* fix jenkins & delete extra header

* disable warning output & enable default config

* Disable sparsity for invalid warp tile combinations

* fix gemm host template func

* refactor gemm profiler

* filter out some instances

* default config test & fix codegen bug

* add sparse flag to gen more instances

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 128f5a1eab]
2025-05-26 22:32:36 -07:00
Illia Silin
949ba112e6 Build and store CK library deb package for all targets daily. (#2196)
* generate and store library package for all targets

* use ninja to build packages for all targets

* make sure to use ftime-trace when using ninja

* make sure build trace only runs on gfx9

* archive lib package and stash only library package

[ROCm/composable_kernel commit: 40668c9a99]
2025-05-16 07:40:53 -07:00
Thomas Ning
1f7c2a88c0 Vectorized Transpose for Batched Transpose CK Tile Operator (#2131)
* Shared Memory for single data point

* CKTile Transpose vectorize CP1

* CKTile Transpose vectorize CP2

* CKTile Transpose vectorize CP2.1

* fixed the compile error of the transpose tile 2d

* Have the correct result for the current test sample

* Changes to printing tensor

* fp8 support added

* Debugging for transpose

* solving the corner issue

* Changed padding flag

* Intermediate Debugging

* Intermediate Debugging

* Intermediate Debugging

* Finished debugging of the transpose op

* Code Cleanup

* Adding edge case smoke tests

* Adding Transpose test to CI/CD

* Adding Transpose test to CI/CD

* Adding Transpose test to CI/CD

* Addressing Review Comment

* Addressing Comments

* Addressing Comments

* Measuring Perf Tests

* Code Cleanup

* Changelog

* Added the running iterations

* clang format

* Fix the changelog

* Fix the compilation error

* change the printing factor

---------

Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>

[ROCm/composable_kernel commit: 9d1e44e56a]
2025-05-12 00:41:45 -07:00
Illia Silin
c940deb092 Generate ckProfiler package for gfx942 only. (#2180)
* build CI for gfx942 exclusively

* run the last stage in a docker with user jenkins

* update the image for the last stage

* ignore perf_log if not found

* archive and store all packages

* use ccache for building packages

[ROCm/composable_kernel commit: 3448e12609]
2025-05-08 13:29:14 -07:00
Illia Silin
9063797aee re-enable ck4inductor tests by default (#2155)
[ROCm/composable_kernel commit: 619fba3134]
2025-05-01 12:37:27 -07:00
Illia Silin
e6e687d8a7 add write permissions in workspace (#2154)
[ROCm/composable_kernel commit: b9d17bdb11]
2025-05-01 07:04:57 -07:00
Max Podkorytov
a65374ed85 try building ck4inductor and testing it inside a virtual environment (#2142)
use system virtualenv

use python-full ubuntu package in docker image

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 6601931949]
2025-04-29 17:22:38 -07:00
Illia Silin
8346274d24 Run CI jobs as user jenkins (#2141)
* run CI as jenkins

* remove user jenkins from docker image

* move inductor installation to a writeable path

* add a switch for inductor tests

[ROCm/composable_kernel commit: 8fcb4dff1a]
2025-04-29 07:35:10 -07:00
Bartłomiej Kocot
7942bb905b Integrate universal gemm with conv bwd data and add SplitK (#1315)
* Integrate universal gemm with conv bwd data

* Fix multi d kernel

* Add splitK support

* instances refactor

* instances refactor

* refactor

* fixes

* fixes

* 16x16 instances

* Fixes

* Fix

* Fix

* Fix

* Fix

* Fix

* Fixes

* fix

* fix

[ROCm/composable_kernel commit: 4094ad158a]
2025-04-28 23:54:49 +02:00
Illia Silin
b29fc0efce fix daily gfx942 build (#2106)
[ROCm/composable_kernel commit: ce61759538]
2025-04-21 08:48:22 -07:00
Illia Silin
729c668e9d Upgrade default docker to Ubuntu24.04 (#2090)
* upgrade docker to Ubuntu24.04

* add break-system-packages flag to pip install

* fix dockerfile

[ROCm/composable_kernel commit: 3bb62f16cd]
2025-04-16 12:10:15 -07:00
Illia Silin
1a8132e9f9 Upgrade default docker image to ROCm6.4 release. (#2082)
* upgrade to rocm6.4

* fix gfx10 generic target syntax

* use gfx1101 target for unit tests

* use gfx1201 target for unit tests

* do not use generic targets until 6.4.1 release

* update target list and dockerfile.compiler

[ROCm/composable_kernel commit: d55c9cb313]
2025-04-14 16:41:47 -07:00