Commit Graph

11 Commits

Author SHA1 Message Date
Ville Pietilä
ae4e632c7d [rocm-libraries] ROCm/rocm-libraries#4797 (commit 1a30400)
[CK_TILE] Add CK Tile bwd weight profiler
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Motivation

To compare old CK and CK Tile, we need to extend the current CK profiler
to support running also CK Tile instance with the same API. In order to
have the same instance coverage in CK Tile compared to the old CK, I've
added code generation from old CK configurations to CK Tile instances
using the CK Builder.

## Technical Details

- The codegen python script for CK Tile fwd convs is extended to support
also bwd weight and bwd data.
- The generated instances are added to the CMake build (target
`device_grouped_conv_bwd_weight_tile_instance`s).
- A new profiler op (`grouped_conv_bwd_weight_tile`) has been added to
the CK Profiler.
2026-03-04 21:50:29 +00:00
Johannes Graner
1cd031c21d [rocm-libraries] ROCm/rocm-libraries#4800 (commit 9dcf0cf)
[CK Profiler] Instance selection for grouped conv profilers
 (#4800)

## Motivation

This PR adds instance selection support for ckProfiler grouped
convolution operations (forward, backward data, backward weight),
allowing users to run specific kernel instances rather than sweeping all
available instances.

When profiling or debugging convolution kernels, users often need to
test specific kernel configurations without running the full instance
sweep. This is particularly useful for:
- Debugging a specific failing instance
- Profiling a known-best configuration
- Quick validation during development

## Technical Details

**Features added**:
- `--instance <id>` flag to run only the N-th valid instance (0-indexed)
- `--list-instances` flag to list all valid instances without running
any kernels
- Named arguments can appear anywhere on the command line
- Best instance index is now printed with results for reference
- Python script support via `-ii` / `--instance_index` arguments

**Design decisions**:
- Named arguments (`--instance`, `--list-instances`) instead of
positional to avoid conflicts with existing parameters
- Instance index refers to the N-th valid instance (0-indexed), not the
global instance index
- Auto-disable verification when `--list-instances` is used for fast
enumeration
- Shared utilities in `profiler_arg_utils.hpp` to deduplicate parsing
logic

## Test Plan

Manual testing with various scenarios:

List all valid instances:
```bash
./bin/ckProfiler grouped_conv_fwd <usual args> --list-instances
```

Run only instance 5:
```bash
./bin/ckProfiler grouped_conv_fwd <usual args> --instance 5
```

Test cases:
- Single instance selection
- List instances mode
- Out-of-bounds instance index (verified warning messages)
- No instance flag (runs all instances - default behavior)
- All three operations (fwd, bwd_data, bwd_weight)

## Test Result

All test scenarios passed:
- Instance selection correctly filters kernel executions
- List mode enumerates valid instances without running kernels
- Invalid indices produce appropriate warnings without crashing
- Default behavior (all instances) unchanged when flags not provided
- Consistent behavior across all three grouped convolution operations

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-03 15:33:20 +00:00
yinglu
ba897f8435 ck:tf32:complement CK_ENABLE_TF32 controls (#3426) 2025-12-19 09:17:29 +08:00
yinglu
8fec8054b2 ck: add tf32 in DTYPES to control instances build(#3317) 2025-12-08 16:24:20 +08:00
Aviral Goel
0aadb4b2c4 chore(copyright): update copyright header for profiler directory (#3205)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory
2025-11-14 11:19:25 -08:00
yinglu
fada1a3cae Conv:TF32: add more instances - 2 (#2879)
* add instances of device_grouped_conv_fwd_xdl_f32_comp_instances
* add instances of device_grouped_conv_fwd_xdl_f32_tf32_mem_instances
* add instances of device_grouped_conv_fwd_xdl_large_tensor_f32_tf32_instances
* tf32:conv:add instances for base class DeviceConvFwd
* tf32:conv:add instances for base class DeviceGroupedConvBwdDataMultipleD
* tf32:conv:add instances for base class DeviceGroupedConvBwdWeight
* add tf32 in profiler
* remove gnhwc/ngchw/ngcdhw instances
* remove non-ndhwgc/nhwgc/nhwc instances
* add check in IsSupportedArgument()
2025-10-10 15:28:17 +08:00
Bartłomiej Kocot
4094ad158a Integrate universal gemm with conv bwd data and add SplitK (#1315)
* Integrate universal gemm with conv bwd data

* Fix multi d kernel

* Add splitK support

* instances refactor

* instances refactor

* refactor

* fixeS

* fixes

* 16x16 instnaces

* Fixes

* Fix

* Fix

* Fix

* Fix

* Fix

* Fixes

* fix

* fix
2025-04-28 23:54:49 +02:00
Bartłomiej Kocot
8c0ab61ece Grouped conv backward data GKCYX support (#2029)
* Grouped conv backward data GKCYX support

* profiler

* Converter

* split instances
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
c2e4898b4b Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW

* fixes

* fix

* Improvements

* Fix

* Fix

* add client example
2025-03-17 13:32:00 +01:00
Bartłomiej Kocot
49180fd60b Grouped 3d conv backward data support (#799)
* Grouped 3d conv backward data support

* Fix comments
2023-07-18 11:01:33 -05:00
Bartłomiej Kocot
63388e84ab Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757)
* Support bf16/f32/f16 and NHWGC conv2d_bwd_data

* Add interface test

* clang format

* Comment fixes

* Add more friendly error message
2023-06-21 08:20:31 -05:00