Commit Graph

3 Commits

Author SHA1 Message Date
Emily Martins
2e4b8a8fc4 [CK_TILE] Remove Old CK Tile Stream-K Artifacts (#3202)
* Remove old CK Tile Stream-K implementation

The original CK Stream-K implementation was based on old CK's Stream-K
block to C tile map. However, this implementation did not align with the
original Stream-K paper. Thus, we implemented a new tile partitioner and
associated Stream-K kernel, which was placed in the reboot namespace.

Now that the new Stream-K implementation is ready, this change removes
all artifacts of the old implementation. Specifically, the following
changes were made:
- Removes old Stream-K tile partitioner from CK Tile
- Removes the reboot namespace such that the new implementation resides
  in the ck_tile namespace only.
- Adds tests for bf8 and fp8 using the new implementation
- Removes tests for the old implementation
- Remove the v2 suffix from the new CK Tile Tile Partitioner
derived classes.
- Updates Stream-K Kernel ops file to use /** commenting style.

* Remove v2 from tile partitioner validation function names
2025-11-20 09:32:32 -07:00
Emily Martins
352dee5225 Fix CK Tile Stream-K BF16 Validation Errors (#3039)
Prior to this change, the number of accumulations passed into
calculate_rtol_atol was 1. That said, in most cases, this is not correct
when there are multiple workgroups contributing to the same macro tile
in C.

This change ensures uses the function estimate_num_wgs_per_tile, which
was extracted into a common file and generalized, to estimate the number
of workgroups per macro tile. This estimate is passed into
calculate_rtol_atol to ensure we get a better relative and absolute
tolerance.
2025-10-17 09:33:38 -07:00
arai713
0282d98412 [CK TILE] Stream-K tile partitioner (#2708)
* initial commit for skeleton code

* replaced skeleton code with old streamk b2c map functions from old CK, still need to clean up the code

* fixed up code to match CK Tile convention: data type changes, naming changes, etc.

* change for num_sk_blocks data type

* formatting fix

* minor fixes

* moved reduction argument to template

* resolved comments from PR review: standardizing naming, pruning unneeded code

* resolve errors from merge of device op PR: moved enum to common file

* switching to uint32_t due to implementation constraints: divmod only takes uint32_t and mixing signed and unsigned types causes problems

* unsigned type fix

* add const qualifier

* added documentation for template parameters

* documentation edit
2025-09-03 13:38:17 -07:00