Vinay Umrethe 7470dfd7af fix: use W_org matrix only where needed (#398)
* fix: minor change

use `W_org` matrix where needed...

* Update model.py

* Update model.py

* fix: Windows hash, remove BOM marker

* docs: Add info about test cases

* feat: Tests for row_normalization PRE & NONE

* feat: CI hash files for row_normalization PRE & NONE models

* feat: Documentation instructions about test suite

* add recommendation
2026-07-01 16:13:14 +05:30

Test Suite Guide

Whenever we change code logic in src/heretic/model.py or config.toml that can affect a model's reproducibility (e.g. row_normalization, full_normalization_lora_rank, winsorization_quantile, etc.), use these tests to verify that the change does not affect reproducibility. The exception is a change that is meant to alter the output (for example, when the ARA branch is integrated in the future).

How to test

  1. Choose any model from the tiny-random organization, which provides tiny models that are useful for debugging.

Example: tiny-random/minicpm5.

Note

It is highly recommended to use a model that does not have a special_tokens_map.json file in its repository. In tiny-random/* models, those files are almost always wrong compared to the original model.
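Checking a candidate repository for that file is easy to automate. The following is a minimal sketch that assumes you already have the repository's file list. The helper names find_special_tokens_maps and is_recommended_test_model are hypothetical and not part of Heretic.

```python
from pathlib import PurePosixPath


def find_special_tokens_maps(repo_files):
    """Return every path in `repo_files` whose file name is special_tokens_map.json.

    PurePosixPath is used because Hub repository paths use forward slashes.
    """
    return [f for f in repo_files if PurePosixPath(f).name == "special_tokens_map.json"]


def is_recommended_test_model(repo_files):
    """True when the repository file list contains no special_tokens_map.json."""
    return not find_special_tokens_maps(repo_files)
```

This sketch also assumes the tiny-random organization is hosted on the Hugging Face Hub. Under that assumption, the file list could come from huggingface_hub.list_repo_files("tiny-random/minicpm5"). For a local clone, it could come from [p.name for p in Path(clone_dir).iterdir()].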

  2. Clone that model repository using Git. Then, from inside the cloned repository, generate the SHA256 hashes using sha256sum:

On Linux:

sha256sum -b * > ../SHA256SUMS.LABEL

On Windows:

sha256sum * | Out-File -Encoding utf8NoBOM ../SHA256SUMS.LABEL

Tip

On Windows, sha256sum is generally installed along with Git for Windows.

Verify with:

Get-Command sha256sum

Expected:

CommandType     Name                                               Version    Source
-----------     ----                                               -------    ------
Application     sha256sum.exe                                      0.0.0.0    C:\Program Files\Git\usr\bin\sha256sum...

Note

You must use PowerShell 7.x, not the built-in Windows PowerShell 5.1. The -Encoding utf8NoBOM value is not available in Windows PowerShell 5.1.

See the "Differences between Windows PowerShell 5.1 and PowerShell 7.x" documentation.

Here, LABEL describes the type of system you are running the tests on.

Example:

  • SHA256SUMS.windows (for Windows)
  • SHA256SUMS.ci (for GitHub CI)
  • SHA256SUMS.linux (for Linux)
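As a cross-platform alternative to the Linux and Windows commands above, the following Python sketch writes a SHA256SUMS.LABEL file in the same line format as sha256sum -b, that is, <hash> *<filename>. It writes UTF-8 without a BOM and with LF line endings on every platform, so the PowerShell version requirement does not apply. The helpers sha256_file and write_sums are hypothetical and not part of Heretic.

Two details may differ from the shell output. First, line order follows Python's sorted() rather than the shell glob's locale-dependent order. Second, it is an assumption that run_tests.py accepts this output as-is, so compare it against an existing SHA256SUMS file before relying on it.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sums(model_dir, label):
    """Write <model_dir>/../SHA256SUMS.<label>, mirroring `sha256sum -b * > ../SHA256SUMS.LABEL`.

    Like the shell glob `*`, hidden files (e.g. .gitattributes) are skipped.
    Subdirectories are skipped too. resolve() makes "." work as model_dir.
    """
    model_dir = Path(model_dir).resolve()
    out_path = model_dir.parent / f"SHA256SUMS.{label}"
    lines = [
        f"{sha256_file(p)} *{p.name}\n"  # binary-mode marker "*", as with -b
        for p in sorted(model_dir.iterdir())
        if p.is_file() and not p.name.startswith(".")
    ]
    # encoding="utf-8" writes no BOM; newline="\n" disables CRLF translation on Windows.
    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        f.writelines(lines)
    return out_path
```

For example, write_sums("minicpm5", "windows") writes SHA256SUMS.windows next to the minicpm5 clone.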
  3. Run the tests with:
uv run run_tests.py

The output hashes should FAIL against the Valid hashes in the SHA256SUMS file of the test model you added. This is expected, since Heretic modifies the model. Step 2 is still required: without it, the test model's folder is simply ignored, because there is no SHA256SUMS file to compare against.

  4. Next, go to the output TEST_MODEL_DIR/model folder and regenerate the Actual hashes for the system you are using:
cd TEST_MODEL_DIR/model
sha256sum -b * > ../SHA256SUMS.LABEL # or use the Windows command from step 2
  5. Re-run the tests with:
uv run run_tests.py

This time the tests should PASS, because the newly added hashes are expected to be reproduced exactly on the same system.

  6. Then push the SHA256SUMS.LABEL files and wait for GitHub Actions CI to run the tests.

PyTorch does not guarantee bit-exact reproducibility across systems, even with an identical configuration. For this reason, multiple valid hashes can be provided for each output file. The update above must be performed for each TEST_MODEL_DIR and on each type of system.

For CI, copy the Actual hash value reported for each mismatched file into a SHA256SUMS.ci file.
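This guide does not describe how run_tests.py combines several SHA256SUMS.LABEL files. The following hypothetical sketch shows one way the "multiple valid hashes" rule could work, assuming two things:

  • a file passes when its hash matches an entry for that file name in any SHA256SUMS.* file of the TEST_MODEL_DIR;
  • a folder with no such file is skipped.

The names parse_sums, load_valid_hashes, find_mismatches and format_sums are illustrative only. This is not Heretic's implementation.

```python
from pathlib import Path


def parse_sums(text):
    """Parse sha256sum output into {filename: hash}.

    Accepts text-mode lines ("<hash>  name") and binary-mode lines ("<hash> *name").
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(" ", 1)
        if name[:1] in (" ", "*"):  # strip the mode character
            name = name[1:]
        entries[name] = digest.lower()
    return entries


def load_valid_hashes(test_model_dir):
    """Merge every SHA256SUMS.* file into {filename: set of valid hashes}.

    An empty result means there is nothing to compare against (folder skipped).
    """
    valid = {}
    for sums_file in sorted(Path(test_model_dir).glob("SHA256SUMS.*")):
        for name, digest in parse_sums(sums_file.read_text(encoding="utf-8")).items():
            valid.setdefault(name, set()).add(digest)
    return valid


def find_mismatches(actual, valid):
    """Return {filename: actual hash} for files matching none of their valid hashes."""
    return {
        name: digest
        for name, digest in actual.items()
        if digest.lower() not in valid.get(name, set())
    }


def format_sums(entries):
    """Render {filename: hash} as sha256sum -b lines, e.g. for a SHA256SUMS.ci file."""
    return "".join(f"{digest} *{name}\n" for name, digest in sorted(entries.items()))
```

Under these assumptions, format_sums(find_mismatches(actual, valid)) yields exactly the lines to add to SHA256SUMS.ci after a failing CI run. Here actual holds the Actual hashes reported by CI, and valid comes from load_valid_hashes().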

  7. Finally, push the SHA256SUMS.ci files and wait for GitHub Actions CI to re-run the tests.

This time the tests should PASS, because the newly added hashes are the ones expected to be reproduced on CI.