Test Suite Guide
Whenever we change any code logic related to src/heretic/model.py or config.toml (e.g. row_normalization, full_normalization_lora_rank, winsorization_quantile, etc.) that can affect a model's reproducibility, use these tests. They are designed to verify that such changes do not affect reproducibility unless they are meant to (for example, when we integrate the ARA branch in the future).
How to test
- Choose any model from the tiny-random org, which provides tiny models useful for debugging.
Example: tiny-random/minicpm5.
Note
It is highly recommended to use a model which does not have a
special_tokens_map.json file in the repo, because those files are almost always wrong in tiny-random/* models compared to the original model.
- Clone that model repository using Git and generate the SHA256 hashes using
sha256sum:
On Linux:
sha256sum -b * > ../SHA256SUMS.LABEL
On Windows:
sha256sum * | Out-File -Encoding utf8NoBOM ../SHA256SUMS.LABEL
Tip
On Windows,
sha256sum is generally pre-installed by Git for Windows.
Verify with:
Get-Command sha256sum
Expected:
CommandType Name Version Source
----------- ---- ------- ------
Application sha256sum.exe 0.0.0.0 C:\Program Files\Git\usr\bin\sha256sum...
Note
You must use PowerShell
v7.x, not Windows PowerShell v5.1 (the version bundled with Windows). This is required for -Encoding utf8NoBOM to work. See the "Differences between Windows PowerShell 5.1 and PowerShell 7.x" documentation.
Where LABEL describes the type of system you are running the tests on.
Example:
SHA256SUMS.windows (for Windows)
SHA256SUMS.ci (for GitHub CI)
SHA256SUMS.linux (for Linux)
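As a cross-platform alternative to the two shell commands above, the hash file can also be produced from Python. This is only a minimal sketch: the function names write_sha256sums and sha256_of_file are hypothetical and not part of the repository. It assumes the file should match the `sha256sum -b *` output format (`<hash> *<filename>`), written as UTF-8 without a BOM and with LF line endings, and it skips dotfiles and directories the way the shell `*` glob does. Entry order follows Python's string sort, which may differ from the shell's locale-dependent order.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in binary chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha256sums(model_dir: Path, sums_path: Path) -> list[str]:
    """Write one '<hash> *<name>' line per regular, non-hidden file in model_dir."""
    lines = []
    for entry in sorted(model_dir.iterdir(), key=lambda p: p.name):
        # Mirror the shell glob `*`: no dotfiles, and skip directories.
        if entry.is_file() and not entry.name.startswith("."):
            lines.append(f"{sha256_of_file(entry)} *{entry.name}")
    # encoding="utf-8" writes no BOM; newline="\n" avoids CRLF on Windows.
    with sums_path.open("w", encoding="utf-8", newline="\n") as handle:
        handle.write("".join(line + "\n" for line in lines))
    return lines
```

For example, `write_sha256sums(Path("model"), Path("SHA256SUMS.linux"))` run from inside TEST_MODEL_DIR would play the role of `cd model` followed by the Linux command above.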
- Run the tests with:
uv run run_tests.py
The output hashes should FAIL against the valid hashes in the SHA256SUMS file of the test model you added. This is expected, since Heretic changes the model. Without Step 2, the test model's folder is simply ignored because it has no SHA256SUMS file to compare against.
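The "ignored without a SUMS file" behaviour can be pictured with a small sketch. It is an illustration only and not the actual logic of run_tests.py: the function name model_dirs_with_sums and the tests_root parameter are hypothetical, and it assumes each test model lives in its own folder with its SHA256SUMS.LABEL files placed directly inside it.

```python
from pathlib import Path


def model_dirs_with_sums(tests_root: Path) -> list[Path]:
    """Return test model folders that contain at least one SHA256SUMS.* file."""
    return sorted(
        child
        for child in tests_root.iterdir()
        # Folders without any SHA256SUMS.* file have nothing to compare against.
        if child.is_dir() and any(child.glob("SHA256SUMS.*"))
    )
```

Under these assumptions, a freshly cloned model folder shows up in the result only after Step 2 has created its first hash file.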
- After that, go to the output
TEST_MODEL_DIR/model folder and re-generate the actual hashes based on the system you are using.
cd TEST_MODEL_DIR/model
sha256sum -b * > ../SHA256SUMS.LABEL # or use the Windows command from Step 2
- Re-run the tests with:
uv run run_tests.py
This time the tests should PASS because we added the new hashes which are expected to be reproduced on the same system.
- After that, push the
SHA256SUMS.LABEL files and wait for GitHub CI actions to run those tests.
Since PyTorch does not guarantee exact cross-system reproducibility regardless of configuration, multiple valid hashes can be provided for each output file. The above update must be performed for each TEST_MODEL_DIR and on each type of system.
For CI, copy the actual hash value reported for each mismatched file into a SHA256SUMS.ci file.
- After that, push the
SHA256SUMS.ci files and wait for GitHub CI actions to re-run those tests.
This time the tests should PASS because we added the new hashes which are expected to be reproduced on CI.
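The idea of accepting several valid hashes per output file can be sketched as follows. This is a hedged illustration, not the real run_tests.py: load_valid_hashes and mismatched_files are hypothetical names, and the sketch assumes every SHA256SUMS.* file in a test model folder uses the `sha256sum` line format (`<hash> *<filename>` or `<hash>  <filename>`) and that an output file passes if its hash appears in any of those files.

```python
from collections import defaultdict
from pathlib import Path


def load_valid_hashes(test_model_dir: Path) -> dict[str, set[str]]:
    """Merge all SHA256SUMS.* files into {filename: set of accepted hashes}."""
    valid: dict[str, set[str]] = defaultdict(set)
    for sums_file in sorted(test_model_dir.glob("SHA256SUMS.*")):
        for raw_line in sums_file.read_text(encoding="utf-8").splitlines():
            line = raw_line.strip()
            if not line:
                continue
            digest, name = line.split(maxsplit=1)
            # Binary-mode entries carry a leading '*' before the file name.
            valid[name.removeprefix("*")].add(digest.lower())
    return dict(valid)


def mismatched_files(
    actual: dict[str, str], valid: dict[str, set[str]]
) -> list[str]:
    """Return output files whose actual hash is not among the accepted ones."""
    return sorted(
        name
        for name, digest in actual.items()
        if digest.lower() not in valid.get(name, set())
    )
```

With this shape, adding a SHA256SUMS.ci file only widens the set of accepted hashes for a file. Hashes already recorded for Windows or Linux stay valid.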