Modernize NVHPC CI job (to make it work again): Ubuntu-24.04 runner, NVHPC 25.11 (#5935)

* Limit busy-wait loops in per-subinterpreter GIL test

Add explicit timeouts to the busy-wait coordination loops in the
Per-Subinterpreter GIL test in tests/test_with_catch/test_subinterpreter.cpp.
Previously those loops spun indefinitely waiting for shared atomics like
`started` and `sync` to change, which is fine when CPython's free-threading
and per-interpreter GIL behavior matches the test's expectations but becomes
pathologically bad when that behavior regresses: the `test_with_catch`
executable can then hang forever, causing our 3.14t CI jobs to time out
after 90 minutes.

This change keeps the structure and intent of the test but adds a
std::chrono::steady_clock deadline to each of the coordination loops,
using a conservative 10-second bound. Worker threads record a failure and
return if they hit the timeout, while the main thread fails the test via
Catch2 instead of hanging. That way, if future CPython free-threading
patches change the semantics again, the test will fail quickly and
produce a diagnosable error instead of wedging the CI job.
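
A minimal sketch of the deadline pattern described above, not the code from
tests/test_with_catch/test_subinterpreter.cpp. The helper name `wait_for_flag`
and its signature are hypothetical; only the use of std::chrono::steady_clock
and the 10-second bound come from this commit message.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper (name and signature are assumptions for illustration):
// busy-wait until `flag` becomes true or `timeout` elapses.
// Returns true if the flag was observed set, false if the deadline passed.
inline bool wait_for_flag(const std::atomic<bool> &flag,
                          std::chrono::milliseconds timeout = std::chrono::seconds(10)) {
    // steady_clock is monotonic, so wall-clock adjustments cannot move the deadline.
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!flag.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false; // caller records a failure instead of spinning forever
        }
        std::this_thread::yield(); // let other threads make progress
    }
    return true;
}
```

A worker thread would record a failure and return when the helper yields
false, and the main thread would report the same condition through a Catch2
assertion, so a regression surfaces as a test failure rather than a hang.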

* Revert "Limit busy-wait loops in per-subinterpreter GIL test"

This reverts commit 7847adacda.

* Add progress reporter for test_with_catch Catch runner

Introduce a custom Catch2 reporter for tests/test_with_catch that prints a
simple one-line status for each test case as it starts and ends, and wire the
cpptest CMake target to invoke test_with_catch with -r progress. This makes
it much easier to see where the embedded/interpreter test binary is spending
its time in CI logs, and in particular to pinpoint which test case is stuck
when the free-threading builds hang.

Compared to adding ad hoc timeouts around potentially infinite busy-wait
loops in individual tests, a progress reporter is a more general and robust
approach: it gives visibility into all tests (including future ones) without
changing their behavior, and turns otherwise opaque 90-minute timeouts into
locatable issues in the Catch output.

* Temporarily limit CI to Python 3.14t free-threading jobs

* Temporarily remove non-CI GitHub workflow files

* Temporarily disable AppVeyor builds via skip_commits

* Add DEBUG_LOOK in TEST_CASE("Move Subinterpreter")

* Add Python version banner to Catch progress reporter

Print the CPython version once at the start of the Catch-based
interpreter tests using Py_GetVersion(). This makes it trivial to
confirm which free-threaded build a failing run is using when
inspecting CI or local logs.

* Revert "Add DEBUG_LOOK in TEST_CASE("Move Subinterpreter")"

This reverts commit ad3e1c34ce.

* Pin CI free-threaded runs to CPython 3.14.0t

Update the standard-small and standard-large GitHub Actions jobs to
request python-version 3.14.0t instead of 3.14t. This forces setup-python
to use the last-known-good 3.14.0 free-threaded build rather than the
newer 3.14.1+ builds where subinterpreter finalization regressed.

* Revert "Pin CI free-threaded runs to CPython 3.14.0t"

This reverts commit 5281e1c20c.

* Revert "Temporarily disable AppVeyor builds via skip_commits"

This reverts commit ed11292636.

* Revert "Temporarily remove non-CI GitHub workflow files"

This reverts commit 0fe6a42a04.

* Revert "Temporarily limit CI to Python 3.14t free-threading jobs"

This reverts commit 60ae0e8f74.

* Pin CI free-threaded runs to CPython 3.14.0t

Update the standard-small and standard-large GitHub Actions jobs to
request python-version 3.14.0t instead of 3.14t. This forces setup-python
to use the last-known-good 3.14.0 free-threaded build rather than the
newer 3.14.1+ builds where subinterpreter finalization regressed.

* Switch NVHPC job to ubuntu-24.04 and disable AppVeyor

* Temporarily trim workflows to focus on NVHPC job

* First restore ci.yml from test-with-catch-timeouts branch, then delete all jobs except ubuntu-nvhpc7

* Change runner to ubuntu-24.04

* Use nvhpc-25-11

* Undo ALL changes relative to master (i.e. this branch is now an exact copy of master)

* Change runner to ubuntu-24.04

* Use nvhpc-25-11

* Remove misleading 7 from job name (i.e. ubuntu-nvhpc7 → ubuntu-nvhpc)
Ralf W. Grosse-Kunstleve
2025-12-14 19:01:34 -08:00
committed by GitHub
parent 5b379161aa
commit d4f9cfbc28

@@ -470,10 +470,10 @@ jobs:
# Testing on Ubuntu + NVHPC (previous PGI) compilers, which seems to require more workarounds
-ubuntu-nvhpc7:
+ubuntu-nvhpc:
if: github.event.pull_request.draft == false
-runs-on: ubuntu-22.04
-name: "🐍 3 • NVHPC 23.5 • C++17 • x64"
+runs-on: ubuntu-24.04
+name: "🐍 3 • NVHPC 25.11 • C++17 • x64"
timeout-minutes: 90
env:
@@ -491,7 +491,7 @@ jobs:
run: |
sudo apt-get update -y && \
sudo apt-get install -y cmake environment-modules git python3-dev python3-pip python3-numpy && \
-sudo apt-get install -y --no-install-recommends nvhpc-23-5 && \
+sudo apt-get install -y --no-install-recommends nvhpc-25-11 && \
sudo rm -rf /var/lib/apt/lists/*
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade pytest
@@ -502,7 +502,7 @@ jobs:
shell: bash
run: |
source /etc/profile.d/modules.sh
-module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc/23.5
+module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc/25.11
cmake -S . -B build -DDOWNLOAD_CATCH=ON \
-DCMAKE_CXX_STANDARD=17 \
-DPYTHON_EXECUTABLE=$(python3 -c "import sys; print(sys.executable)") \
@@ -510,7 +510,7 @@ jobs:
-DPYBIND11_TEST_FILTER="test_smart_ptr.cpp"
- name: Build
-run: cmake --build build -j 2 --verbose
+run: cmake --build build -j $(nproc) --verbose
- name: Python tests
run: cmake --build build --target pytest