Working multi-GPU training. Still needs a lot of tweaks and testing.

Jaret Burkett
2025-01-25 16:46:20 -07:00
parent 441474e81f
commit 5e663746b8
9 changed files with 432 additions and 294 deletions

todo_multigpu.md (new file, 3 lines added)

@@ -0,0 +1,3 @@
- only do EMA on the main device? it shouldn't be needed for anything other than saving and sampling
- check when to unwrap the model and what unwrapping does
- disable timer on non-main local processes
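
The last item could be approached roughly as follows. This is a minimal sketch, not the repository's code: `Timer`, `section`, and `is_local_main_process` are hypothetical names, and it assumes the launcher (for example `torchrun`) sets the `LOCAL_RANK` environment variable for each process. If the project uses Hugging Face Accelerate, `Accelerator.is_local_main_process` would be the more natural check.

```python
import os
import time
from contextlib import contextmanager


def is_local_main_process() -> bool:
    # Assumption: the launcher exports LOCAL_RANK per process.
    # If it is missing, treat the run as single-process (rank 0).
    return int(os.environ.get("LOCAL_RANK", "0")) == 0


class Timer:
    """Hypothetical timer that records nothing when disabled."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self.totals = {}  # section name -> accumulated seconds

    @contextmanager
    def section(self, name):
        if not self.enabled:
            # Disabled: run the wrapped code without measuring it.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed


# Only the local main process collects timings.
timer = Timer(enabled=is_local_main_process())
with timer.section("train_step"):
    sum(range(1000))  # stand-in for real work
```

Keeping the call sites unchanged and switching the timer off at construction avoids scattering rank checks through the training loop.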